The result of a cluster analysis is shown by a dendrogram, which starts with each data point as a separate cluster and indicates the level of dissimilarity at which any two clusters were joined.
As you saw, the y-axis of the dendrogram represents the measure of dissimilarity, or distance, at which clusters join.
In the dendrogram shown above, samples 4 and 5 are the most similar and join to form the first cluster, followed by samples 1 and 10. The last two clusters to fuse into the final single cluster are 3-6 and 4-5-2-7-1-10-9-8.
Determining the number of groups in a cluster analysis is often the primary goal, and you typically look for natural groupings defined by long stems. Here, by observation, you can identify three major groupings: 3-6, 4-5-2-7 and 1-10-9-8.
You also saw that hierarchical clustering can proceed in two ways: agglomerative and divisive. If you start with n distinct clusters and iteratively merge the closest pair until only one cluster remains, the process is called agglomerative clustering. On the other hand, if you start with one big cluster and keep partitioning it until you reach n clusters, each containing a single element, the process is called divisive clustering.
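To make this concrete, here is a minimal sketch of agglomerative clustering using SciPy's linkage and dendrogram functions. It assumes a Python environment with NumPy, SciPy and matplotlib installed; the ten random points are hypothetical placeholders, not the data behind the dendrogram discussed above.

```python
# A minimal sketch of agglomerative clustering with SciPy.
# The ten random 2-D points are placeholders for illustration only.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = rng.random((10, 2))            # 10 hypothetical observations

# Agglomerative: start from 10 singleton clusters and repeatedly
# merge the two closest clusters until only one remains.
Z = linkage(X, method='single')    # each row of Z records one merge

dendrogram(Z, labels=[str(i) for i in range(1, 11)])
plt.ylabel('dissimilarity at which clusters merge')
plt.show()

# Cutting the tree into a chosen number of groups (here, 3),
# analogous to reading off the long stems by eye:
groups = fcluster(Z, t=3, criterion='maxclust')
print(groups)
```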
Additional Reference
You can read more about divisive clustering here and here.
Comprehension – Hierarchical Clustering Algorithm
Given below are five data points, each having two attributes, X and Y:
| Observation | X | Y |
| --- | --- | --- |
| 1 | 3 | 2 |
| 2 | 3 | 5 |
| 3 | 5 | 3 |
| 4 | 6 | 4 |
| 5 | 6 | 7 |
The distance matrix of the points, indicating the Euclidean distance between points, is as follows:
| Label | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.00 | 3.00 | 2.24 | 3.61 | 5.83 |
| 2 | 3.00 | 0.00 | 2.83 | 3.16 | 3.61 |
| 3 | 2.24 | 2.83 | 0.00 | 1.41 | 4.12 |
| 4 | 3.61 | 3.16 | 1.41 | 0.00 | 3.00 |
| 5 | 5.83 | 3.61 | 4.12 | 3.00 | 0.00 |
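If you want to verify these values yourself, here is a quick sketch (assuming NumPy and SciPy are available) that reproduces the matrix with pdist and squareform:

```python
# Reproduce the 5x5 Euclidean distance matrix shown above.
import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[3, 2], [3, 5], [5, 3], [6, 4], [6, 7]])
D = squareform(pdist(points, metric='euclidean'))
print(np.round(D, 2))
# Matches the table above: e.g. D[0, 2] = 2.24, D[2, 3] = 1.41.
```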
Take the distance between two clusters to be the minimum distance between points in the two clusters (this is known as single linkage). Based on this information, answer the following questions.
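As a check on your answers, the single-linkage merge sequence can be computed with SciPy's linkage (assuming the same environment as above). Note that SciPy labels observations from 0, so observation 1 appears as 0 in the output.

```python
# Single-linkage clustering of the five points above, sketched
# with SciPy for checking the merge order by hand.
import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[3, 2], [3, 5], [5, 3], [6, 4], [6, 7]])
Z = linkage(points, method='single')
print(np.round(Z, 2))
# Each row is one merge: [cluster_a, cluster_b, distance, new_size].
# Expected order: observations 3 and 4 merge first (distance 1.41),
# then 1 joins them (2.24), then 2 (2.83), and finally 5 (3.00).
```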