
Interpreting the Dendrogram

The result of a hierarchical cluster analysis is shown by a dendrogram, which starts with each data point as a separate cluster and indicates the level of dissimilarity at which any two clusters were joined.

As you saw, the y-axis of the dendrogram is some measure of the dissimilarity or distance at which clusters join.

In the dendrogram shown above, samples 4 and 5 are the most similar and join to form the first cluster, followed by samples 1 and 10. The last two clusters to fuse together to form the final single cluster are 3-6 and 4-5-2-7-1-10-9-8.

Determining the number of groups in a cluster analysis is often the primary goal. Typically, one looks for natural groupings defined by long stems. Here, by observation, you can identify that there are 3 major groupings: 3-6, 4-5-2-7 and 1-10-9-8.

You also saw that hierarchical clustering can proceed in 2 ways: agglomerative and divisive. If you start with n distinct clusters and iteratively merge the closest pair until only 1 cluster remains, it is called agglomerative clustering. On the other hand, if you start with 1 big cluster and keep partitioning it until you reach n clusters, each containing 1 element, it is called divisive clustering.
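Most standard libraries implement only the agglomerative direction. The short sketch below, in Python using SciPy on made-up 2-D points (the data, variable names and the choice of single linkage are illustrative assumptions, not the samples behind the dendrogram above), shows the bottom-up merging, the resulting dendrogram and how cutting the tree recovers a chosen number of groups.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt

# Hypothetical data: 10 points in 2-D (not the samples from the dendrogram above)
rng = np.random.default_rng(0)
points = rng.random((10, 2))

# Agglomerative (bottom-up) clustering: start with 10 singleton clusters and
# repeatedly merge the two closest clusters until only one remains.
Z = linkage(points, method='single', metric='euclidean')

# Each row of Z records one merge: the two clusters joined, the distance at
# which they joined (the y-axis height in the dendrogram) and the new cluster size.
print(Z)

# Plot the dendrogram; long vertical stems suggest natural groupings.
dendrogram(Z)
plt.ylabel('Dissimilarity (Euclidean distance)')
plt.show()

# Cut the tree into 3 clusters, analogous to reading off 3 groups by eye.
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)

Here, fcluster with criterion='maxclust' simply cuts the dendrogram at the height that yields the requested number of clusters, which mirrors identifying the major groupings from the long stems.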

Additional Reference

You can read more about divisive clustering here and here.

Comprehension – Hierarchical Clustering Algorithm

Given below are five data points having two attributes, X and Y:

Observation X Y
1 3 2
2 3 5
3 5 3
4 6 4
5 6 7

The distance matrix of the points, indicating the Euclidean distance between points, is as follows:

Label    1       2       3       4       5
1        0.00    3.00    2.24    3.61    5.83
2        3.00    0.00    2.83    3.16    3.61
3        2.24    2.83    0.00    1.41    4.12
4        3.61    3.16    1.41    0.00    3.00
5        5.83    3.61    4.12    3.00    0.00
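As a quick check, the sketch below (a Python/SciPy example of our own; only the coordinates come from the table of points above) recomputes this matrix from the raw data.

import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([
    [3, 2],  # observation 1
    [3, 5],  # observation 2
    [5, 3],  # observation 3
    [6, 4],  # observation 4
    [6, 7],  # observation 5
])

# Pairwise Euclidean distances, arranged as a symmetric 5 x 5 matrix
dist_matrix = squareform(pdist(points, metric='euclidean'))
print(np.round(dist_matrix, 2))
# e.g. distance(3, 4) = sqrt((6-5)^2 + (4-3)^2) = sqrt(2) ≈ 1.41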

Take the distance between two clusters as the minimum distance between the points in the two clusters (i.e., single linkage). Based on this information, answer the following questions.
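Taking the minimum distance between points corresponds to single linkage, so one way to trace the sequence of merges for these five points (a sketch for working through the exercise, not part of the original question) is SciPy's linkage with method='single'.

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

points = np.array([[3, 2], [3, 5], [5, 3], [6, 4], [6, 7]])

# Condensed distance vector -> single-linkage merge history.
Z = linkage(pdist(points), method='single')

# Each row: indices of the two clusters merged, the merge distance,
# and the size of the new cluster.
# (SciPy labels the original points 0-4, so add 1 to match observations 1-5.)
print(np.round(Z, 2))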
