The result of a cluster analysis is shown by a dendrogram, which starts with each data point as a separate cluster and indicates the level of dissimilarity at which any two clusters were joined.
As you saw, the y-axis of the dendrogram represents the measure of dissimilarity, or distance, at which clusters join.
In the dendrogram shown above, samples 4 and 5 are the most similar and join to form the first cluster, followed by samples 1 and 10. The last two clusters to fuse into the final single cluster are 3-6 and 4-5-2-7-1-10-9-8.
Determining the number of groups in a cluster analysis is often the primary goal, and you typically look for natural groupings defined by long stems. Here, by observation, you can identify three major groupings: 3-6, 4-5-2-7 and 1-10-9-8.
You also saw that hierarchical clustering can proceed in two ways: agglomerative and divisive. If you start with n distinct clusters and iteratively merge the closest pair until only one cluster remains, the process is called agglomerative clustering. On the other hand, if you start with one big cluster and keep partitioning it until you reach n clusters, each containing a single element, the process is called divisive clustering.
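To make this concrete, here is a minimal sketch of agglomerative clustering using SciPy's linkage and dendrogram functions. It assumes a Python environment with NumPy, SciPy and matplotlib installed; the ten random points are hypothetical placeholders, not the data behind the dendrogram discussed above.

```python
# A minimal sketch of agglomerative clustering with SciPy.
# The ten random 2-D points are placeholders for illustration only.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = rng.random((10, 2))            # 10 hypothetical observations

# Agglomerative: start from 10 singleton clusters and repeatedly
# merge the two closest clusters until only one remains.
Z = linkage(X, method='single')    # each row of Z records one merge

dendrogram(Z, labels=[str(i) for i in range(1, 11)])
plt.ylabel('dissimilarity at which clusters merge')
plt.show()

# Cutting the tree into a chosen number of groups (here, 3),
# analogous to reading off the long stems by eye:
groups = fcluster(Z, t=3, criterion='maxclust')
print(groups)
```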
Additional Reference
You can read more about divisive clustering here and here.
Comprehension – Hierarchical Clustering Algorithm
Given below are five data points, each having two attributes, X and Y:
| Observation | X | Y |
| --- | --- | --- |
| 1 | 3 | 2 |
| 2 | 3 | 5 |
| 3 | 5 | 3 |
| 4 | 6 | 4 |
| 5 | 6 | 7 |
The distance matrix of the points, indicating the Euclidean distance between points, is as follows:
| Label | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.00 | 3.00 | 2.24 | 3.61 | 5.83 |
| 2 | 3.00 | 0.00 | 2.83 | 3.16 | 3.61 |
| 3 | 2.24 | 2.83 | 0.00 | 1.41 | 4.12 |
| 4 | 3.61 | 3.16 | 1.41 | 0.00 | 3.00 |
| 5 | 5.83 | 3.61 | 4.12 | 3.00 | 0.00 |
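If you want to verify these values yourself, here is a quick sketch (assuming NumPy and SciPy are available) that reproduces the matrix with pdist and squareform:

```python
# Reproduce the 5x5 Euclidean distance matrix shown above.
import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[3, 2], [3, 5], [5, 3], [6, 4], [6, 7]])
D = squareform(pdist(points, metric='euclidean'))
print(np.round(D, 2))
# Matches the table above: e.g. D[0, 2] = 2.24, D[2, 3] = 1.41.
```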
Take the distance between two clusters to be the minimum distance between points in the two clusters (this is known as single linkage). Based on this information, answer the following questions.
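As a check on your answers, the single-linkage merge sequence can be computed with SciPy's linkage (assuming the same environment as above). Note that SciPy labels observations from 0, so observation 1 appears as 0 in the output.

```python
# Single-linkage clustering of the five points above, sketched
# with SciPy for checking the merge order by hand.
import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[3, 2], [3, 5], [5, 3], [6, 4], [6, 7]])
Z = linkage(points, method='single')
print(np.round(Z, 2))
# Each row is one merge: [cluster_a, cluster_b, distance, new_size].
# Expected order: observations 3 and 4 merge first (distance 1.41),
# then 1 joins them (2.24), then 2 (2.83), and finally 5 (3.00).
```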