Euclidean Distance

In the previous segments, you got an idea of how clustering works: it groups objects on the basis of their similarity, or closeness, to each other.

The next step is to get into the nitty-gritty of how clustering algorithms generally work. You will learn about two types of clustering methods, K-means and hierarchical clustering, and how each of them carries out the clustering process.

We have learnt that clustering works by grouping together the observations that are most similar to each other. What exactly does this mean?

In simple terms, the algorithm needs to find data points whose values are similar to each other; these points then belong to the same cluster. A clustering algorithm does this by computing a "distance measure" between points. The distance measure used in K-means clustering is the Euclidean distance. Let's look at the following lecture to understand how this value is calculated.

As mentioned in the video above, the Euclidean distance between two points is calculated as follows. Suppose there are two points X and Y, each with n dimensions:

$X = (X_1, X_2, X_3, \ldots, X_n)$

$Y = (Y_1, Y_2, Y_3, \ldots, Y_n)$

Then the Euclidean distance D between them is given by:

$$D = \sqrt{(X_1 - Y_1)^2 + (X_2 - Y_2)^2 + \cdots + (X_n - Y_n)^2} = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^2}$$
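The formula maps directly onto a few lines of code. The sketch below is a minimal plain-Python version, assuming each point is given as a sequence of numbers of equal length; the function name `euclidean_distance` is illustrative and not part of any library. It sums the squared coordinate differences and takes the square root, exactly as in the formula.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Square the difference in each dimension, add them up, then take the square root
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Two 2-dimensional points: the differences are 3 and 4, so D = sqrt(9 + 16) = 5
print(euclidean_distance((1, 2), (4, 6)))  # 5.0

# Two 3-dimensional points: D = sqrt(1 + 4 + 4) = 3
print(euclidean_distance((0, 0, 0), (1, 2, 2)))  # 3.0
```

The same formula works for any number of dimensions n; Python 3.8+ also ships `math.dist(p, q)`, which computes the same value.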

The idea of a distance measure is quite intuitive. Observations that are closer, or more similar, to each other have a low Euclidean distance, while observations that are farther apart, or less similar, have a higher Euclidean distance. Can you now guess how the clustering process would work based on the Euclidean distance?
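To make the intuition concrete, the sketch below compares one observation against two others. The three points and their two feature values are hypothetical, chosen only for illustration. It uses `math.dist` from the Python standard library (3.8+), which returns the Euclidean distance between two points.

```python
import math

# Hypothetical observations, each described by two features (illustrative values only)
a = (2.0, 3.0)
b = (2.5, 3.5)   # feature values close to those of a
c = (9.0, 1.0)   # feature values far from those of a

d_ab = math.dist(a, b)  # sqrt(0.5^2 + 0.5^2)
d_ac = math.dist(a, c)  # sqrt(7^2 + 2^2)
print(f"D(a, b) = {d_ab:.3f}")
print(f"D(a, c) = {d_ac:.3f}")

# The smaller distance identifies the observation that is more similar to a
print("a is more similar to", "b" if d_ab < d_ac else "c")
```

Here `b` differs from `a` by only 0.5 in each feature, so its distance is far smaller than that of `c`, and `a` would be reported as more similar to `b`.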

Once the Euclidean distances between observations have been computed, the next step for the clustering algorithm is straightforward: it finds the observations or points that have a low Euclidean distance between them, i.e. those that are close to each other, and groups them into the same cluster.
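The idea of "find the points with a low distance between them" can be sketched by computing the distance for every pair of observations and picking the smallest. This is a simplified illustration under stated assumptions, not the full K-means or hierarchical procedure: the four points and their names (`P1` to `P4`) are hypothetical, and only the closest pair is identified.

```python
import math
from itertools import combinations

# Hypothetical 2-D observations, keyed by name (illustrative values only)
points = {
    "P1": (1.0, 1.0),
    "P2": (1.5, 2.0),
    "P3": (8.0, 8.0),
    "P4": (8.5, 7.5),
}

# Euclidean distance for every unordered pair of observations
pair_distances = {
    (p, q): math.dist(points[p], points[q])
    for p, q in combinations(points, 2)
}

# List the pairs from closest to farthest
for (p, q), d in sorted(pair_distances.items(), key=lambda item: item[1]):
    print(f"{p}-{q}: {d:.2f}")

# The pair with the smallest distance is the most natural candidate to group together
closest_pair = min(pair_distances, key=pair_distances.get)
print("closest pair:", closest_pair)
```

With these values, P3 and P4 are the closest pair, followed by P1 and P2, while every cross pair (for example P1 and P3) is much farther apart, which matches the intuition that {P1, P2} and {P3, P4} form two natural groups.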
