In the previous segment, we saw that for the K-Means optimisation problem, the algorithm iterates between two steps and tries to minimise the objective function, which is the sum of squared distances between each data point and its assigned cluster centre.
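
For reference, one common way to write this objective is the within-cluster sum of squared distances. The symbols $K$ (number of clusters), $C_k$ (the set of points assigned to cluster $k$) and $\mu_k$ (the centre of cluster $k$) are notation assumed here for illustration and may differ from the notation used in the previous segment:

$$
J = \sum_{k=1}^{K} \sum_{X_i \in C_k} \lVert X_i - \mu_k \rVert^2
$$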

The result of this optimisation depends on where the initial cluster centres are placed. To choose the cluster centres smartly, we will learn about the K-Means++ algorithm. K-Means++ is just an initialisation procedure for K-Means: it picks the initial centroids in a way that tends to place them far apart from each other.

Let’s understand the algorithm in detail in the next lecture.

To summarise, in the K-Means++ algorithm:

1. We choose one data point at random as the first cluster centre.

2. For each data point $X_i$, we compute the distance $d(X_i)$ between $X_i$ and the nearest centre that has already been chosen.

3. We choose the next cluster centre from the data points using a weighted probability distribution, where a point $X$ is chosen with probability proportional to $d(X)^2$.

4. We repeat Steps 2 and 3 until K centres have been chosen.
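
The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function names `squared_distance` and `kmeans_pp_init`, the `seed` parameter, and the fallback for the case where every remaining distance is zero are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=None):
    """Pick k initial centres from `points` following the steps above.

    `points` is a list of equal-length numeric tuples. `seed` is a
    hypothetical convenience parameter for reproducible runs.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)

    # Step 1: choose the first centre uniformly at random.
    centres = [rng.choice(points)]

    # Step 2: d(X)^2 for every point, relative to the nearest chosen centre.
    d2 = [squared_distance(p, centres[0]) for p in points]

    while len(centres) < k:
        if sum(d2) == 0:
            # Assumed edge case: every point coincides with a chosen centre,
            # so the weights are all zero. Fall back to a uniform pick.
            new_centre = rng.choice(points)
        else:
            # Step 3: sample the next centre with probability
            # proportional to d(X)^2.
            new_centre = rng.choices(points, weights=d2, k=1)[0]
        centres.append(new_centre)

        # Step 4: repeat, refreshing each point's squared distance to its
        # nearest centre now that a new centre has been added.
        d2 = [min(old, squared_distance(p, new_centre))
              for p, old in zip(points, d2)]

    return centres


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (-10.0, 10.0)]
    print(kmeans_pp_init(data, k=3, seed=0))
```

The returned centres would then be passed to ordinary K-Means as its starting point. Note that a point already chosen as a centre has $d(X)^2 = 0$, so it has zero weight and is not picked again while other points are still available.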