In the previous session we learned about the K-Means algorithm in detail: we looked at its two steps, the assignment step and the optimisation step, through which K-Means works iteratively.
The K-Means clustering algorithm is undoubtedly one of the most widely used partitional algorithms for numerical (continuous) data, but K-Means cannot handle categorical data. The reason lies in the dissimilarity measure it uses: squared Euclidean distance is not defined for categorical values.
The K-Modes clustering algorithm is based on the K-Means paradigm but removes the numeric-data limitation while preserving its efficiency: it replaces cluster means with modes and squared Euclidean distance with a simple matching dissimilarity (0 when two categorical values match, 1 when they differ).
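The idea above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function names (`matching_dissimilarity`, `kmodes_step`) and the tuple-of-strings record format are assumptions chosen for clarity. One iteration assigns each record to its nearest mode and then recomputes each cluster's mode attribute by attribute:

```python
from collections import Counter

def matching_dissimilarity(x, y):
    """Count the attributes on which records x and y differ (0/1 per attribute)."""
    return sum(xj != yj for xj, yj in zip(x, y))

def kmodes_step(records, modes):
    """One K-Modes iteration (illustrative sketch).

    Assignment step: each record goes to the mode with the smallest
    matching dissimilarity. Update step: each cluster's new mode takes
    the most frequent value of every attribute within the cluster.
    """
    clusters = [[] for _ in modes]
    for rec in records:
        k = min(range(len(modes)),
                key=lambda i: matching_dissimilarity(rec, modes[i]))
        clusters[k].append(rec)

    new_modes = []
    for cluster, old_mode in zip(clusters, modes):
        if not cluster:
            # Keep an empty cluster's mode unchanged in this sketch.
            new_modes.append(old_mode)
            continue
        # zip(*cluster) transposes records into per-attribute columns.
        mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))
        new_modes.append(mode)
    return clusters, new_modes
```

In a full algorithm this step would repeat until the assignments stop changing, exactly mirroring the assign-then-optimise loop of K-Means described earlier.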
Let’s listen to Prof. Dinesh to understand this in detail.
What about a data set that has both numerical and categorical values? Will the above methods do the job? Here the K-Prototype algorithm comes in: a simple combination of K-Means and K-Modes for clustering mixed attributes.
Let’s see this in detail in the next lecture.
For K-Prototype clustering, we combine K-Means and K-Modes to handle both continuous and categorical data. For K-Prototype, the distance function is as follows:
d(x, y) = \sum_{j=1}^{p} (x_j - y_j)^2 + \gamma \sum_{j=p+1}^{M} \delta(x_j, y_j)
where the first p attributes are numerical and attributes p+1 through M are categorical, δ(x_j, y_j) is the simple matching dissimilarity (0 if the two values match, 1 otherwise), and γ (gamma) is a weighting factor that determines the relative importance of the numerical versus the categorical attributes.
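The distance function above translates directly into code. The sketch below is a hedged illustration of that formula only (not of a full K-Prototype run); the function name `kprototype_distance` and the convention that the first `p` entries of each record are numerical are assumptions for this example:

```python
def kprototype_distance(x, y, p, gamma):
    """Mixed distance per the formula above: squared Euclidean distance over
    the first p (numerical) attributes, plus gamma times the number of
    mismatches over the remaining (categorical) attributes."""
    numeric = sum((xj - yj) ** 2 for xj, yj in zip(x[:p], y[:p]))
    categorical = sum(xj != yj for xj, yj in zip(x[p:], y[p:]))
    return numeric + gamma * categorical

# Example: two records with p = 2 numerical attributes and one categorical.
# Numerical part: (1-2)^2 + (2-4)^2 = 5; categorical part: 1 mismatch.
d = kprototype_distance((1.0, 2.0, "red"), (2.0, 4.0, "blue"), p=2, gamma=0.5)
# → 5.5
```

Note how γ controls the trade-off: a larger γ makes a categorical mismatch cost more relative to the numerical squared distances, so choosing it (often relative to the spread of the numerical attributes) matters in practice.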
In the next section, we will be looking at K-Modes and K-Prototype in Python. Let’s move to the next segment.