In the previous session we learned about the K-Means algorithm in detail. We looked at the two steps, the assignment step and the optimisation step, through which the K-Means algorithm works iteratively.
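As a quick recap, the two steps can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function name, the fixed iteration count, and the simple "first k points" initialisation are all choices made here for clarity.

```python
import numpy as np

def kmeans(X, k, n_iter=10):
    """Minimal K-Means sketch: alternate the assignment and optimisation steps.
    Illustrative only; initialisation here just takes the first k points."""
    centroids = X[:k].copy()
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Optimisation step: move each centroid to the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated groups of 2-D points:
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, centroids = kmeans(X, k=2)
print(labels)  # the first two points share one label, the last two the other
```

Note that the mean in the optimisation step is only defined for numeric attributes, which is exactly the limitation discussed next.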
The K-Means clustering algorithm is undoubtedly one of the most widely used partitional algorithms for numerical (continuous) data. However, K-Means cannot handle categorical data, because the dissimilarity measure it uses, Euclidean distance, is defined only for numeric values.
The K-Modes clustering algorithm is based on the K-Means paradigm but removes the numeric-data limitation while preserving its efficiency: it replaces Euclidean distance with a simple matching dissimilarity measure and replaces cluster means with modes.
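The matching dissimilarity at the heart of K-Modes is easy to state: count how many attributes differ between two categorical records. A tiny sketch (the helper name is illustrative, not a library function):

```python
def matching_dissimilarity(a, b):
    """Simple matching dissimilarity used by K-Modes:
    the number of attributes on which the two records disagree."""
    return sum(x != y for x, y in zip(a, b))

# Two categorical records that differ in exactly one attribute:
d = matching_dissimilarity(("red", "small", "round"), ("red", "large", "round"))
print(d)  # 1
```

The cluster "centre" in K-Modes is then the per-attribute mode (most frequent category) of the records assigned to it, which plays the role the mean plays in K-Means.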
Let’s listen to Prof. Dinesh to understand this in detail.
What about a data set that has both numerical and categorical values? Will the above methods do the job on their own? This is where K-Prototype comes in: a simple combination of K-Means and K-Modes for clustering mixed attributes.
Let’s see this in detail in the next lecture.
For K-Prototype clustering, we combine K-Means and K-Modes to handle both continuous and categorical data. The K-Prototype distance is the squared Euclidean distance over the numerical attributes plus a weighted matching dissimilarity over the categorical attributes:

d(X, Q) = Σ_j (x_j − q_j)² + γ Σ_j δ(x_j, q_j)

where the first sum runs over the numerical attributes, the second over the categorical attributes (δ is 0 for a match and 1 for a mismatch), and gamma (γ) is the weighting factor that determines the relative importance of the numerical versus the categorical attributes.
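The combined distance can be sketched directly from that formula. This is an illustrative helper, assuming the record is already split into its numeric and categorical parts; the function name and the gamma value of 0.5 are choices made here, not fixed by the algorithm.

```python
import numpy as np

def kprototype_distance(x_num, x_cat, q_num, q_cat, gamma=0.5):
    """K-Prototype distance sketch: squared Euclidean distance on the
    numeric attributes plus gamma times the number of categorical mismatches."""
    numeric_part = float(np.sum((np.asarray(x_num) - np.asarray(q_num)) ** 2))
    categorical_part = sum(a != b for a, b in zip(x_cat, q_cat))
    return numeric_part + gamma * categorical_part

# Numeric parts differ by 1 in one dimension; one categorical mismatch:
d = kprototype_distance([1.0, 2.0], ["red", "small"],
                        [1.0, 3.0], ["red", "large"])
print(d)  # 1.0 + 0.5 * 1 = 1.5
```

In practice gamma is often tuned to the data (for example, scaled to the spread of the numeric attributes), since it controls how much a categorical mismatch "costs" relative to numeric differences.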
In the next section, we will look at K-Modes and K-Prototype in Python. Let’s move to the next segment.