
K-Modes Clustering

In the previous session we learned about the K-Means algorithm in detail. We looked at the two steps in which K-Means works iteratively: the assignment step and the optimisation step.

The K-Means clustering algorithm is undoubtedly one of the most widely used partitional algorithms for numerical (continuous) data, but K-Means cannot handle categorical data. The reason lies in the dissimilarity measure K-Means uses: squared Euclidean distance, which is defined only for numeric values.

The K-Modes clustering algorithm is based on the K-Means paradigm but removes the numeric-data limitation while preserving its efficiency. It does so by replacing the Euclidean distance with a simple matching dissimilarity (the number of attributes on which two records disagree) and by using per-attribute modes instead of means as cluster centres.
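The two ingredients K-Modes swaps in can be sketched in a few lines of pure Python. This is a minimal illustration, not the full algorithm; the function names are illustrative, not from any particular library.

```python
from collections import Counter

def matching_dissimilarity(x, y):
    """Count the attributes on which two categorical records disagree."""
    return sum(1 for xj, yj in zip(x, y) if xj != yj)

def mode_of_cluster(records):
    """Cluster centre for K-Modes: the most frequent category per column."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

# Toy cluster of categorical records
cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
centre = mode_of_cluster(cluster)               # ("red", "small")
d = matching_dissimilarity(cluster[1], centre)  # differs only on size -> 1
```

A full K-Modes run would, just like K-Means, alternate between assigning each record to the centre with the smallest matching dissimilarity and recomputing each centre as the per-attribute mode of its cluster.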

Let’s listen to Prof. Dinesh to understand this in detail.

What about a data set that has both numerical and categorical values? Will the above methods do the job? This is where K-Prototypes comes in: a simple combination of K-Means and K-Modes for clustering mixed attributes.

Let’s see this in detail in the next lecture.

For K-Prototypes clustering, we combine K-Means and K-Modes to handle both continuous and categorical data. For K-Prototypes, the distance function is as follows:

d(X, Y) = \sum_{j=1}^{p} (X_j - Y_j)^2 + \gamma \sum_{j=p+1}^{M} \delta(X_j, Y_j)

where the first p attributes are numeric and attributes p + 1 through M are categorical, δ(X_j, Y_j) equals 0 if X_j = Y_j and 1 otherwise, and γ (gamma) is a weighting factor that determines the relative importance of the categorical attributes against the numerical ones.
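The mixed distance above translates directly into code. Below is a minimal pure-Python sketch (the function name and the toy records are illustrative): the squared Euclidean part is computed on the first p attributes, and the matching part on the rest.

```python
def kprototype_distance(x, y, p, gamma):
    """K-Prototypes distance: squared Euclidean on the first p (numeric)
    attributes plus gamma times the number of categorical mismatches."""
    numeric = sum((xj - yj) ** 2 for xj, yj in zip(x[:p], y[:p]))
    categorical = sum(1 for xj, yj in zip(x[p:], y[p:]) if xj != yj)
    return numeric + gamma * categorical

# Two records: 2 numeric attributes followed by 2 categorical ones
a = (1.0, 2.0, "red", "small")
b = (2.0, 2.0, "red", "large")
dist = kprototype_distance(a, b, p=2, gamma=0.5)  # (1-2)^2 + 0.5 * 1 = 1.5
```

Note how γ acts as a knob: a larger γ makes a categorical mismatch cost more relative to a unit of squared numeric difference, so its choice depends on the scale of the numeric attributes.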

In the next section, we will be looking at K-Modes and K-Prototypes in Python. Let’s move to the next segment.
