
Summary

In this session, we covered two clustering algorithms in detail: K-modes and K-Prototype.

To summarise, the K-modes clustering algorithm is based on the K-means paradigm but removes the numeric-data limitation while preserving its efficiency.

The K-modes algorithm uses modes instead of means to form clusters of categorical data.

Steps of the algorithm:

  1. Randomly select K data points as the initial modes.
  2. Calculate the dissimilarity score between each remaining data point and each of the K modes.
  3. Assign each data point to the mode with the minimum dissimilarity score.
  4. Update the mode of each cluster and repeat from step 2 until no data point is reassigned to a different cluster or the cost function stops decreasing.
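The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the dissimilarity score is assumed to be simple matching (the count of attributes that differ), and the names `hamming`, `column_modes`, `k_modes` and the optional `init` argument are hypothetical. The sketch also recomputes each cluster's mode after every assignment pass, as K-modes normally does.

```python
import random
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value of each attribute across the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, init=None, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: choose K initial modes (random rows unless `init` is supplied).
    modes = list(init) if init is not None else rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: assign every point to the mode with the lowest score.
        new_labels = [
            min(range(k), key=lambda c: hamming(row, modes[c]))
            for row in data
        ]
        if new_labels == labels:  # Step 4: stop when no point is reassigned.
            break
        labels = new_labels
        # Recompute each cluster's mode from its current members.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous mode if a cluster is empty
                modes[c] = column_modes(members)
    return labels, modes


# Hypothetical toy data: (colour, size, shape)
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "flat"),
    ("blue", "huge", "square"),
]
labels, modes = k_modes(data, k=2, init=[data[0], data[3]])
print(labels, modes)
```

Passing `init` fixes the starting modes so the run is reproducible; with random initialisation the result can depend on which rows are drawn, which is why K-modes is usually run several times.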

For K-Prototype clustering, we combine K-means and K-modes to handle data with both continuous and categorical attributes. The K-Prototype distance function is:

$$d(X, Y) = \sum_{j=1}^{p} (X_j - Y_j)^2 + \gamma \sum_{j=p+1}^{M} \delta(X_j, Y_j)$$

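Here the first p attributes are numerical and the remaining M − p are categorical, δ is 0 when two categorical values match and 1 otherwise, and gamma (γ) is the weighting factor that determines the relative importance of numerical versus categorical attributes.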
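As a small worked example of this distance, the sketch below assumes each object is a tuple whose first `p` entries are numbers and whose remaining entries are category labels. The function name `k_prototype_distance` and the sample values are illustrative only.

```python
def k_prototype_distance(x, y, p, gamma):
    """Mixed dissimilarity between two objects.

    The first p attributes are treated as numerical (squared Euclidean
    distance); the rest are categorical (count of mismatches, weighted
    by gamma).
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x[:p], y[:p]))
    mismatches = sum(a != b for a, b in zip(x[p:], y[p:]))
    return numeric + gamma * mismatches


# Hypothetical objects: two numerical attributes, then two categorical ones.
x = (1.0, 2.0, "red", "suv")
y = (2.0, 4.0, "red", "sedan")

# Numerical part: (1-2)^2 + (2-4)^2 = 5; categorical part: one mismatch.
print(k_prototype_distance(x, y, p=2, gamma=0.5))  # 5.5
```

A larger `gamma` makes categorical mismatches count for more; with `gamma=0` the measure reduces to squared Euclidean distance on the numerical attributes alone.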

Steps of the algorithm:

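  1. Select k initial prototypes, one for each cluster.
  2. Allocate each data point to the cluster whose prototype is nearest to it according to the dissimilarity measure.
  3. Retest the similarity of objects against the current prototypes, reallocate any object that is now closer to a different prototype, and update the prototypes.
  4. Repeat step 3 until no object changes its cluster.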
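A compact sketch of this loop is shown below, assuming the same tuple layout as the distance formula (first `p` attributes numerical, the rest categorical). It is a simplification: prototypes are updated once per pass over the data rather than after every single allocation, and each prototype is taken to be the mean of the numerical attributes plus the mode of the categorical ones. The names `distance`, `prototype` and `k_prototypes` are hypothetical.

```python
from collections import Counter


def distance(x, y, p, gamma):
    """Squared Euclidean on the first p attributes plus gamma * mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(x[:p], y[:p]))
    mismatches = sum(a != b for a, b in zip(x[p:], y[p:]))
    return numeric + gamma * mismatches


def prototype(rows, p):
    """Mean of each numerical attribute, mode of each categorical attribute."""
    cols = list(zip(*rows))
    means = [sum(col) / len(col) for col in cols[:p]]
    modes = [Counter(col).most_common(1)[0][0] for col in cols[p:]]
    return tuple(means + modes)


def k_prototypes(data, init, p, gamma=1.0, max_iter=100):
    prototypes = list(init)  # step 1: k initial prototypes
    k = len(prototypes)
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: (re)allocate each object to its nearest prototype.
        new_labels = [
            min(range(k), key=lambda c: distance(row, prototypes[c], p, gamma))
            for row in data
        ]
        if new_labels == labels:  # step 4: no object changed its cluster
            break
        labels = new_labels
        # Update the prototypes from the current cluster members.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous prototype if a cluster is empty
                prototypes[c] = prototype(members, p)
    return labels, prototypes


# Hypothetical toy data: one numerical attribute, one categorical attribute.
data = [(1.0, "a"), (2.0, "a"), (3.0, "a"),
        (10.0, "b"), (11.0, "b"), (12.0, "b")]
labels, prototypes = k_prototypes(data, init=[data[0], data[3]], p=1)
print(labels, prototypes)
```

In practice the numerical attributes are usually standardised first, since otherwise their scale, together with the choice of `gamma`, decides how much the categorical attributes influence the result.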

We also briefly discussed the DBSCAN algorithm, a density-based clustering algorithm that divides a data set into clusters corresponding to regions of high density.
