In this session, we covered two algorithms in detail: K-modes and K-Prototype clustering.

To summarise, the K-modes clustering algorithm is based on the K-means paradigm but removes its restriction to numeric data while preserving its efficiency.

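The K-modes algorithm uses modes instead of means to form clusters of categorical data, and it measures dissimilarity by matching categories (the number of attributes on which two records differ) instead of by Euclidean distance.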
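To make "mode" and "dissimilarity" concrete, here is a minimal Python sketch. It assumes the simple matching dissimilarity (a count of mismatched attributes) and records stored as tuples of category strings; the helper names `matching_dissimilarity` and `mode_of` are hypothetical, chosen for illustration.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(records):
    """Column-wise mode: the most frequent category for each attribute."""
    # zip(*records) transposes the records into per-attribute columns
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]
print(mode_of(cluster))  # ('red', 'small', 'round')
print(matching_dissimilarity(("blue", "large", "round"), mode_of(cluster)))  # 2
```

Because the mode is taken attribute by attribute, it need not coincide with any record actually present in the cluster.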

Steps of the algorithm:

- Randomly select K data points as the initial modes.

- Calculate the dissimilarity score between each data point and each of the K chosen modes.

- Assign each data point to the mode with the minimum score, then update each cluster's mode to the most frequent category of each attribute among its members.

- Repeat from step 2 until no data point changes cluster, i.e. the cost function (the total dissimilarity of the points to their modes) stops decreasing.
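The steps above can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not a reference one: it initialises the modes from randomly chosen records, uses the simple matching dissimilarity, breaks ties toward the lower cluster index, and stops when an iteration produces no reassignment. `k_modes`, `column_mode`, and the `max_iter`/`seed` parameters are hypothetical names.

```python
import random
from collections import Counter

def matching_dissimilarity(a, b):
    # Number of attributes on which the two records disagree
    return sum(x != y for x, y in zip(a, b))

def column_mode(records):
    # Most frequent category per attribute; ties go to the first value seen
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k records (at distinct positions) as the initial modes
    modes = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: assign each record to the mode with the lowest dissimilarity
        new_labels = [
            min(range(k), key=lambda c: matching_dissimilarity(x, modes[c]))
            for x in data
        ]
        if new_labels == labels:
            break  # Step 4: no reassignment, so stop
        labels = new_labels
        # Recompute each cluster's mode from its current members
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old mode if a cluster becomes empty
                modes[c] = column_mode(members)
    return labels, modes

data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("blue", "small", "square"),
]
labels, modes = k_modes(data, k=2)
print(labels, modes)
```

The result depends on the random initial modes, which is why practical implementations usually run K-modes several times and keep the run with the lowest cost.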

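For K-Prototype clustering, we combine K-means and K-modes to handle both continuous and categorical data. For a data point $X$ with numeric attributes $x_1, \dots, x_p$ and categorical attributes $x_{p+1}, \dots, x_m$, the distance to a prototype $Q$ is

$$d(X, Q) = \sum_{j=1}^{p} (x_j - q_j)^2 + \gamma \sum_{j=p+1}^{m} \delta(x_j, q_j)$$

where $\delta(x_j, q_j) = 0$ if $x_j = q_j$ and $1$ otherwise, and $\gamma$ is the weighting factor that determines the relative importance of the numerical versus the categorical attributes.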
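The K-Prototype dissimilarity (squared Euclidean distance on the numeric attributes plus γ times the number of categorical mismatches) takes only a few lines to compute. This sketch assumes each record has already been split into a numeric tuple and a categorical tuple; the function name `kprototype_dissimilarity` is hypothetical.

```python
def kprototype_dissimilarity(x_num, x_cat, q_num, q_cat, gamma):
    # Squared Euclidean distance on the numeric attributes
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, q_num))
    # Simple matching dissimilarity on the categorical attributes
    categorical = sum(a != b for a, b in zip(x_cat, q_cat))
    # gamma scales the categorical part relative to the numeric part
    return numeric + gamma * categorical

# numeric part: 1^2 + (-2)^2 = 5, categorical part: one mismatch
print(kprototype_dissimilarity((1.0, 2.0), ("red", "small"),
                               (0.0, 4.0), ("red", "large"), gamma=0.5))  # 5.5
```

Raising γ from 0.5 to 2.0 raises the same distance from 5.5 to 7.0, which shows how γ shifts weight toward the categorical attributes. In practice the numeric attributes are usually scaled first, so that γ is not compensating for differences in units.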

Steps of the algorithm:

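- Select k initial prototypes, one for each cluster.

- Allocate each data point to the cluster whose prototype is nearest under the dissimilarity measure, updating that cluster's prototype after each allocation.

- After all data points have been allocated, retest the dissimilarity of each object against the current prototypes. If an object is nearer to another cluster's prototype, reassign it and update the prototypes of both clusters.

- Repeat step 3 until no object changes its cluster.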
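Below is a minimal sketch of the K-Prototype loop, assuming records stored as (numeric tuple, categorical tuple) pairs. Note that it is a batch variant: it reassigns every record, then recomputes every prototype from its members (numeric mean plus categorical mode), rather than updating prototypes after each individual allocation as in the steps above. Both versions stop when no record changes cluster. All function and parameter names here are hypothetical.

```python
import random
from collections import Counter

def dissimilarity(x, q, gamma):
    (x_num, x_cat), (q_num, q_cat) = x, q
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, q_num))
    categorical = sum(a != b for a, b in zip(x_cat, q_cat))
    return numeric + gamma * categorical

def prototype_of(members):
    # Numeric part: per-attribute mean; categorical part: per-attribute mode
    nums = [m[0] for m in members]
    cats = [m[1] for m in members]
    mean = tuple(sum(col) / len(col) for col in zip(*nums))
    mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*cats))
    return (mean, mode)

def k_prototypes(data, k, gamma, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: select k records at random as the initial prototypes
    protos = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: (re)assign every record to its nearest prototype
        new_labels = [
            min(range(k), key=lambda c: dissimilarity(x, protos[c], gamma))
            for x in data
        ]
        if new_labels == labels:
            break  # Step 4: no record changed cluster
        labels = new_labels
        # Update each prototype from its current members
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old prototype if a cluster is empty
                protos[c] = prototype_of(members)
    return labels, protos

data = [
    ((1.0, 2.0), ("red", "small")),
    ((1.5, 1.8), ("red", "small")),
    ((8.0, 8.0), ("blue", "large")),
    ((9.0, 8.5), ("blue", "large")),
]
labels, protos = k_prototypes(data, k=2, gamma=1.0)
print(labels, protos)
```

With these two well-separated groups, the first two records end up in one cluster and the last two in the other, whichever records are drawn as initial prototypes.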

We also briefly discussed DBSCAN, a density-based clustering algorithm that groups points lying in high-density regions into clusters and labels points in low-density regions as noise.
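Since DBSCAN was only touched on briefly, here is a small, unoptimised sketch of the idea for 2-D points. It assumes the usual two parameters, here named `eps` (neighbourhood radius) and `min_pts` (minimum neighbourhood size, counting the point itself). A point with at least `min_pts` neighbours is a core point, clusters grow outward from core points, and points reachable from no core point are labelled -1 (noise). The neighbour search is a brute-force O(n²) scan, and the function name `dbscan` is hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Return a cluster id per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbors(i):
        # All points within distance eps of point i (including i itself)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Unlike K-modes and K-Prototype, DBSCAN does not need the number of clusters in advance; it is controlled by `eps` and `min_pts` instead.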