Model Building

Now that we have performed EDA on the given data set and have the final DataFrame ready, we start building the model. Let’s hear from Ajay as he explains the process.

After importing the libraries, you need to define the K-Means object using the following command:

Here, k=7 signifies the number of clusters to be formed. You can start training the model using the following block of code:

After training the model, you need to compute the cost, that is, the sum of squared distances from each point to its nearest cluster centre, using the following block of code:

The output of the clustering model above can be visualised using the following block of code:

In the output table, the model appends one more column named ‘prediction’, which holds the predicted value for each row (the cluster ID of the artist). To ascertain the centres of the clusters, you can use the following block of code:

If you want to print the names of artists whose total plays exceed 100 (or any other number) along with their cluster IDs, you will need to create a temp view first. The next step is to write an SQL query, as shown below.

Now, the model is ready, but we need to determine its accuracy. So, in the next segment, you will learn how to evaluate the model.
