Dataset – IKH

In the previous segment, we discussed the basic concepts of K-Means. In this segment, we will look at the data set that we will use to implement K-Means clustering.

The data set file is attached below.

In the upcoming video, Ajay will explain the data set and its salient features.

As explained in the video above, the data set comes from Last.fm, a UK-based music company. Using this data set, we will cluster music artists based on their popularity, measured by the number of times users have listened to their songs. Such clustering can be used for the following purposes:

Recommendation: Recommending similar artists/songs to users
Monetisation and business: Releasing exclusive songs on the platform
Solving the cold-start problem: Assigning a new artist’s songs to a cluster based on features such as popularity
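One way to picture the cold-start use case: once clusters have been learned, a new artist can be placed in the cluster whose centroid is nearest to that artist’s features. The minimal sketch below assumes each artist is described by a tuple of numbers (for example, a single popularity value) and that the centroids already exist; the function name and the example centroid values are hypothetical.

```python
import math


def assign_to_nearest_cluster(features, centroids):
    """Return the index of the centroid closest to `features`.

    `features` and every centroid are equal-length tuples of numbers;
    closeness is measured with Euclidean distance (math.dist, Python 3.8+).
    """
    best_index, best_distance = None, math.inf
    for index, centroid in enumerate(centroids):
        distance = math.dist(features, centroid)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


# Hypothetical centroids for "niche", "mid-tier" and "mainstream" artists.
centroids = [(100.0,), (5000.0,), (50000.0,)]
print(assign_to_nearest_cluster((4200.0,), centroids))  # 1
```

Using Euclidean distance keeps this assignment consistent with the assignment step of K-Means itself.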

The given data set contains the following features:

user_id: Unique ID of each user playing the songs
artist_id: Unique ID of each artist whose song is present in the data set
artist_name: Name of the artist
plays: Total number of times a user has listened to a particular artist’s songs
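Since each row records one user’s plays for one artist, an artist’s overall popularity can be obtained by summing `plays` over all users. The sketch below does this with the standard library only; it assumes rows arrive as dictionaries keyed by the column names above (as `csv.DictReader` would produce, which yields strings, hence the `int()` conversion). The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def total_plays_per_artist(rows):
    """Sum the plays column for each (artist_id, artist_name) pair.

    rows: iterable of dicts with the keys user_id, artist_id, artist_name, plays.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[(row["artist_id"], row["artist_name"])] += int(row["plays"])
    return dict(totals)


# Hypothetical rows in the same shape as the data set (values are made up).
sample_rows = [
    {"user_id": "u1", "artist_id": 10, "artist_name": "Artist A", "plays": 120},
    {"user_id": "u2", "artist_id": 10, "artist_name": "Artist A", "plays": 30},
    {"user_id": "u1", "artist_id": 20, "artist_name": "Artist B", "plays": 5},
]
print(total_plays_per_artist(sample_rows))
# {(10, 'Artist A'): 150, (20, 'Artist B'): 5}
```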

Now that you are familiar with the features of the data set, let’s use K-Means to cluster the artists based on their popularity. We will start by performing exploratory data analysis (EDA) on the data set.
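The clustering step can be sketched as Lloyd’s algorithm (alternate assignment and update steps until the centroids stop moving) applied to one feature per artist, such as the total number of plays. This is a minimal pure-Python illustration under that assumption, not necessarily how the upcoming segments implement it; the function name, the random initialisation and the sample values are choices made here for illustration.

```python
import random


def kmeans_1d(values, k, max_iter=100, seed=0):
    """Cluster a list of numbers with Lloyd's K-Means; return (centroids, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")

    def nearest(v, centroids):
        # Index of the closest centroid (ties go to the lower index).
        return min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))

    rng = random.Random(seed)
    # Initialise by picking k data points at random (without replacement).
    centroids = [float(c) for c in rng.sample(list(values), k)]
    for _ in range(max_iter):
        # Assignment step: every value joins its nearest centroid.
        labels = [nearest(v, centroids) for v in values]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, label in zip(values, labels) if label == j]
            # Keep the previous centroid if its cluster is empty.
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    # Final assignment so the labels match the returned centroids.
    labels = [nearest(v, centroids) for v in values]
    return centroids, labels


# Hypothetical total play counts for six artists.
totals = [1, 2, 3, 100, 101, 102]
centroids, labels = kmeans_1d(totals, k=2)
print(sorted(centroids))  # [2.0, 101.0]
```

Because the result can depend on the random initialisation, it is common to run K-Means several times with different seeds and keep the run with the lowest within-cluster sum of squared distances.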
