In the previous segment, we discussed the basic concepts of K-Means. In this segment, we will look at the data set that we will use to implement K-Means clustering.
The data set file is attached below.
In the upcoming video, Ajay will explain the data set and its salient features.
As explained in the video above, the data set comes from Last.fm, a UK-based music company. Using this data set, we will cluster music artists based on their popularity, measured by the number of times users have listened to their songs. Such clustering can be used for the following purposes:
- Recommendation: Recommending similar artists or songs to users
- Monetisation and business: Releasing exclusive songs on the platform
- Solving the cold start problem: Placing a new artist's songs in a cluster based on features such as popularity
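For the cold start use case, the idea can be sketched as a nearest-centroid rule: assuming clusters have already been fitted on a single popularity feature, a new artist is placed in the cluster whose centroid is closest to the artist's popularity value. This is a minimal sketch, not the course's implementation; the function name `assign_to_cluster` and the centroid values below are hypothetical.

```python
def assign_to_cluster(value, centroids):
    """Return the index of the centroid closest to `value` (nearest-centroid rule).

    `centroids` is assumed to come from an already-fitted K-Means model
    that used one popularity feature (hypothetical setup).
    """
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))


# Hypothetical centroids: a "niche" cluster and a "popular" cluster.
centroids = [10.0, 1000.0]
print(assign_to_cluster(40, centroids))   # closer to 10.0, so cluster 0
print(assign_to_cluster(700, centroids))  # closer to 1000.0, so cluster 1
```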
The given data set contains the following features:
- user_id: Unique ID of each user playing the songs
- artist_id: Unique ID of each artist whose song is present in the data set
- artist_name: Name of the artist
- plays: Total number of times a user has listened to songs by a particular artist
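Because each row describes one user–artist pair, the rows have to be aggregated per artist before artists can be clustered by popularity. The sketch below assumes two per-artist features, total plays and number of distinct listeners; the course may choose different features. The `PlayRecord` type, the `artist_popularity` helper and the sample rows are hypothetical and only mirror the four columns listed above.

```python
from collections import defaultdict
from typing import NamedTuple


class PlayRecord(NamedTuple):
    """One row of the data set: how often one user played one artist."""
    user_id: str
    artist_id: str
    artist_name: str
    plays: int


def artist_popularity(records):
    """Aggregate per-user rows into per-artist popularity features.

    Returns {artist_id: (artist_name, total_plays, distinct_listeners)}.
    Keyed by artist_id because two artists could share a name.
    """
    names = {}
    total_plays = defaultdict(int)
    listeners = defaultdict(set)
    for r in records:
        names[r.artist_id] = r.artist_name
        total_plays[r.artist_id] += r.plays
        listeners[r.artist_id].add(r.user_id)
    return {
        aid: (names[aid], total_plays[aid], len(listeners[aid]))
        for aid in total_plays
    }


# Hypothetical rows, for illustration only.
rows = [
    PlayRecord("u1", "a1", "Artist A", 120),
    PlayRecord("u2", "a1", "Artist A", 30),
    PlayRecord("u1", "a2", "Artist B", 5),
]
print(artist_popularity(rows))
# {'a1': ('Artist A', 150, 2), 'a2': ('Artist B', 5, 1)}
```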
Now that you are familiar with the features of the data set, let's use K-Means to cluster the artists based on their popularity. We will start by performing exploratory data analysis (EDA) on the data set.
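As a preview of the clustering step, here is a minimal sketch of Lloyd's algorithm (the standard K-Means procedure) on a single feature, such as each artist's total plays. It assumes a deterministic initialisation that picks `k` values spread evenly across the sorted data; in practice a library implementation such as scikit-learn's `KMeans` would typically be used instead. The function name `kmeans_1d` and the play totals below are hypothetical.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on one feature.

    Returns (centroids, labels), where labels[i] is the cluster of values[i].
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Deterministic init (an assumption): k values spread evenly across the sorted data.
    step = max(k - 1, 1)
    centroids = [float(ordered[i * (len(ordered) - 1) // step]) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical total plays for six artists.
total_plays = [5, 8, 12, 900, 1000, 1100]
centroids, labels = kmeans_1d(total_plays, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

If play counts turn out to be heavily skewed during EDA, applying a log transform before clustering is a common choice, since K-Means relies on distances and very large values would otherwise dominate.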