Welcome to the fourth session on K-Means Clustering Using PySpark. In the previous sessions, you learnt about Logistic Regression, which is a classification technique.
Let’s start this session by watching the upcoming video in which Ajay will provide an overview of the topics that will be covered in this session.
In this session
You will learn about K-Means Clustering, one of the most commonly used unsupervised machine learning algorithms. Like the previous sessions, this session consists of two parts. In the first part, you will learn the concepts behind K-Means Clustering; in the second part, you will learn how to implement these concepts using PySpark. The case study used in this session involves clustering similar artists from the music industry.
The Python notebook used throughout this session is attached below:
Note
The PPT used in this session is provided in the ‘Session Summary’ segment.
People you will hear from in this session
Adjunct Faculty
Ajay Shukla
Data Science Lead – Myntra
Ajay completed his undergraduate and postgraduate degrees in Computer Science Engineering at IIT, BHU. He heads the pricing team at Myntra, where he actively works on technologies such as Data Science, Big Data, Spark and Machine Learning. Currently, his work involves developing discounting strategies for all the products offered by Myntra.
Adjunct Faculty
Ajay Shukla
Senior Data Engineer
Ajay is currently working as a senior data engineer. He has over 9 years of experience in the IT industry and has worked for several companies. He has deep knowledge of the various tools and technologies that are in use today.