Session Introduction

Welcome to the fourth session on K-Means Clustering Using PySpark. In the previous sessions, you learnt about Logistic Regression, which is a classification technique.

Let’s start this session by watching the upcoming video in which Ajay will provide an overview of the topics that will be covered in this session.

In this session

You will learn about K-Means Clustering, one of the most commonly used unsupervised machine learning algorithms. Like the previous sessions, this session consists of two parts. In the first part, you will learn the concepts behind K-Means Clustering; in the second part, you will learn how to implement these concepts using PySpark. The case study used in this session involves clustering similar artists from the music industry.

The Python notebook used throughout this session is attached below:

Note

The PPT used in this session is provided in the ‘Session Summary’ segment.

People you will hear from in this session

Adjunct Faculty

Ajay Shukla

Data Science Lead – Myntra

Ajay completed his undergraduate and postgraduate degrees in Computer Science Engineering at IIT, BHU. He heads the pricing team at Myntra, where he actively works on technologies such as Data Science, Big Data, Spark and Machine Learning. Currently, his work involves developing discounting strategies for all the products offered by Myntra.

Adjunct Faculty

Ajay Shukla

Senior Data Engineer

Ajay is currently working as a senior data engineer. He has over 9 years of experience in the IT industry and has worked for several companies. He has deep knowledge of the various tools and technologies that are in use today.