Session Introduction

Welcome to the fourth session on ‘Analytics using PySpark’. In the previous session, you learnt about Logistic Regression, which is a classification technique.

Let’s start this session by watching the upcoming video in which Ajay will provide an overview of the topics that will be covered in this session.

In this session

You will learn about K-Means Clustering, a commonly used unsupervised machine learning algorithm. Like the previous sessions, this session consists of two parts. In the first part, you will revisit the concepts behind the K-Means algorithm. In the second part, you will learn how to implement these concepts using PySpark. The case study in this session applies clustering to group similar artists from the music industry.

The Python notebook used throughout this session is attached below.

Note:

The PPT used in this session is provided in the ‘Session Summary’ segment.

People you will hear from in this session

Subject Matter Expert

Ajay Shukla

Data Science Lead – Myntra

Ajay completed his undergraduate and postgraduate degrees in Computer Science Engineering at IIT, BHU. He heads the pricing team at Myntra, where he actively works on technologies such as Data Science, Big Data, Spark and Machine Learning. Currently, his work involves developing discounting strategies for all the products offered by Myntra.
