Session Introduction

Welcome to the second session on ‘Analytics using PySpark’. In the last session, you learnt about basic Spark concepts and how to perform basic exploratory data analysis (EDA) using the Spark ML library.

Let’s start this session by watching the following video, in which Sajan outlines the concepts covered in this session.

In this session

As Sajan mentioned in the previous video, in this session you will learn about linear regression algorithms. After that, you will learn basic model-building techniques in PySpark, followed by the implementation of a linear regression model using the Spark ML library. Once the model is built, you will look at cross-validation and the bias-variance tradeoff.

Note:

The PPT used in this session is available in the ‘Session Summary’ segment.

People you will hear from in this session

Subject Matter Expert

Sajan Kedia

Data Science Lead – Myntra

Sajan completed his undergraduate and postgraduate degrees in Computer Science and Engineering at IIT (BHU). He heads the pricing team at Myntra, where he works extensively with Data Science, Big Data, Spark and Machine Learning. His current work mainly involves developing discounting strategies for all the products offered by Myntra.

Subject Matter Expert

Jaidev Deshpande

Senior Data Scientist at Gramener

With over 10 years of experience in data science and predictive analytics, Jaidev has worked at several firms, including Springboard, iDataLabs and cube26. He holds a bachelor’s degree in Electrical and Electronics Engineering from Vishwakarma Institute of Technology, Pune. He is currently a Senior Data Scientist at Gramener, a leading data science consulting company that advises clients on data-driven leadership.
