Welcome to the session on ‘Basic EDA Using Spark ML Library’.
Previously, you learnt about machine learning algorithms and their implementation on small datasets using Python. But what do you do when you have huge datasets? In that case, you need to use Spark to write production-ready code. The Spark ML library makes machine learning scalable and easy. In this session, you will learn how to perform basic exploratory data analysis (EDA) using PySpark. Let’s hear from Sajan as he outlines the topics that will be covered in this session.
In this session
You will first explore the Spark ML library API and learn how to perform basic EDA on a dataset. You will also learn about important components such as feature transformers, feature estimators and pipelines, and how to use them when writing code in PySpark. In the upcoming sessions of this module, you will refer to various feature transformers and feature extractors while building machine learning models.
People you will hear from in this session
Subject Matter Expert
Ajay Shukla
Data Science Lead, Myntra
Ajay completed his undergraduate and postgraduate degrees in Computer Science Engineering at IIT (BHU). He heads the pricing team at Myntra, where he actively works with technologies such as data science, big data, Spark and machine learning. Presently, his work mainly involves developing discounting strategies for all the products offered by Myntra.
Subject Matter Expert
Ajay Shukla
AI-COE. IKH Royal
Ajay has over 12 years of experience in machine learning and AI across various domains such as banking and financial services, e-commerce and telecom. He has worked with Amazon, Snapdeal and Citigroup. He is an expert in the application of ML in marketing and risk. He has worked with organisations across multiple geographies and developed and implemented data science solutions targeting different stages in the customer life cycle. He has worked extensively on building ML models and has experience in advanced techniques such as neural networks, CBMs and SVMs.