With this, you have reached the end of this session. Let’s hear from Sajan as he summarises what you have learnt.
So in this session, you explored the Spark MLlib API and learnt how to perform basic EDA on a data set. You learnt about some of its important components, such as feature transformers, feature estimators and pipelines, and how to use them while writing code in PySpark. In the upcoming sessions of this module, you will learn about machine learning algorithms and their implementation in PySpark, building on these feature transformers and feature extractors because they form the basis of data preprocessing.