Welcome to the second session on ‘Programming with Spark RDD’. The previous session was based on the basic concepts of Apache Spark, where you learnt about different components of Spark and understood the differences between Spark and MapReduce. You were also introduced to the core abstraction of Spark, RDDs.
In this session, you will learn how to write Spark programs using RDDs, the core data structure of Spark. This session consists of code demos that explain each operation on RDDs, along with case studies that will help you learn how to use those operations to analyse data.
In the next video, let’s hear from our SME, Vishwa Mohan, who will outline the topics to be discussed in this session.
This session will build on the programming aspect of Spark, where you will learn how to load and process data in Spark RDDs using the PySpark API. This session will cover the following topics:
- Setting up Spark on an instance
- Creating RDDs
- Performing operations on RDDs
The objective of this session is to give you hands-on experience of working with Spark's computation engine. If you encounter errors while running any code, try debugging them on your own, as this will give you valuable experience in handling Spark jobs.
You should type out and run the code shown in the videos yourself. Text explanations and additional exercises are also provided in the videos; make sure you work through them for a better understanding.
People you will hear from in this session
Adjunct Faculty
Vishwa Mohan
Senior Software Engineer, LinkedIn
Vishwa is currently working as a senior software engineer at LinkedIn, an online employment-oriented platform. He has over nine years of experience in the IT industry and has worked for various companies, including Amazon, Walmart, Oracle and others. He has a deep knowledge of various tools and technologies used today.
Adjunct Faculty
Kautuk Pandey
Senior Data Engineer
Kautuk is currently working as a senior data engineer. He has over nine years of experience in the IT industry and has worked for several companies. He has a deep knowledge of the various tools and technologies in use today.