
Session Overview

Welcome to the second session on ‘Programming with Spark RDD’. The previous session covered the basic concepts of Apache Spark: you learnt about the different components of Spark and the differences between Spark and MapReduce. You were also introduced to RDDs, the core abstraction of Spark.

In this session, you will learn how to write Spark programs using RDDs, the data structure of Spark Core. This session consists of code demos that explain what each RDD operation does, and case studies that will help you learn how to use those operations to analyse data.

In the next video, our SME, Vishwa Mohan, will outline the topics to be discussed in this session.

This session builds on the programming aspect of Spark: you will learn how to load and process data in Spark RDDs using the PySpark API. It covers the following topics:

  • Setting up Spark on an instance.
  • Creating RDDs.
  • Performing operations on RDDs.

The objective of this session is to give you hands-on experience of working with the computation engine of Spark. If you encounter errors while running any code, try debugging them on your own, as this will give you valuable experience in handling Spark jobs.

You should type out and run the code shown in the videos yourself. Explanations in the form of text, along with additional exercises, have been provided with the videos. Make sure that you work through those as well for a better understanding.

People you will hear from in this session

Adjunct Faculty

Vishwa Mohan

Senior Software Engineer, LinkedIn

Vishwa is currently working as a senior software engineer at LinkedIn, an online employment-oriented platform. He has over nine years of experience in the IT industry and has worked for various companies, including Amazon, Walmart and Oracle. He has deep knowledge of the various tools and technologies that are in use today.

Adjunct Faculty

Kautuk Pandey

Senior Data Engineer

Kautuk is currently working as a senior data engineer. He has over nine years of experience in the IT industry and has worked for several companies. He has deep knowledge of the various tools and technologies that are in use today.
