
Session Summary

This session involved a comprehensive study of programming with Spark RDDs. You were introduced to the various operations that Spark offers, which make it powerful as well as easy to use. Let's watch the next video, in which Vishwa Mohan will summarise the topics discussed in the second session.

Note

Paired RDDs will be discussed in the next session.

To summarise this session:

  • RDDs can be created in three different ways: with the parallelize() method, from a text file, and from other RDDs.
  • Two kinds of operations can be performed on RDDs – transformations and actions. Transformations operate on the elements of an RDD and store the results in a new RDD, since RDDs are immutable. Actions cause all the pending transformations to execute and produce an output.
  • map() returns the same number of elements as the input RDD, whereas flatMap() may produce a different number of elements in the output.
  • collect() method returns the elements of an RDD as an array to the driver program.
  • Lazy evaluation and DAG-based execution make RDDs resilient: transformations are only executed once an action is called, so Spark can use the recorded lineage to recompute lost data.
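The map()/flatMap() distinction above can be seen without Spark at all. Below is a minimal pure-Python sketch of the same semantics; the list `rdd` is just a stand-in for an RDD's elements, not actual Spark code:

```python
# Pure-Python sketch of the map()/flatMap() semantics described above.
# No Spark here; "rdd" is an ordinary list standing in for an RDD's elements.
rdd = ["hello world", "spark rdd"]

# map: exactly one output element per input element -> same count as the input
mapped = [line.split(" ") for line in rdd]
# -> [['hello', 'world'], ['spark', 'rdd']]  (2 elements, same as input)

# flatMap: each input element may expand to zero or more output elements,
# and the results are flattened -> the count can differ from the input
flat_mapped = [word for line in rdd for word in line.split(" ")]
# -> ['hello', 'world', 'spark', 'rdd']  (4 elements)

print(len(rdd), len(mapped), len(flat_mapped))  # prints: 2 2 4
```

In actual PySpark the calls would be `rdd.map(lambda line: line.split(" "))` and `rdd.flatMap(lambda line: line.split(" "))`, with `collect()` bringing the results back to the driver.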
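Lazy evaluation can also be illustrated without a Spark cluster. The sketch below uses a Python generator as an analogy: like a Spark transformation, it records work to be done but executes nothing until something consumes it (here, `list()` plays the role of an action):

```python
# Pure-Python analogy for lazy evaluation; this is not Spark code.
log = []

def track(x):
    log.append(x)  # side effect so we can observe *when* the work runs
    return x * 2

nums = [1, 2, 3]

# "Transformation": the generator is defined, but no element is processed yet.
doubled = (track(n) for n in nums)
assert log == []  # nothing has executed

# "Action": consuming the generator triggers all the deferred work at once.
result = list(doubled)
assert log == [1, 2, 3]
assert result == [2, 4, 6]
```

The parallel is loose (generators have no DAG or fault tolerance), but the execution pattern is the same: defining the computation is cheap, and the work only happens when a result is demanded.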

In the next session, you will learn about paired RDDs. The sessions that follow will deepen your understanding of paired RDDs and other APIs through various code demos and case studies.

Following is the PPT used by Vishwa.
