
Session Summary

This session was a comprehensive study of paired RDDs. It introduced the operations Spark offers on paired RDDs, which make them both powerful and easy to use.

To summarise this session:

  • Paired RDDs are RDDs whose elements are key/value pairs. They can be created from basic RDDs using the map() method, by mapping each element to a (key, value) tuple.
  • The keys() method returns the keys of a paired RDD as a new RDD, and the values() method returns its values as a new RDD.
  • The sortByKey() method sorts the elements of a paired RDD by key (ascending by default; pass ascending=False for descending order).
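  • reduceByKey() merges all the values that share a key using the function you supply, producing one (key, result) pair per key. The function should be associative and commutative, because Spark combines values in no fixed order across partitions.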
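  • join(), leftOuterJoin(), rightOuterJoin() and cogroup() combine two paired RDDs by key in different ways. join() keeps only keys present in both RDDs. leftOuterJoin() keeps every key of the first RDD, and rightOuterJoin() keeps every key of the second, using None where the other side has no match. cogroup() groups the values from both RDDs under each key.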
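To make the first few bullets concrete, here is a minimal plain-Python model of what map(), keys(), values(), sortByKey() and reduceByKey() compute. It is only an illustration under a stated assumption: a Python list of (key, value) tuples stands in for a paired RDD, and reduce_by_key is a hypothetical helper, not Spark's API. In PySpark the word count below would read roughly sc.parallelize(words).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b).

```python
# Plain-Python model of basic paired-RDD operations.
# ASSUMPTION: a list of (key, value) tuples stands in for a paired RDD;
# this is NOT Spark's API, only an illustration of the results.
from functools import reduce

words = ["spark", "rdd", "spark", "pair", "rdd", "spark"]

# Like rdd.map(lambda w: (w, 1)): turn a basic collection into key/value pairs.
pairs = [(w, 1) for w in words]

# Like pairs.keys() and pairs.values(): project out each half of every pair.
keys = [k for k, _ in pairs]
values = [v for _, v in pairs]

# Like pairs.sortByKey(): order the elements by key, ascending.
sorted_pairs = sorted(pairs, key=lambda kv: kv[0])


def reduce_by_key(kv_pairs, func):
    """Hypothetical helper modelling reduceByKey: merge all values that
    share a key with func, returning one (key, result) pair per key."""
    grouped = {}
    for k, v in kv_pairs:
        grouped.setdefault(k, []).append(v)
    # Spark makes no ordering guarantee here; this model keeps first-seen order.
    return [(k, reduce(func, vs)) for k, vs in grouped.items()]


counts = reduce_by_key(pairs, lambda a, b: a + b)
print(counts)
```

Because Spark may apply the function to partial results from different partitions in any order, a non-associative function such as subtraction would give unpredictable results with the real reduceByKey().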
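The differences between the four ways of combining paired RDDs are easiest to see side by side. The sketch below models their results in plain Python on two small lists of (key, value) tuples. The helper names join, left_outer_join, right_outer_join and cogroup, and the sample data, are hypothetical illustrations. They are not Spark's API, although they mirror the results of PySpark's join(), leftOuterJoin(), rightOuterJoin() and cogroup(), apart from output ordering, which Spark does not guarantee.

```python
# Plain-Python model of key-based combinations of two "paired RDDs".
# ASSUMPTION: lists of (key, value) tuples stand in for paired RDDs;
# these helpers are hypothetical, not Spark's API.


def join(left, right):
    """Like left.join(right): inner join, one output per matching pair."""
    return [(k, (v, w)) for k, v in left for k2, w in right if k == k2]


def left_outer_join(left, right):
    """Like left.leftOuterJoin(right): every left key kept, None if unmatched."""
    out = []
    for k, v in left:
        matches = [w for k2, w in right if k2 == k]
        out.extend((k, (v, w)) for w in (matches or [None]))
    return out


def right_outer_join(left, right):
    """Like left.rightOuterJoin(right): every right key kept, None if unmatched."""
    out = []
    for k2, w in right:
        matches = [v for k, v in left if k == k2]
        out.extend((k2, (v, w)) for v in (matches or [None]))
    return out


def cogroup(left, right):
    """Like left.cogroup(right): for every key in either input, the values
    from each side (PySpark returns iterables; lists are used here)."""
    all_keys = dict.fromkeys([k for k, _ in left] + [k for k, _ in right])
    return [
        (k, ([v for k2, v in left if k2 == k], [w for k2, w in right if k2 == k]))
        for k in all_keys
    ]


ages = [("alice", 30), ("bob", 25)]
cities = [("alice", "Pune"), ("carol", "Delhi")]

print(join(ages, cities))              # only "alice" is in both
print(left_outer_join(ages, cities))   # "bob" kept with None for the city
print(right_outer_join(ages, cities))  # "carol" kept with None for the age
print(cogroup(ages, cities))           # every key, with both sides grouped
```

Note that cogroup() never drops a key and never pads with None. A missing side simply shows up as an empty group.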

The Jupyter notebook that was used in Session 2 and Session 3 is as follows:

The upcoming session covers the DataFrame APIs.
