
Spark Installation

In this segment, you will learn how to set up an Amazon EMR instance for Apache Spark.

Please note that this segment covers only the EMR configuration that is specific to Apache Spark.

The other steps involved in setting up an Amazon EMR instance, such as logging in and setting the YARN parameters, remain the same as in the Introduction to Cloud and AWS Setup module. Make sure that you follow those steps as well.

In the following video, you will see how to set up an EMR instance for Apache Spark.

As mentioned in the video, most of the steps are similar to the previous EMR setups. The only significant change is in the software packages chosen during the initial setup of the instance.
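For readers who prefer to script the setup, the choice of software packages can be pictured as the list of applications passed to the EMR API. The snippet below is only a minimal sketch and not the course's official procedure: the cluster name, release label, instance type, instance count and the exact application list (Hadoop, Spark and Livy) are assumptions made for illustration, so use the values given in the course document instead. It relies on the `run_job_flow` call of the boto3 EMR client, which needs valid AWS credentials and the default EMR roles.

```python
from typing import Any, Dict, Sequence


def build_spark_cluster_request(
    name: str = "spark-demo-cluster",        # hypothetical cluster name
    release_label: str = "emr-5.30.0",       # assumed release; use the one from the course document
    applications: Sequence[str] = ("Hadoop", "Spark", "Livy"),  # assumed package list
    instance_type: str = "m5.xlarge",        # assumed instance type
    instance_count: int = 3,                 # assumed: 1 master node + 2 core nodes
) -> Dict[str, Any]:
    """Build the keyword arguments for the boto3 EMR client's run_job_flow call."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # This list is the scripted equivalent of choosing the software packages.
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running so that notebooks can attach to it.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default EMR roles; these must already exist in the AWS account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the request to EMR and return the new cluster (job flow) ID."""
    import boto3  # imported lazily so the request can be built without boto3 installed

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]


if __name__ == "__main__":
    request = build_spark_cluster_request()
    # Print the selected packages; call launch_cluster(request) to actually create the cluster.
    print([app["Name"] for app in request["Applications"]])
```

Building the request separately from submitting it lets you review the selected packages before any AWS resources are created. Restricted environments may not permit this API call, in which case the console steps from the video still apply.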

The following document contains the steps to set up an EMR instance for Apache Spark:

You will be using EMR notebooks in this module to run PySpark code. You have already learned about the concept of EMR notebooks in the Introduction to Cloud and AWS Setup module.

As a recap of the concepts, let’s watch the following video to learn how to access EMR notebooks on AWS.

The steps to create an EMR notebook are as follows:

NOTE:

If you’re facing issues working with EMR Notebooks on AWS Academy, please follow the steps in the documentation below to access the Jupyter Service through the Application User Interface of the EMR.

Now, let’s walk through the Spark documentation. For any queries related to the architecture, code or optimisation of resources, you can refer to the Spark documentation to find a solution. It contains information on every version of Spark and its features. Let’s watch the next video, in which Vishwa Mohan goes through the official Spark documentation.

In the following segments, you will start with coding demos on RDDs.

Additional Content:

  • Spark Official Documentation homepage: Link to the official documentation page for Apache Spark
  • Spark Python API Docs: Link to the official Python API docs for Spark 2.4.5
