Before you learn about the various techniques for optimising Spark jobs, you need to understand the importance of its components.
In the next video, our SME will walk you through the various components and terminologies related to a Spark job.
As discussed in the video above, there are four main terms that you will come across while working on a Spark job. These are as follows:
- Application: An application is the highest level of computation in Spark. It is the main function that contains all the code and can be run by the user to compute some results.
- Job: Whenever an action is called in the application, the work that is carried out is called a Spark job. It is important to note that an application may consist of multiple jobs.
- Stage: A Spark job is divided into stages. Whenever a shuffle operation occurs inside a job, a new stage is created at that boundary.
- Task: A task is the most basic unit of computation in Spark. Each stage consists of multiple tasks, typically one per data partition.
Spark provides a utility known as the Spark History Server UI, where you can easily inspect the various stages and tasks of your Spark job and see how they were executed.
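For completed applications to appear in the History Server UI, event logging must be enabled. A typical configuration in `spark-defaults.conf` might look like the following (the log directory shown is only an assumption; use whatever local or HDFS path suits your setup):

```properties
# Enable event logging so finished applications appear in the History Server UI.
spark.eventLog.enabled           true
# Illustrative path: point both properties at the same directory.
spark.eventLog.dir               file:///tmp/spark-events
spark.history.fs.logDirectory    file:///tmp/spark-events
```

The History Server itself is started with the `sbin/start-history-server.sh` script that ships with Spark and, by default, serves its UI on port 18080.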
In the next video, our SME will run a Spark job and walk you through the various components of the job with the help of the Spark History Server UI.
As you saw in the video above, using the Spark History Server UI, you are able to view the various stages and individual tasks that are part of the execution of your Spark job.
In this segment, you learnt about the various components of a Spark job and saw how these individual components work together through the Spark History Server UI.
The link to the Jupyter Notebook used in this segment is given below. Please run this Jupyter Notebook in the EC2 instance.
Note:
Please note that you may get different results when you run these Jupyter Notebooks. This may be due to changes in network bandwidth and other internal factors.
Additional Content:
- Spark History Server UI Documentation: Link to the official content on History Server UI for Spark 2.4.5
- Components of a Spark job: Link to the Stack Overflow article on the meaning of Spark application, job, stage and task.