Now let’s summarise your learnings from this session. In the next video, our SME will briefly recap the different topics related to optimising a Spark cluster that were covered in this session.
First, you understood the need for optimising cluster utilisation in Spark. You learnt about the common mistakes that lead to underutilisation and some of the practices that you should follow to optimise cluster utilisation.
In the next segment, you learnt about the different job deployment modes in Spark: the local mode, the standalone mode, and the YARN cluster and YARN client modes.
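As a quick reference, the deployment mode is usually chosen at submission time through the --master and --deploy-mode options of spark-submit. This is a minimal sketch; the application class name and JAR file below are hypothetical placeholders, not from the session.

```shell
# Local mode: driver and executors run in a single JVM (here with 4 threads).
spark-submit --master "local[4]" --class com.example.MyApp my-app.jar

# Standalone mode: submit to a Spark standalone cluster master.
spark-submit --master spark://master-host:7077 --class com.example.MyApp my-app.jar

# YARN client mode: the driver runs on the machine that submits the job.
spark-submit --master yarn --deploy-mode client --class com.example.MyApp my-app.jar

# YARN cluster mode: the driver runs inside a YARN container on the cluster.
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar
```

In client mode the driver logs appear on your terminal, which is convenient for interactive work; cluster mode is generally preferred for production jobs because the driver is managed by YARN.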
After this, you learnt how to tune the different Spark memory and CPU parameters. You also learnt how to use the spark-submit command and, in turn, used it to set these parameters for your Spark job.
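For instance, the memory and CPU parameters can be passed directly to spark-submit. The values and names below are illustrative assumptions, not the settings used in the session:

```shell
# Illustrative spark-submit invocation with resource tuning flags:
#   --driver-memory    memory allocated to the driver process
#   --executor-memory  JVM heap memory per executor
#   --executor-cores   CPU cores per executor
#   --num-executors    number of executors requested from YARN
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 8g \
  --executor-cores 4 \
  --num-executors 10 \
  --conf spark.executor.memoryOverhead=1g \
  --class com.example.MyApp \
  my-app.jar
```

Note that the total memory YARN reserves per executor is the executor memory plus the off-heap overhead (spark.executor.memoryOverhead), so both should be sized against the capacity of your cluster nodes.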
In the next segment, you did a cost and performance analysis by running the same Spark job in different cluster configurations and saw how the execution time improves as you scale up your clusters.
In the next segment, you learnt how Spark jobs are deployed in the production environment. You also learnt about the different factors that you should consider while running jobs in the production environment.
Next, you learnt about some of the best practices that you should follow while working with Apache Spark.
Finally, in the last segment, you applied all the optimisation techniques that you learnt throughout the module to the initial Spark job.
With the end of this session, the module has also come to an end. In this video, our SME will summarise the concepts that you learnt in this module.
The PPT that was used throughout this session is attached below.
The lecture notes for this module are attached below.