Session Overview

Welcome to the session on ‘optimising Network IO for Spark’.

In the next video, you will get a brief introduction to the topics that will be covered in this session.

In this session

You will learn about some basic concepts of Network IO and the various optimisations that can be applied to reduce it in Spark jobs.

In the first segment, you will learn about Network IO and the operations that cause it. You will also learn about the concept of data locality and the techniques for reducing Network IO.
In the following segment, you will learn about the concept of shuffles and the operations that cause them. This will be followed by techniques that can help reduce shuffles.
In the next segment, you will learn how to optimise joins in Spark. You will first see how joins affect Network IO and then look at the different types of optimised joins.
In the last segment, you will learn about the concept of data partitioning and look at how to implement custom partitioning in Spark.

People you will hear from in this session

Subject Matter Expert

Vishwa Mohan

Senior Software Engineer, LinkedIn

Vishwa is currently working as a senior software engineer at LinkedIn, an online employment-oriented platform. He has over nine years of experience in the IT industry and has worked at various companies, including Amazon, Walmart and Oracle. He has deep knowledge of a wide range of tools and technologies used in the industry today.
