
What Is Spark Streaming

In the next video, our SME will walk you through the various Spark APIs and the differences between the Structured Streaming and DStream APIs.

Let’s summarise what you learnt in the above video.

Spark offers both low-level and high-level APIs. The low-level APIs include RDDs and DStreams (a DStream is a sequence of RDDs), whereas the high-level APIs consist of DataFrames, Datasets and SQL.
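To make the idea of a DStream as a sequence of RDDs more concrete, here is a minimal plain-Python sketch (not Spark code) that cuts a timestamped stream into fixed-interval micro-batches. The function name `to_micro_batches`, the `batch_interval` parameter and the sample events are assumptions made purely for illustration; in Spark, each batch would be an RDD.

```python
from collections import defaultdict


def to_micro_batches(events, batch_interval):
    """Group (timestamp, value) events into fixed-length time intervals.

    Conceptual stand-in for a DStream: each returned list plays the role
    of one RDD holding the records that arrived in one batch interval.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        # Integer division maps a timestamp to the index of its interval.
        batches[int(timestamp // batch_interval)].append(value)
    # Return batches in time order; empty intervals are skipped for brevity.
    return [batches[i] for i in sorted(batches)]


# Hypothetical events: (arrival time in seconds, payload)
events = [(0.5, "a"), (1.2, "b"), (1.9, "c"), (4.1, "d")]
micro_batches = to_micro_batches(events, batch_interval=2)
```

With a two-second interval, the first three events fall into the same batch and the last one into a later batch, which mirrors how a DStream job processes one small batch per interval.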

Using these APIs, programmers can utilise libraries such as Streaming, MLlib and GraphX as per their needs.

Next, you learnt about Spark Streaming. Spark Streaming has a declarative API: the programmer describes what computation to perform and is shielded from the internal complexities of running the job. Scheduling and resource allocation are taken care of by Spark and the cluster manager, such as YARN, so the user need not worry about job allocation or resource usage.

Further in the video, you learnt about the key differences between the Structured Streaming API and the DStream API. The Structured Streaming API follows the same model as Spark batch processing and is exposed through the DataFrame and SQL APIs, so the various Spark SQL optimisations are picked up automatically. It also offers an exactly-once guarantee, which means that each batch of data affects the result only once: if a batch is replayed after a failure, the duplicate is discarded rather than processed again. It also handles late-arriving data more effectively than the DStream API, because it can process records by event time, and hence it is preferred.
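The exactly-once idea described above can be sketched in plain Python. This is only a conceptual illustration, not Spark's implementation: Spark relies on checkpointing, replayable sources and idempotent sinks to achieve the guarantee. The class `IdempotentSink`, its `write` method and the batch IDs are hypothetical names chosen for this example.

```python
class IdempotentSink:
    """Hypothetical sink that applies each micro-batch at most once."""

    def __init__(self):
        self.committed_batch_ids = set()  # IDs of batches already applied
        self.total = 0                    # running result kept by the sink

    def write(self, batch_id, values):
        """Apply a batch unless it was already committed.

        Returns True if the batch was applied, False if it was discarded.
        """
        if batch_id in self.committed_batch_ids:
            # A replayed batch (for example, after a retry) is discarded.
            return False
        self.total += sum(values)
        self.committed_batch_ids.add(batch_id)
        return True


sink = IdempotentSink()
sink.write(0, [1, 2, 3])
sink.write(1, [10])
replayed = sink.write(1, [10])  # batch 1 is retried after a simulated failure
```

Because the sink remembers which batch IDs it has committed, the retried batch does not change the running total, so the result reflects each batch exactly once.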

Additional Readings:

Official Documentation – To learn more about Spark Streaming.