
What Is Structured Streaming?

In this segment, you will learn about Structured Streaming, a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. This means you can express a streaming computation in the same way as you would express a batch computation on static data. Because Structured Streaming is built on the Spark SQL engine, it comes with several advantages, which are covered later in the module.

Now, let’s watch the upcoming video and learn about Structured Streaming from our expert, Kautuk.

In the video, you learnt the following reasons for using Structured Streaming:

  • It is a high-level API: you work with DataFrames and Spark SQL operations instead of writing low-level stream-handling code.
  • It offers ease of development.
  • It is compatible with other Spark APIs.
  • Spark optimisations are built in.

Now, the key fundamentals that you learnt are as follows:

  • Spark principles stay in place
    • Lazy evaluation – Execution does not start until an action is triggered.
    • Transformations – Functions applied to the input data, as the requirements dictate, to produce the desired output.
    • Actions – Operations that trigger execution; output is produced only when an action is performed.
  • Input: Data Sources
    • Streaming systems – Kafka, Flume
    • File systems – S3
    • Sockets
  • Output: Data Sinks
    • Databases
    • The same kinds of systems that serve as inputs (for example, Kafka or file systems)

Next, we moved on to the general code flow of a Spark Structured Streaming application, which is as follows:

  • Create a SparkSession
    • The entry point for a structured streaming job
  • Read from source
    • Socket/Kafka/File, etc.
    • Use readStream
    • Returns a DataFrame
  • Perform transformations
    • Create one or more DataFrames
  • Start (the action)
    • Tell Spark to start the processing
  • AwaitTermination
    • Wait for the stream to finish

Now, in the next segment, we will create a simple Spark Structured Streaming application.

Additional Reading:

You can refer to the official documentation to gain more clarity on the topics covered in the following segments.
