
Windows

In the next video, you will learn about Windows in Spark Streaming.

In an application that processes real-time events, it is common to perform set-based computations (aggregations) or other operations on subsets of events that fall within a given period of time. Because time is fundamental to complex event-processing systems, it is important to have a simple way to work with the time component of the query logic.

Now let’s look at various examples to understand the concept of windows better.

Let’s summarise what you learnt in this video.

  • Event time is the time when the record is generated at the source. It is generally represented as a column in the source data set; it is different from the processing time.
  • Processing time is the time when the record arrives at the Spark processing layer. The gap between the event time and the processing time arises from causes such as publishing failures, distributed system lags, network delays and other such latencies.
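To make the distinction concrete, here is a minimal plain-Python sketch (not Spark code). The `Record` class, the `arrival_lag` helper and the timestamps are all hypothetical and exist only for illustration. It shows that a record generated earlier at the source can still reach the processing layer later.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    event_time: datetime       # stamped when the record is generated at the source
    processing_time: datetime  # stamped when the processing layer receives it


def arrival_lag(record: Record) -> timedelta:
    """Return how long the record took to reach the processing layer."""
    return record.processing_time - record.event_time


# Illustrative data: the second record was generated first but arrived last.
records = [
    Record(datetime(2024, 1, 1, 12, 0, 5), datetime(2024, 1, 1, 12, 0, 6)),
    Record(datetime(2024, 1, 1, 12, 0, 1), datetime(2024, 1, 1, 12, 0, 9)),
]

by_arrival = sorted(records, key=lambda r: r.processing_time)
by_event = sorted(records, key=lambda r: r.event_time)

for r in records:
    print(r.event_time.time(), "lag:", arrival_lag(r))
```

Sorting by `event_time` and by `processing_time` gives different orders here, which is why windowed aggregations are usually defined on event time.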

Next, you learnt about windows in Spark and understood how they are similar to the concept of windows in SQL. A window is nothing but a collection of records over a specific time period. The two types of windows are as follows:

  • Tumbling window: No two windows overlap.
  • Sliding window: Windows may or may not overlap.

In a tumbling window, the sliding duration is equal to the window duration, which ensures that no two windows overlap. In a sliding window, the sliding duration is typically smaller than the window duration (it is commonly chosen so that the window duration is a multiple of it), which causes consecutive windows to overlap.
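The relationship between the window duration and the sliding duration can be sketched in plain Python. This is an illustration only, not Spark's implementation: the `windows_for` helper is hypothetical, and aligning window starts to the Unix epoch is an assumption made for this sketch. It returns every `[start, end)` window that contains a given event time.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment point for window starts


def windows_for(event_time, window, slide=None):
    """Return the [start, end) windows that contain event_time.

    If slide is omitted, it equals the window duration (a tumbling window).
    """
    slide = slide or window
    if slide > window:
        raise ValueError("slide must not exceed the window duration")

    # Latest window start that is at or before the event time.
    start = event_time - (event_time - EPOCH) % slide

    found = []
    # Walk backwards while the window [start, start + window) still covers the event.
    while start > event_time - window:
        found.append((start, start + window))
        start -= slide
    return sorted(found)


event = datetime(2024, 1, 1, 12, 7)

tumbling = windows_for(event, timedelta(minutes=10))
sliding = windows_for(event, timedelta(minutes=10), timedelta(minutes=5))

print(len(tumbling), "tumbling window;", len(sliding), "sliding windows")
```

With a 10-minute window, the event at 12:07 falls into exactly one tumbling window (12:00 to 12:10). With a 5-minute slide, it falls into two overlapping windows (12:00 to 12:10 and 12:05 to 12:15).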

Note:

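In the DStream API, the default value of the sliding interval is the same as the batch interval. In Structured Streaming, if you do not specify a slide duration for the window function, the windows are tumbling windows.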

Window functions compute values over a window. In the in-video example, we specified a window duration of 10 minutes and calculated the record count using the different output modes: complete, append and update.
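The following plain-Python sketch simulates a 10-minute windowed record count over two micro-batches. It is a simplified model under stated assumptions, not the in-video code and not the Spark API: `window_start`, `run_batches` and the sample timestamps are hypothetical. It contrasts only the complete and update modes. Append mode is left out because, for aggregations, it depends on a watermark to decide when a window is final.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
EPOCH = datetime(1970, 1, 1)  # assumed alignment point for window starts


def window_start(event_time):
    """Start of the 10-minute tumbling window that contains event_time."""
    return event_time - (event_time - EPOCH) % WINDOW


def run_batches(batches):
    """Return one (complete, update) pair of outputs per micro-batch."""
    counts = Counter()  # running record count per window start
    outputs = []
    for batch in batches:
        touched = set()
        for event_time in batch:
            start = window_start(event_time)
            counts[start] += 1
            touched.add(start)
        # Complete: the whole result table after this batch.
        complete = dict(sorted(counts.items()))
        # Update: only the windows whose count changed in this batch.
        update = {start: counts[start] for start in sorted(touched)}
        outputs.append((complete, update))
    return outputs


day = datetime(2024, 1, 1)
batches = [
    [day.replace(hour=12, minute=1), day.replace(hour=12, minute=4),
     day.replace(hour=12, minute=12)],
    [day.replace(hour=12, minute=13)],
]
outputs = run_batches(batches)

for complete, update in outputs:
    print("complete:", len(complete), "rows; update:", len(update), "rows")
```

After the second batch, the complete output still contains both windows, whereas the update output contains only the 12:10 window, because that is the only count that changed.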

Now, let’s move on to the next segment, where we will apply this knowledge in a coding lab.

Additional Readings

Sliding vs Tumbling Windows – The article highlights the salient features and differences between the two methodologies.
