In the previous segment, you learnt about various transformation functions. In this segment, you will learn about the different notions of time in Flink and about windows.
Let’s summarise your learnings from this video.
The notion of time
- Processing time: This refers to the clock time of a machine when the respective operation is getting executed.
- Event time: This is the time when an event occurred on its producing device. This time is generally embedded within the records before they enter Flink.
- Ingestion time: This refers to the time when events enter the Flink environment.
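The three notions of time can be illustrated with a small plain-Python sketch. This is not Flink code: the `Record` class and the `ingest` and `process` helpers are hypothetical names introduced only for illustration, and the clocks are fixed values so that the result is reproducible.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Record:
    payload: str
    event_time: int                        # set by the producing device (ms)
    ingestion_time: Optional[int] = None   # set when the record enters the pipeline


def ingest(record: Record, clock: Callable[[], int]) -> Record:
    """Stamp the record with the source's clock as it enters the pipeline."""
    record.ingestion_time = clock()
    return record


def process(record: Record, clock: Callable[[], int]) -> dict:
    """Read the operator's own clock at the moment the operation runs."""
    return {
        "event_time": record.event_time,
        "ingestion_time": record.ingestion_time,
        "processing_time": clock(),
    }


# Fixed clocks keep the example deterministic.
record = Record(payload="sensor-42", event_time=1_000)
record = ingest(record, clock=lambda: 1_250)
times = process(record, clock=lambda: 1_400)
print(times)
```

Here the same record carries three different timestamps: the event time travels with the record, whereas the ingestion and processing times depend on when the record reaches the pipeline and the operator.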
Watermarks
A watermark with timestamp t tells operators that no more elements with a timestamp older than or equal to t should arrive at the operator. Watermarks are emitted at the sources and propagate through the operators, and each operator must in turn emit watermarks to its downstream operators. The following image shows the watermarks.
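One common way to generate watermarks is to let them trail the highest timestamp seen so far by a fixed delay. The following is a minimal plain-Python sketch of that idea, not the Flink API: the class name `BoundedDelayWatermarkGenerator`, the helper `is_late` and the 300 ms delay are assumptions made for illustration.

```python
class BoundedDelayWatermarkGenerator:
    """Tracks the highest event timestamp seen and trails it by a fixed delay."""

    def __init__(self, max_delay_ms):
        self.max_delay_ms = max_delay_ms
        self.max_timestamp = None

    def on_event(self, timestamp):
        if self.max_timestamp is None or timestamp > self.max_timestamp:
            self.max_timestamp = timestamp

    def current_watermark(self):
        if self.max_timestamp is None:
            return None
        # Subtract 1 so an element stamped exactly (max - delay) is still on time.
        return self.max_timestamp - self.max_delay_ms - 1


def is_late(timestamp, watermark):
    """An element is late if its timestamp is older than or equal to the watermark."""
    return watermark is not None and timestamp <= watermark


generator = BoundedDelayWatermarkGenerator(max_delay_ms=300)
late = []
for ts in [1000, 1200, 900, 1500, 1100]:
    if is_late(ts, generator.current_watermark()):
        late.append(ts)
    generator.on_event(ts)

print(generator.current_watermark(), late)
```

In this run, the element with timestamp 900 is out of order but still within the allowed delay, whereas 1100 arrives after the watermark has already passed it and is therefore treated as late.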
Window operators
Aggregation on streams is performed on finite segments of records, which are called windows. Aggregate operations on windows can be time-driven (for example, a count over the last five minutes) or data-driven (for example, a sum of the last 100 elements).
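The difference between time-driven and data-driven windows can be sketched in plain Python as follows. The helper names `count_windows` and `time_windows` are hypothetical, and the window sizes (3 elements and 300 ms) are scaled down from the examples above to keep the data short.

```python
from collections import defaultdict


def count_windows(values, size):
    """Data-driven: close a window after every `size` elements and sum it.
    A trailing window that is not yet full is not emitted."""
    return [sum(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def time_windows(events, size_ms):
    """Time-driven: count the events that fall into each fixed interval."""
    counts = defaultdict(int)
    for timestamp, _value in events:
        window_start = timestamp - timestamp % size_ms
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# (timestamp in ms, value) pairs
events = [(0, 5), (100, 3), (250, 7), (400, 1), (450, 2), (900, 4)]

sums = count_windows([value for _, value in events], size=3)
counts = time_windows(events, size_ms=300)
print(sums, counts)
```

The data-driven version produces a result whenever enough elements have arrived, regardless of how long that takes, whereas the time-driven version produces one result per interval, regardless of how many elements it contains.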
Keyed vs non-keyed windows
You need to first specify whether your stream should be keyed or not. Using keyBy(…) will split your infinite stream into logical keyed streams. If keyBy(…) is not called, then your stream is not keyed.
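Conceptually, keying a stream means grouping its elements by a key so that each group can be windowed independently. The sketch below imitates this on a finite list in plain Python; the `key_by` helper and the sample click data are hypothetical and only stand in for what Flink does on an unbounded stream.

```python
from collections import defaultdict


def key_by(stream, key_selector):
    """Split a stream into logical per-key streams, preserving arrival order."""
    keyed = defaultdict(list)
    for element in stream:
        keyed[key_selector(element)].append(element)
    return dict(keyed)


# (user, clicks) pairs
clicks = [("alice", 1), ("bob", 4), ("alice", 2), ("carol", 7), ("bob", 5)]

keyed = key_by(clicks, key_selector=lambda element: element[0])
for user, elements in keyed.items():
    print(user, elements)
```

Without the call to `key_by`, all five elements would belong to a single stream, and any window defined on it would mix the clicks of all users.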
Window assigners
The window assigner defines how elements are assigned to windows. This is done by specifying the WindowAssigner of your choice. The four types of built-in assigners are as follows:
- Tumbling Window Assigner: It assigns each element of a data stream to a window of a fixed size. Tumbling windows do not overlap, so each element belongs to exactly one window.
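- Sliding Window Assigner: The window size is fixed, and an additional window slide parameter controls how frequently a new sliding window is started. If the slide is smaller than the window size, the windows overlap and an element can belong to more than one window.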
- Session Window Assigner: This groups elements based on sessions of activity. A session window does not have a fixed start or end time; instead, the window closes when there is inactivity for a certain period of time.
- Global Window Assigner: All the elements with the same key are assigned to a single global window.
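The four assignment strategies above can be sketched in plain Python as functions that map timestamps to window boundaries. This is an illustration under simplifying assumptions (integer timestamps, no window offset, a finite list of timestamps for sessions), not Flink's implementation; all function names are hypothetical. Windows are written as half-open `(start, end)` pairs.

```python
def tumbling_window(timestamp, size):
    """Each timestamp falls into exactly one fixed-size window."""
    start = timestamp - timestamp % size
    return (start, start + size)


def sliding_windows(timestamp, size, slide):
    """A timestamp falls into every window of length `size` that covers it;
    a new window starts every `slide` units."""
    windows = []
    start = timestamp - timestamp % slide  # latest window start <= timestamp
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def session_windows(timestamps, gap):
    """Group timestamps into sessions; in this sketch, a pause of at least
    `gap` closes the current session and starts a new one."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            start, end = sessions[-1]
            sessions[-1] = (start, max(end, ts + gap))  # extend the session
        else:
            sessions.append((ts, ts + gap))
    return sessions


def global_window(_timestamp):
    """Every element of a key lands in the same, never-ending window."""
    return "GLOBAL"


print(tumbling_window(7, size=5))
print(sliding_windows(7, size=10, slide=5))
print(session_windows([1, 2, 3, 10, 11, 20], gap=4))
print(global_window(7))
```

Note how the timestamp 7 belongs to one tumbling window but two sliding windows, and how the session boundaries are determined entirely by the data. Because a global window never ends on its own, it generally needs a custom trigger to produce any output.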
Window functions
The window functions specify the computation that you want to perform on each of these windows. The types of built-in functions are as follows:
- Reduce function: Two elements from the input are combined to produce an output element of the same type.
- Aggregate function: It has three types of elements, which are input, accumulator and output. It is a generalised version of the reduce function. The aggregate function adds the input element to the accumulator and extracts the output from the accumulator.
- Fold function: This function specifies how an input element of the window is combined with the current output value. Every time a new element is added to the window, the fold function is called with the new input element and the current output value. Note that in later Flink releases, the fold function is deprecated in favour of the aggregate function.
- Process Window Function: It gets an iterable containing all the elements of the window. It can be combined with a reduce or aggregate function to aggregate elements incrementally as they arrive.
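The four kinds of window functions above can be compared by applying each to the contents of one window. The following plain-Python sketch is only an analogy: the method names `create_accumulator`, `add` and `get_result` loosely follow the shape of an aggregate function, and `run_aggregate`, `fold` and `process_window` are hypothetical helpers, not Flink calls.

```python
from functools import reduce

window = [4, 8, 6]  # elements that ended up in one window

# Reduce: combine two elements into one of the same type (here, keep the larger).
maximum = reduce(lambda a, b: a if a >= b else b, window)


class AverageAggregate:
    """Aggregate: input is a number, accumulator is (sum, count), output is a float."""

    def create_accumulator(self):
        return (0, 0)

    def add(self, value, accumulator):
        total, count = accumulator
        return (total + value, count + 1)

    def get_result(self, accumulator):
        total, count = accumulator
        return total / count


def run_aggregate(function, elements):
    accumulator = function.create_accumulator()
    for element in elements:
        accumulator = function.add(element, accumulator)
    return function.get_result(accumulator)


def fold(initial, elements, function):
    """Fold: combine each input element with the current output value,
    which may have a different type from the input."""
    output = initial
    for element in elements:
        output = function(output, element)
    return output


def process_window(key, elements):
    """Process: receives all the elements of the window at once."""
    elements = list(elements)
    spread = max(elements) - min(elements)
    return f"{key}: {len(elements)} elements, spread={spread}"


average = run_aggregate(AverageAggregate(), window)
folded = fold("", window, lambda output, element: output + str(element))
summary = process_window("sensor-1", window)
print(maximum, average, folded, summary)
```

The reduce, aggregate and fold variants only ever hold one intermediate value, so they can be evaluated incrementally as elements arrive. The process variant needs the whole window buffered, which costs more memory but gives access to every element.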
Additional Reading
- Generating Watermarks – This is the official documentation page that explains the steps to generate a watermark in Apache Flink.
- Window Operators – This link explains the window lifecycle in Apache Flink.