In the previous segment, you learnt about the basics of the DataStream API. In this segment, you will learn about the concepts of State and Fault Tolerance.
Let’s summarise your learnings from this video.
State
Stateful functions and operators store data across the processing of individual elements/events. The two types of states are as follows:
- User-defined State: This is created and modified directly by the transformation functions [such as map() or filter()].
- System State: This state refers to data buffers that are part of the operator’s computation. For example, when aggregating events at the last minute/hour, the state holds the pending aggregates.
Flink uses an embedded database (RocksDb) to store state data locally. Checkpointing is used to guarantee the persistence of a state globally.
Fault tolerance
Checkpointing
Drawing snapshots of the distributed data stream and operator state are at the heart of Flink’s fault tolerance mechanism. The mechanism for drawing these asynchronous snapshots is inspired by the standard Chandy-Lamport algorithm for distributed snapshots. Savepoints are manually triggered checkpoints and allow updating programs and Flink cluster without losing any state.
Barriers
Distributed snapshotting is based on stream barriers, which are injected into the data stream and flow with the records. They maintain the sequence and segregate the records that go into the current snapshot and the records in the next snapshot. Each barrier carries the ID of the snapshot whose records it pushes in front of it.
The following image depicts the fault tolerance mechanism in Apache Flink.
Now in the following video let’s look at the checkpointing modes available.
Checkpointing modes
Exactly-once semantics
- Flink uses a two-phase commit protocol to provide exactly once state update.
- Generally, it adds a small latency of a few milliseconds.
- For applications with super-low latency requirements, Flink has a switch to skip the stream alignment during a checkpoint.
At-least-once semantics
- An operator continuously processes all the inputs. It also processes elements that belong to the checkpoint n+1 before the snapshot for checkpoint n is taken.
- During restore, these records will result in duplicates, as they are both included in the state snapshot of the checkpoint n and will be replayed as is.
Additional Reading
- State – This is the official documentation page explaining the types of states available.
Report an error