
Introduction to Apache Flink

In this segment, you will learn about the history of Apache Flink and some of its basic features. You will also look at the differences between batch processing and stream processing.

In the following video, our expert Ajay will explain the evolution of Apache Flink and how Netflix uses Flink for its streaming services.

 

Let’s summarise what you have learnt in this video.


Brief History

The brief history of Apache Flink is given in the following points:

  • In 2010, the Technical University of Berlin, in collaboration with other research institutions, started a research project called ‘Stratosphere’.
  • Apache Flink originated as a fork of Stratosphere’s distributed execution engine.
  • In 2014, this fork was donated to the Apache Software Foundation and became an Apache Incubator project.
  • In December 2014, Apache Flink became a top-level Apache project.
  • Apache Flink version 1.0 was released in March 2016.
  • Apache Flink 1.11, the most recent version when this summary was written, was released in July 2020.

Batch Processing

  • The events are processed at periodic intervals.
  • There is always some latency between the arrival of an event and its processing.

Stream Processing

  • An event is processed as soon as it arrives.
  • There is little or no latency between the arrival of an event and its processing.
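
The latency difference between the two models can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Flink code; the function names `batch_latencies` and `stream_latencies`, the 10-second batch interval and the arrival times are all assumptions chosen for illustration, and the stream case is idealised to zero delay.

```python
import math

def batch_latencies(arrival_times, interval):
    """Latency of each event when processing runs only at multiples of `interval`."""
    latencies = []
    for t in arrival_times:
        # The event waits until the next scheduled batch run at or after its arrival.
        next_run = math.ceil(t / interval) * interval
        latencies.append(next_run - t)
    return latencies

def stream_latencies(arrival_times):
    """Latency of each event when it is processed as soon as it arrives (idealised)."""
    return [0 for _ in arrival_times]

arrivals = [1, 4, 9, 12, 17]          # hypothetical arrival times in seconds
print(batch_latencies(arrivals, 10))  # [9, 6, 1, 8, 3]
print(stream_latencies(arrivals))     # [0, 0, 0, 0, 0]
```

In the batch case, an event that arrives just after a run has to wait almost a full interval, which is why some latency is unavoidable.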

Micro-Batching/Fast Batching

The incoming events are grouped into small batches every few seconds, and each mini-batch is then processed as a unit. As a result, an event is processed within a few seconds of its arrival.
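
The grouping step can be sketched as follows. This is an illustrative example in plain Python rather than the API of any particular framework; the helper `micro_batches`, the 2-second batch length and the sample events are assumptions made for the sketch.

```python
from collections import defaultdict

def micro_batches(events, batch_seconds):
    """Group (arrival_time, payload) events into consecutive fixed-length batches."""
    batches = defaultdict(list)
    for arrival_time, payload in events:
        # Events that arrive within the same time slice share a batch.
        batch_index = int(arrival_time // batch_seconds)
        batches[batch_index].append(payload)
    # Return batches in time order; each would be processed when its slice ends.
    return [batches[i] for i in sorted(batches)]

# Hypothetical events: (arrival time in seconds, payload)
events = [(0.5, "a"), (1.2, "b"), (2.1, "c"), (4.7, "d"), (5.0, "e")]
print(micro_batches(events, 2))  # [['a', 'b'], ['c'], ['d', 'e']]
```

An event never waits longer than one batch length, so the delay sits between that of periodic batch jobs and that of event-at-a-time stream processing.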

Unbounded Datastream

Unbounded streams have a defined start but no defined end. The events are processed continuously, i.e., the data is processed right after ingestion. Events often need to be ingested in a specific order, such as the order in which they occurred, so that the completeness of the results can be reasoned about.

Bounded Datastream

Bounded streams have a defined start and a defined end. The events can be processed after all the data has been ingested. Ordered ingestion is not required because a bounded data set can always be sorted.
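
The contrast between the two kinds of stream can be sketched in plain Python. This is only an illustration and does not use Flink; the names `unbounded_stream`, `running_totals` and `process_bounded` are hypothetical, and a simple counter stands in for a real event source.

```python
import itertools

def unbounded_stream():
    """Hypothetical endless source: yields 1, 2, 3, ... and never finishes."""
    yield from itertools.count(1)

def running_totals(stream):
    """Process each event right after ingestion, emitting an updated result each time."""
    total = 0
    for value in stream:
        total += value
        yield total

def process_bounded(records):
    """Ingest the whole data set first, sort it, then compute one final result."""
    ordered = sorted(records)
    return ordered, sum(ordered)

# An unbounded stream has no end, so only a finite prefix of results can be inspected.
print(list(itertools.islice(running_totals(unbounded_stream()), 5)))  # [1, 3, 6, 10, 15]

# A bounded data set can arrive in any order because it can be sorted before processing.
print(process_bounded([3, 1, 2]))  # ([1, 2, 3], 6)
```

The unbounded case never produces a final answer, only results that keep updating, whereas the bounded case produces a single result once all the data is available.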

Apache Flink

Some basic features of Apache Flink are as follows:

  • It is an open-source stream processing framework for distributed and high-performance data streaming applications.
  • It supports both batch processing and stream processing.
  • It can process millions of records per second.
  • It provides low latency and high throughput.
