By now, you have a clear understanding of the DataSet API. In this segment, you will learn about the DataStream API.
Before starting with the DataStream API, let’s set up our system to run Flink programs in Java. Refer to the following video to install Maven on your system.
Note:
You can use the link below to download Apache Maven on your local machine.
https://ds-de-flink.s3.amazonaws.com/apache-maven-3.6.3-bin.zip
Now that you have installed Maven on your system, the following video will walk you through the steps to create a project.
The document attached below lists the steps explained in the videos above.
You can find the link to Apache Maven installation files at the start of this segment.
You can follow the document below to install Apache Maven on your Windows machine.
In the upcoming video, our expert will explain the basics of the DataStream API.
Let’s summarise your learnings from this video.
DataStream
- It is a finite or unbounded, immutable collection of data objects.
- It can contain duplicates.
- Data is read from the source into the DataStream. At every transformation step, a new DataStream is created.
- A DataStream can be declared as follows: DataStream<String> words = …
Anatomy of a Flink program
- Obtains an execution environment
- Initially loads data from the data source
- Specifies transformations on this data
- Specifies the data sink
- Triggers program execution
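To make these five steps concrete, here is a minimal sketch of a Flink job in Java. It is not the program from the video: the class name DataStreamAnatomy, the helper ToUpperCase, the sample words and the job name are all hypothetical, and the use of fromElements as the source is only one possible choice made for illustration. The sketch assumes a Flink 1.x release with the flink-streaming-java and flink-clients dependencies available in your Maven project.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DataStreamAnatomy {

    // Hypothetical transformation used for illustration: upper-cases each word.
    public static class ToUpperCase implements MapFunction<String, String> {
        @Override
        public String map(String value) {
            return value.toUpperCase();
        }
    }

    public static void main(String[] args) throws Exception {
        // 1. Obtain an execution environment.
        final StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // 2. Load the initial data from a source.
        //    Assumption: a small in-memory source; note that it contains a duplicate.
        DataStream<String> words = env.fromElements("flink", "stream", "flink");

        // 3. Specify a transformation. The original stream is not modified;
        //    map() returns a new DataStream.
        DataStream<String> upperCased = words.map(new ToUpperCase());

        // 4. Specify the data sink (here, standard output).
        upperCased.print();

        // 5. Trigger program execution. Nothing runs until this call.
        env.execute("DataStream anatomy sketch");
    }
}
```

Steps 1 to 4 only describe the job. The data starts flowing when execute() is called in step 5, which is why it is listed as a separate step in the anatomy above.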
Note: To run the program demonstrated in the video on your machine, you can either create a new project and add this Java file to it, or use the folder attached below, which contains all the project files.
Additional Reading
- DataStream API – This is the official documentation page explaining concepts of DataStream API.