By now, you are familiar with the two core APIs of Apache Flink: the DataSet API and the DataStream API. In this segment, you will be introduced to the Table API and SQL library, which can interact with both of these APIs.
Prerequisites: Apache Flink's Python API requires a Python 3.x installation, so you need to set it up on your machine. Follow the document attached below to do so before starting the video.
In the following video, the expert will give a brief overview of the Table API/SQL and explain the basic anatomy of a Flink job using a Python Flink program.
Let’s summarise your learnings from this video.
Table API and SQL
The Table API is a language-integrated query API that lets you compose queries from relational operators such as selection, filter and join in an intuitive way. Flink’s SQL support is based on Apache Calcite, which implements the SQL standard. Queries specified in either interface have the same semantics and produce the same result, regardless of whether the input is a batch or a stream.
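As a rough illustration of the claim that both interfaces express the same query, the sketch below filters and projects a small in-memory table once with the Table API and once with SQL. It assumes a recent PyFlink release (1.13 or later). The table name `orders`, its columns, the sample rows and the function name are hypothetical and are not taken from the video.

```python
# The filter-and-project query written as a SQL string.
# "orders", "id", "item" and "qty" are hypothetical names used for illustration.
SQL_QUERY = "SELECT id, item FROM orders WHERE qty > 10"


def run_equivalent_queries():
    """Run the same query through the Table API and through SQL."""
    # Imported lazily so that this module loads even where PyFlink is missing.
    from pyflink.table import EnvironmentSettings, TableEnvironment
    from pyflink.table.expressions import col

    settings = EnvironmentSettings.new_instance().in_batch_mode().build()
    t_env = TableEnvironment.create(settings)

    # Hypothetical sample data: (id, item, qty).
    orders = t_env.from_elements(
        [(1, "apple", 3), (2, "pear", 12), (3, "apple", 20)],
        ["id", "item", "qty"],
    )
    # Register the Table under a name so that SQL can refer to it.
    t_env.create_temporary_view("orders", orders)

    # Table API version: relational operators chained as method calls.
    via_table_api = orders.filter(col("qty") > 10).select(col("id"), col("item"))
    # SQL version of the same query.
    via_sql = t_env.sql_query(SQL_QUERY)

    # Both results are expected to contain the same rows.
    via_table_api.execute().print()
    via_sql.execute().print()
```

Calling `run_equivalent_queries()` on a machine with PyFlink installed should print the same rows twice, once per interface.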
Two planners
Blink planner
- It was created by Alibaba. Blink treats batch jobs as a special case of streaming.
- The conversion between Table and DataSet is not supported here.
- It has been the default planner since Flink 1.11.
Old planner
- It is the original planner developed by the Flink community. The old planner supports conversion between Table and both DataSet and DataStream.
- It supports the BatchTableEnvironment and the StreamTableEnvironment.
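A planner is chosen when the environment settings are built. The sketch below shows one way this might look, and it assumes Flink 1.11 to 1.13 specifically: later releases deprecated and then removed the planner switches, so `use_blink_planner()` and `use_old_planner()` are not available there. The helper names `build_settings` and `create_stream_table_env` are hypothetical.

```python
def build_settings(planner="blink"):
    """Build streaming EnvironmentSettings for one planner.

    Assumption: Flink 1.11 to 1.13, where both planners still ship.
    """
    if planner not in ("blink", "old"):
        raise ValueError("planner must be 'blink' or 'old'")

    # Imported lazily so that this module loads even where PyFlink is missing.
    from pyflink.table import EnvironmentSettings

    builder = EnvironmentSettings.new_instance()
    if planner == "blink":
        builder = builder.use_blink_planner()
    else:
        builder = builder.use_old_planner()
    return builder.in_streaming_mode().build()


def create_stream_table_env(planner="blink"):
    """Create a StreamTableEnvironment that uses the chosen planner."""
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    return StreamTableEnvironment.create(
        env, environment_settings=build_settings(planner)
    )
```

For example, `create_stream_table_env("old")` would select the old planner, while the default argument selects Blink.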
Anatomy of a Flink job
- Create a TableEnvironment for batch or streaming execution
- Register the input and output Tables
- Create a Table from a Table API or SQL query
- Emit the result Table to a TableSink
- Execute the job
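The five steps above can be sketched as a small PyFlink program. This is a minimal illustration and not the program shown in the video. It assumes a recent PyFlink release (1.13 or later) and uses Flink's built-in `datagen` and `print` connectors so that no external system is needed. The table names `numbers` and `doubled_numbers` and the function name `run_job` are hypothetical.

```python
# Input table: a bounded sequence 1..5 from the built-in datagen connector.
SOURCE_DDL = """
    CREATE TABLE numbers (
        n BIGINT
    ) WITH (
        'connector' = 'datagen',
        'fields.n.kind' = 'sequence',
        'fields.n.start' = '1',
        'fields.n.end' = '5'
    )
"""

# Output table: the built-in print connector writes each row to stdout.
SINK_DDL = """
    CREATE TABLE doubled_numbers (
        n BIGINT,
        doubled BIGINT
    ) WITH (
        'connector' = 'print'
    )
"""


def run_job():
    """Walk through the five steps of a Table API job."""
    # Imported lazily so that this module loads even where PyFlink is missing.
    from pyflink.table import EnvironmentSettings, TableEnvironment
    from pyflink.table.expressions import col

    # Step 1: create a TableEnvironment (streaming mode in this sketch).
    settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
    t_env = TableEnvironment.create(settings)

    # Step 2: register the input and output tables.
    t_env.execute_sql(SOURCE_DDL)
    t_env.execute_sql(SINK_DDL)

    # Step 3: create a Table from a Table API query.
    result = t_env.from_path("numbers").select(
        col("n"), (col("n") * 2).alias("doubled")
    )

    # Steps 4 and 5: emit the result to the sink table. execute_insert()
    # submits the job, and wait() blocks until the job finishes.
    result.execute_insert("doubled_numbers").wait()
```

Step 3 could equally be written with `t_env.sql_query("SELECT n, n * 2 AS doubled FROM numbers")`, since both interfaces produce a Table.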
Temporary vs Permanent Tables
The table below compares temporary and permanent tables.
| Temporary Tables | Permanent Tables |
| --- | --- |
| They are stored in memory and exist only for the duration of the Flink session in which they are created. | They are visible across multiple Flink sessions and clusters. |
| They are not bound to any catalog or database, but they can be created in the namespace of one. | They require a catalog (such as the Hive Metastore) to maintain the metadata about the table. |
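In SQL, the difference shows up as the `TEMPORARY` keyword in the DDL. The sketch below registers one table of each kind and lists them, assuming a recent PyFlink release (1.13 or later). The table names `clicks_tmp` and `clicks` and the function name are hypothetical. Note one caveat: with Flink's default in-memory catalog, the table created without `TEMPORARY` also disappears when the session ends. It only becomes permanent when the current catalog is a persistent one, such as a catalog backed by the Hive Metastore.

```python
# Temporary table: lives only in this session and is not stored in a catalog.
TEMPORARY_DDL = """
    CREATE TEMPORARY TABLE clicks_tmp (
        user_id BIGINT,
        url STRING
    ) WITH (
        'connector' = 'datagen'
    )
"""

# Catalog table: its metadata is stored in the current catalog. It outlives
# the session only if that catalog is persistent (assumption: for example,
# one backed by the Hive Metastore).
CATALOG_DDL = """
    CREATE TABLE clicks (
        user_id BIGINT,
        url STRING
    ) WITH (
        'connector' = 'datagen'
    )
"""


def register_tables():
    """Register one temporary table and one catalog table, then list them."""
    # Imported lazily so that this module loads even where PyFlink is missing.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
    t_env = TableEnvironment.create(settings)

    t_env.execute_sql(TEMPORARY_DDL)
    t_env.execute_sql(CATALOG_DDL)

    # list_temporary_tables() returns only temporary objects, whereas
    # list_tables() returns temporary and catalog tables together.
    print("temporary:", t_env.list_temporary_tables())
    print("all tables:", t_env.list_tables())
    return t_env
```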
Note: To run the demonstrated program on your machine, you can either create a new project and add this Python file to it, or use the folder attached below, which contains all the project files.