Introduction to Table API & SQL

By now, you are familiar with the two core APIs of Apache Flink: the DataSet API and the DataStream API. In this segment, you will be introduced to the Table API and SQL library, which can interact with both of them.

Prerequisites: Apache Flink's Python API (PyFlink) requires Python 3.x, so you need to set it up on your machine. Follow the attached document to do so before starting the video.

In the following video, the expert will give a brief overview of the Table API/SQL and explain the basic anatomy of a Flink job using a Python Flink program.

Let’s summarise your learnings from this video.

Table API and SQL

The Table API is a language-integrated query API that lets you compose queries from relational operators such as selection, filter and join in an intuitive way. Flink’s SQL support is based on Apache Calcite, which implements the SQL standard. A query specified in either interface has the same semantics and produces the same result regardless of whether the input is a batch or a stream.

Two planners

Blink planner

It was created by Alibaba. Blink treats batch jobs as a special case of streaming.
For this reason, the Blink planner does not support conversion between Table and DataSet.
It has been the default planner since Flink 1.11.

Old planner

It is Flink's original planner. The old planner supports conversion between Table and DataSet as well as between Table and DataStream.
It supports the BatchTableEnvironment and the StreamTableEnvironment.

Anatomy of a Flink job

Create a TableEnvironment for batch or streaming execution
Register the input and output Tables
Create a Table from a Table/SQL API query
Emit the result Table to a TableSink
Execute the job
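
The five steps above can be sketched in PyFlink roughly as follows. This is a minimal illustration rather than the program shown in the video: it assumes PyFlink 1.12 or later is installed (for example via `pip install apache-flink`), and the table names `number_source` and `print_sink`, their columns, and the `datagen` and `print` connectors are chosen only for this example.

```python
# Hypothetical example tables: a bounded sequence source and a print sink.
SOURCE_DDL = """
    CREATE TEMPORARY TABLE number_source (
        id BIGINT
    ) WITH (
        'connector' = 'datagen',
        'fields.id.kind' = 'sequence',
        'fields.id.start' = '1',
        'fields.id.end' = '5'
    )
"""

SINK_DDL = """
    CREATE TEMPORARY TABLE print_sink (
        id BIGINT,
        doubled BIGINT
    ) WITH (
        'connector' = 'print'
    )
"""

QUERY = "SELECT id, id * 2 AS doubled FROM number_source WHERE id > 2"


def create_table_environment():
    """Step 1: create a TableEnvironment (streaming mode is assumed here)."""
    # Imported inside the function so this file loads even without PyFlink.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
    return TableEnvironment.create(settings)


def run_job(t_env):
    """Steps 2 to 5, run against an existing TableEnvironment."""
    # Step 2: register the input and output tables.
    t_env.execute_sql(SOURCE_DDL)
    t_env.execute_sql(SINK_DDL)

    # Step 3: create a Table from a SQL query.
    result = t_env.sql_query(QUERY)

    # Steps 4 and 5: emit the result to the sink, which submits the job,
    # then block until the job finishes.
    table_result = result.execute_insert("print_sink")
    table_result.wait()
    return table_result
```

On a machine with PyFlink installed, calling `run_job(create_table_environment())` should submit the job and write the filtered rows to standard output through the print sink. Keeping the DDL and the query as plain strings makes it easy to swap in a different connector without touching the job logic.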

Temporary vs Permanent Tables

Temporary and permanent tables are compared in the table below.

Temporary Tables	Permanent Tables
They are stored in memory and only exist for the duration of the Flink session within which they are created.	They are visible across multiple Flink sessions and clusters.
They are not bound to any catalog or database but can be created in the namespace of one.	They require a catalog (such as the Hive Metastore) to maintain the metadata about the table.
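
In Flink SQL, the difference shows up as one keyword in the DDL. The sketch below is illustrative only: the table names `clicks_tmp` and `clicks` and their columns are hypothetical, and it assumes a PyFlink `TableEnvironment` has already been created. Note that with Flink's default in-memory catalog, even a table created with plain `CREATE TABLE` lasts only for the session; its metadata survives across sessions only when a persistent catalog such as the Hive Metastore is registered and in use.

```python
# Temporary table: kept in memory, dropped when the session ends.
TEMPORARY_DDL = """
    CREATE TEMPORARY TABLE clicks_tmp (
        user_name STRING,
        url STRING
    ) WITH (
        'connector' = 'datagen'
    )
"""

# Permanent table: metadata is stored in the current catalog and database.
PERMANENT_DDL = """
    CREATE TABLE clicks (
        user_name STRING,
        url STRING
    ) WITH (
        'connector' = 'datagen'
    )
"""


def register_tables(t_env):
    """Register one temporary and one catalog-backed table.

    `t_env` is assumed to be a PyFlink TableEnvironment.
    Returns the names of the temporary tables known to the session.
    """
    t_env.execute_sql(TEMPORARY_DDL)
    t_env.execute_sql(PERMANENT_DDL)
    return t_env.list_temporary_tables()
```

Under these assumptions, `register_tables(t_env)` should return a list containing `clicks_tmp` but not `clicks`, because only the former is a temporary table.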

Note: To run the demonstrated program on your machine, you can either create a new project and add this Python file to it, or use the attached folder, which contains all the project files.
