In the previous segment, you learnt the basics of the DataSet API. In this segment, you will learn about the various types of transformations available in the DataSet API.
In the following video, our expert will demonstrate the transformations using an Apache Flink program in Java.
Let’s summarise what you have learnt in the video above.
Transformations
Map
It applies a user-defined map function to each element of a DataSet, transforming the element from one form into another. The Map function implements a one-to-one mapping: it produces exactly one output element for every input element.
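The one-to-one behaviour can be illustrated with a minimal plain-Java sketch. It uses `java.util.stream` rather than Flink itself so that it runs without a Flink dependency; the class and method names (`MapSketch`, `wordLengths`) are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java analogy of a Map transformation (not the Flink API).
class MapSketch {
    // One-to-one mapping: every word is replaced by exactly one value, its length.
    static List<Integer> wordLengths(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [5, 3, 7]
        System.out.println(wordLengths(Arrays.asList("flink", "map", "dataset")));
    }
}
```

The output always has the same number of elements as the input, which is the defining property of Map.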
FlatMap
It takes one element as input and produces zero, one or more elements as output. It is a variant of the Map function in which the number of output elements per input element is arbitrary.
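As a rough illustration of the zero-or-more behaviour, the following plain-Java sketch splits lines into words. It uses `java.util.stream` instead of Flink, and the names `FlatMapSketch` and `splitIntoWords` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java analogy of a FlatMap transformation (not the Flink API).
class FlatMapSketch {
    // Each line yields zero elements (blank line) or one or more words.
    static List<String> splitIntoWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> line.trim().isEmpty()
                        ? Stream.<String>empty()
                        : Arrays.stream(line.trim().split("\\s+")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [to, be, or, not]
        System.out.println(splitIntoWords(Arrays.asList("to be", "   ", "or not")));
    }
}
```

Here three input lines produce four output words, and the blank line contributes nothing.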
Reduce
It combines a group of elements into a single element by repeatedly combining two elements into one. It can be applied either to a whole DataSet or to a grouped DataSet.
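The pairwise combination can be sketched in plain Java, both over a whole collection and per group. This is an analogy built on the standard library, not Flink code; `ReduceSketch`, `pair`, `sumAll` and `sumPerKey` are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Plain-Java analogy of a Reduce transformation (not the Flink API).
class ReduceSketch {
    // Helper that builds a (key, value) pair.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Reduce over the whole collection: combine two elements into one until one is left.
    static Optional<Integer> sumAll(List<Integer> values) {
        return values.stream().reduce((a, b) -> a + b);
    }

    // Reduce per group: the same pairwise combination, applied separately for each key.
    static Map<String, Integer> sumPerKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            result.merge(p.getKey(), p.getValue(), (a, b) -> a + b);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints 10
        System.out.println(sumAll(Arrays.asList(1, 2, 3, 4)).get());
        // prints {a=4, b=2}
        System.out.println(sumPerKey(Arrays.asList(pair("a", 1), pair("b", 2), pair("a", 3))));
    }
}
```

The combining function takes two elements of one type and returns a single element of that same type, which is why it can be applied repeatedly.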
Aggregate
It can be thought of as a built-in reduce function. It aggregates a group of elements into a single element. The built-in aggregate transformations available are as follows:
- Sum
- Min
- Max
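To show what the three built-in aggregations compute, here is a small plain-Java sketch using `IntSummaryStatistics` from the standard library. It is an analogy rather than Flink code, and the names `AggregateSketch` and `summarise` are hypothetical.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;

// Plain-Java analogy of the Sum, Min and Max aggregations (not the Flink API).
class AggregateSketch {
    // Collapses a group of integers into summary values: sum, min and max.
    static IntSummaryStatistics summarise(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = summarise(Arrays.asList(4, 9, 2));
        System.out.println(stats.getSum()); // prints 15
        System.out.println(stats.getMin()); // prints 2
        System.out.println(stats.getMax()); // prints 9
    }
}
```

Each of these results could also be obtained with a hand-written reduce function, which is why Aggregate can be viewed as a built-in reduce.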
Filter
It applies a user-defined filter function to each element of a DataSet and retains only those elements for which the function returns true.
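A minimal plain-Java sketch of this behaviour follows. It uses `java.util.stream` instead of Flink, and the names `FilterSketch` and `keepEven` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java analogy of a Filter transformation (not the Flink API).
class FilterSketch {
    // Keeps only the elements for which the predicate (n is even) returns true.
    static List<Integer> keepEven(List<Integer> values) {
        return values.stream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [2, 4, 6]
        System.out.println(keepEven(Arrays.asList(1, 2, 3, 4, 5, 6)));
    }
}
```

The predicate only decides whether an element is kept; it does not change the element itself.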
Union
It produces the union of two DataSets, which must be of the same type. To combine more than two DataSets, chain multiple union calls.
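The idea can be sketched in plain Java with `Stream.concat`. This is an analogy, not Flink code: the names `UnionSketch` and `union` are hypothetical, and the sketch assumes duplicates are kept (as SQL's UNION ALL does), so check the official documentation for Flink's exact behaviour.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java analogy of a Union transformation (not the Flink API).
class UnionSketch {
    // Both inputs share the element type T; duplicates are kept (an assumption of this sketch).
    static <T> List<T> union(List<T> first, List<T> second) {
        return Stream.concat(first.stream(), second.stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> ab = Arrays.asList("a", "b");
        List<String> bc = Arrays.asList("b", "c");
        List<String> d = Arrays.asList("d");
        // prints [a, b, b, c]
        System.out.println(union(ab, bc));
        // Chained calls combine more than two inputs; prints [a, b, b, c, d]
        System.out.println(union(union(ab, bc), d));
    }
}
```

The shared type parameter `T` mirrors the requirement that both inputs be of the same type.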
Join
It joins two DataSets into a single DataSet by matching their elements on one or more keys, similar to an inner join in SQL. The various types of joins available are as follows:
- Default join
- Join with join function
- Join with flat-join function
- Join with projection
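As a rough illustration of a join with a join function, the plain-Java sketch below matches elements of two lists on an integer key and combines each matching pair into one output element. It is a nested-loop analogy, not Flink code, and the names `JoinSketch`, `pair` and `joinOnId`, as well as the sample data, are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java analogy of a Join with a join function (not the Flink API).
class JoinSketch {
    // Helper that builds an (id, value) pair.
    static Map.Entry<Integer, String> pair(int id, String value) {
        return new AbstractMap.SimpleEntry<>(id, value);
    }

    // Inner equi-join on the id; the "join function" turns each matching pair into a string.
    static List<String> joinOnId(List<Map.Entry<Integer, String>> users,
                                 List<Map.Entry<Integer, String>> orders) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<Integer, String> user : users) {
            for (Map.Entry<Integer, String> order : orders) {
                if (user.getKey().equals(order.getKey())) {
                    joined.add(user.getValue() + " bought " + order.getValue());
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> users =
                Arrays.asList(pair(1, "Ana"), pair(2, "Ben"), pair(3, "Cy"));
        List<Map.Entry<Integer, String>> orders =
                Arrays.asList(pair(1, "book"), pair(1, "pen"), pair(3, "lamp"), pair(4, "desk"));
        // prints [Ana bought book, Ana bought pen, Cy bought lamp]
        System.out.println(joinOnId(users, orders));
    }
}
```

Elements without a matching key on the other side (Ben and the desk order) do not appear in the result, which is the inner-join behaviour.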
Additional Reading
- Transformations – This is the official documentation, which explains all the DataSet API transformation functions. Refer to it before attempting the in-segment questions.