Using Transformations And Aggregations

In the previous segment, you saw a coding demo on triggers and output modes. In this segment, you will learn about transformations and aggregations.

So, to begin with, in the video, you saw a few key APIs, including the following:

  • pyspark.sql – Access the SQL functionality of Spark
  • select – Select columns by name
  • selectExpr – Write any SQL-like expression as a string

Now, let’s take a look at the transformations:

  • filter/where – Filter out rows that do not meet a given condition
  • alias/as – Make the output more readable by giving a column a different name, i.e., aliasing
  • groupBy – Shuffle the data and group rows by key
  • Aggregations – avg, sum, min, max

Now, let’s watch these transformations in action through a coding lab.

So, in the coding lab above, we started by filtering out words shorter than four characters using the filter transformation and the length function. You can try out the other transformations as well to get a better grasp of the concept.

Note: Operations on a streaming DataFrame are very similar to those you have seen on regular DataFrames in the earlier Spark modules. Hence, we highly recommend that you go through the following documentation and try out these various operations during this module.

Now, let’s move on to the next segment where you will learn about joins in streams.