
Using Transformations And Aggregations

In the previous segment, you saw a coding demo on triggers and output modes. In this segment, you will learn about transformations and aggregations.

So, to begin with, the video covered a few key APIs, including the following:

  • pyspark.sql – Access the SQL functionality of Spark
  • select – Select columns by name
  • selectExpr – Any SQL-like statement can be written as a string
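
Here is a minimal sketch of select() and selectExpr() on a streaming DataFrame. The socket source, the host/port values and the upper() expression are assumptions for illustration, not taken from the course video:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("SelectDemo").getOrCreate()

# Read lines of text from a socket source (host/port are placeholders)
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# select(): pick columns by name or as Column expressions
selected = lines.select(col("value"))

# selectExpr(): write any SQL-like expression as a string
upper_lines = lines.selectExpr("upper(value) AS value_upper")
```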

Now, let’s take a look at the transformations:

  • Filter/Where – To filter out rows that do not meet the condition defined in the given expression
  • As/Aliasing – To make the output more readable by giving a column a different name, a.k.a. aliasing
  • GroupBy – To shuffle and group the data by key
  • Aggregations – Avg, Sum, Min, Max
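
Before moving to the lab, here is a minimal sketch of these transformations chained together on a hypothetical streaming DataFrame; the file path, schema and column names are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, max as max_, min as min_, sum as sum_

spark = SparkSession.builder.appName("AggDemo").getOrCreate()

# Hypothetical streaming source: JSON files with `category` and `amount`
# columns (path and schema are assumptions, not from the course lab)
events = (spark.readStream
          .schema("category STRING, amount DOUBLE")
          .json("/tmp/events"))

# filter()/where(): keep only the rows that meet the condition
valid = events.where(col("amount") > 0)

# alias(): rename a column to make the output more readable
renamed = valid.select(col("amount").alias("sale_amount"))

# groupBy() + aggregations: avg, sum, min and max per group
stats = valid.groupBy("category").agg(
    avg("amount").alias("avg_amount"),
    sum_("amount").alias("total_amount"),
    min_("amount").alias("min_amount"),
    max_("amount").alias("max_amount"),
)
```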

Now, let’s watch these transformations in action through a coding lab.

So, in the coding lab above, we started by filtering out words shorter than four characters using the Filter transformation and the length function. You can try out the other transformations as well to get a better grasp of the concepts.
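
For reference, that word-length filter might look like the sketch below. It reuses the `lines` streaming DataFrame from the socket example above; the exact code in the lab may differ:

```python
from pyspark.sql.functions import col, explode, length, split

# Split each input line into individual words
words = lines.select(explode(split(col("value"), " ")).alias("word"))

# Keep only words of four or more characters, i.e. filter out words
# shorter than four characters
long_words = words.filter(length(col("word")) >= 4)
```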

Note:

The operations on a streaming DataFrame are very similar to those on a normal Spark DataFrame, which you saw during the previous two Spark modules. Hence, we highly recommend that you go through the following documentation and try out these operations during this module.

  • Operations on streaming DataFrames/Datasets – Set of basic DataFrame operations.
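
To see this similarity for yourself, the sketch below applies the same groupBy/count transformation to both a batch DataFrame and a streaming DataFrame; the path and column name are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("BatchVsStream").getOrCreate()

# Batch: read a static directory of JSON files (path is a placeholder)
static_df = spark.read.json("/tmp/events")

# Streaming: the same directory read as a stream; file sources require an
# explicit schema, so we borrow it from the batch DataFrame
stream_df = spark.readStream.schema(static_df.schema).json("/tmp/events")

# The identical transformation works on both DataFrames
static_counts = static_df.groupBy("category").count()
stream_counts = stream_df.groupBy("category").count()
```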

Now, let’s move on to the next segment where you will learn about joins in streams.
