Using Transformations And Aggregations

In the previous segment, you saw a coding demo on triggers and output modes. In this segment, you will learn about transformations and aggregations.

So, to begin with, in the video, you saw a few key APIs, including the following:

  • pyspark.sql – Access the SQL functionality of Spark
  • select – Select columns by name
  • selectExpr – Write any SQL-like expression as a string

Now, let’s take a look at the transformations:

  • filter/where – Filter out rows that do not meet a given condition
  • alias/as – Make the output more readable by giving a column a different name, i.e., aliasing
  • groupBy – Shuffle the data and group rows by key
  • Aggregations – avg, sum, min, max

Now, let’s watch these transformations in action through a coding lab.

So, in the coding lab above, we started by filtering out words shorter than four characters using the filter transformation and the length function. You can try out the other transformations as well to get a better grasp of the concept.

Note: Operations on a streaming DataFrame are very similar to those you have seen on regular DataFrames in the earlier Spark modules. Hence, we highly recommend that you go through the following documentation and try out these various operations during this module.

Now, let’s move on to the next segment where you will learn about joins in streams.