In the previous segment, you learnt about window operations in Spark Streaming. Now, let’s apply that knowledge and implement window operations in a Spark Structured Streaming program.
Let’s now look at the window functions in more detail.
Since we are reading from a file here, we removed the portion of the code that read from the socket. Our data file has three fields, which we defined in the schema.
We set the window duration to 1 day and the sliding duration to 1 hour.
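With a 1-day window sliding every hour, each event falls into 24 overlapping windows. The sketch below (plain Python, no Spark required; the function name and example timestamp are illustrative) mimics how Spark assigns a timestamp to sliding windows aligned to slide boundaries:

```python
from datetime import datetime, timedelta

def windows_for(ts: datetime, window: timedelta, slide: timedelta):
    """Return the [start, end) windows containing timestamp ts,
    mirroring Spark's slide-aligned window assignment."""
    epoch = datetime(1970, 1, 1)
    # Align the latest possible window start to a slide boundary.
    offset = (ts - epoch) % slide
    start = ts - offset
    result = []
    while start > ts - window:
        result.append((start, start + window))
        start -= slide
    return sorted(result)

event = datetime(2023, 5, 1, 10, 30)
wins = windows_for(event, timedelta(days=1), timedelta(hours=1))
print(len(wins))  # 24 overlapping windows contain this event
```

Every returned window satisfies start <= event < end, which is why one record contributes to 24 aggregates here.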
As we were using aggregations, we changed our output mode to Complete.
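To see why aggregations pair with Complete mode, here is a toy simulation (not Spark code; class and variable names are invented for illustration): state accumulates across micro-batches, and every trigger re-emits the entire result table rather than only new rows.

```python
from collections import Counter

class CompleteModeAggregator:
    """Toy model of Complete output mode: the running aggregate
    over ALL data seen so far is re-emitted on every trigger."""
    def __init__(self):
        self.counts = Counter()  # state kept across micro-batches

    def trigger(self, batch):
        self.counts.update(batch)  # fold the new micro-batch into state
        return dict(self.counts)   # Complete mode: emit the full table

agg = CompleteModeAggregator()
print(agg.trigger(["CPU", "CPU", "IO"]))   # {'CPU': 2, 'IO': 1}
print(agg.trigger(["Memory", "CPU"]))      # {'CPU': 3, 'IO': 1, 'Memory': 1}
```

Note how the second trigger still reports the counts from the first batch; that is the defining behaviour of Complete mode.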
Below you will find the code used in the video above. Make sure you place the players.csv file in the required directory before you run the code.
Note:
You can create the data file used in this segment yourself: follow the format below and input your own data into a JSON file. You just need to change the values of each entry, especially activity_time. Also, activity_type can be any of CPU, Memory, IO or Network.
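As one way to build such a file, the sketch below generates JSON Lines data (the format Spark's JSON file source reads, one object per line). The field names activity_time and activity_type come from the note above; the output filename and the third field, node_id, are placeholders, since their actual names are not given here.

```python
import json
from datetime import datetime, timedelta

ACTIVITY_TYPES = ["CPU", "Memory", "IO", "Network"]

# node_id is a hypothetical stand-in for the third field in the schema.
records = [
    {
        "activity_time": (datetime(2023, 5, 1)
                          + timedelta(minutes=15 * i)).isoformat(sep=" "),
        "activity_type": ACTIVITY_TYPES[i % len(ACTIVITY_TYPES)],
        "node_id": f"node-{i % 3}",
    }
    for i in range(8)
]

# One JSON object per line, as expected by Spark's JSON file source.
with open("activity_data.json", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```

Drop a file like this into the input directory your streaming query watches, adjusting the field names to match your schema.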
Let’s move to the next segment, where you will learn about late-arriving data.