
First Spark Structured Streaming Application

In the previous segment, you learnt about the general code flow of a streaming application. In this segment, we will create a simple Spark Structured Streaming application.

So, let’s watch the upcoming video as we build our own Spark Structured Streaming application.

Before running the code below, make sure the Netcat utility is installed on your EMR instance. Please run the following two commands as the hadoop user on your EMR instance (yum needs root privileges, hence the sudo prefix).

Example

Shell
# yum needs root privileges, so prefix the commands with sudo
sudo yum update -y
sudo yum install -y nc
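
You can quickly verify that the installation succeeded before moving on:

Example

Shell
# Confirm that the nc binary is on the PATH
which nc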


So, in the video,

  • We first built a SparkSession using the getOrCreate() method.

Example

Python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StructuredSocketRead").getOrCreate()


  • We told Spark that we would be reading from a socket, specifying the host and port to connect to.

Example

Python
lines = spark.readStream \
    .format("socket") \
    .option("host", "<Public IP of MasterNode>") \
    .option("port", 12345) \
    .load()

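For reference, the socket source produces an unbounded DataFrame with a single string column named value, holding one row per line received. You can confirm this even before starting the query:

Example

Python
# The socket source yields a single string column named "value"
lines.printSchema()
# root
#  |-- value: string (nullable = true)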

  • Next, we used writeStream and specified the output mode.
  • We also called the start() action.
  • Remember, we need to tell Spark where we want to write our stream; in this case, it is the console.

Example

Python
query = lines.writeStream \
    .outputMode("append") \
    .format("console") \
    .start()

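Note that start() returns a StreamingQuery handle, which is why we assigned it to the variable query. Besides waiting on it, the handle can also stop the query from code:

Example

Python
# The handle returned by start() can also stop the query programmatically
query.stop()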

  • Finally, we told Spark to run continuously until the query is terminated externally.

Example

Python
query.awaitTermination()

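For reference, the snippets above combine into a single script along these lines (a minimal sketch; replace the placeholder IP as before):

Example

Python
from pyspark.sql import SparkSession

# Build (or reuse) the SparkSession
spark = SparkSession.builder.appName("StructuredSocketRead").getOrCreate()

# Read a stream of lines from the socket source
lines = spark.readStream \
    .format("socket") \
    .option("host", "<Public IP of MasterNode>") \
    .option("port", 12345) \
    .load()

# Print each new row to the console as it arrives
query = lines.writeStream \
    .outputMode("append") \
    .format("console") \
    .start()

# Block until the query is terminated externally
query.awaitTermination()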

In order to run the program:

  • We opened another session and, using the netcat command, opened a port for Spark to communicate with (a sketch of the commands follows this list). As we sent in data from this session, Spark read the input and wrote the output to the console.
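
A minimal sketch of the two sessions (the script name first_streaming_app.py is illustrative; the actual code files are attached below):

Example

Shell
# Session 1: start a Netcat listener on the port the application reads from
nc -lk 12345

# Session 2: submit the streaming application (illustrative file name)
spark-submit first_streaming_app.py

Each line typed into the Netcat session then appears on Spark's console as a new micro-batch, printed as a one-column table named value.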

So, there you have it, our first Spark Structured Streaming application. The code files used in the segment are attached below.

Now, let’s move on to the next segment and learn about triggers and output modes.
