In the previous segment, you saw how well Kafka integrates with Spark Streaming. In this segment, you will learn how to read live tweets using Spark Structured Streaming.
Since you will be using Python, you will make use of POST APIs that work over HTTPS. If you were to use Java or Scala, you would use TwitterUtils instead. You will also be using the Filter API, as shown in the video provided above, which supports several kinds of filters, such as location-based and keyword-based filtering. This allows developers to create streams of real-time tweets filtered accordingly.
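To make this concrete, here is a minimal sketch of the two filter types against the v1.1 Filter API endpoint. The parameter values are assumed examples, and the full request flow is sketched after the steps below:

```python
# Endpoint of the Twitter v1.1 Filter API (streaming over HTTPS).
FILTER_URL = "https://stream.twitter.com/1.1/statuses/filter.json"

# Keyword-based filtering: match tweets containing the word 'corona'.
keyword_filter = {"track": "corona"}

# Location-based filtering: a bounding box given as
# "west_lon,south_lat,east_lon,north_lat"; this example roughly covers
# New York City and is an assumed value for illustration.
location_filter = {"locations": "-74.26,40.48,-73.70,40.92"}
```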
First, you will have to set up a Twitter developer account (steps shared in the ‘Module Introduction’ segment) and procure access keys to connect to Twitter endpoints. After that, you will need to follow the steps given below; a code sketch illustrating the first three steps follows the list.
- Create a request using Twitter Developer Credentials
- Connect to a Twitter endpoint and send a request
- Send the Twitter stream to a localhost socket
- Read the socket stream using Spark Structured Streaming
- Perform Sentiment Analysis, Geospatial Event Analysis, Marketing Campaign Analytics, or any other analysis that you want on Twitter data.
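Here is a minimal sketch of the first three steps in Python, assuming the v1.1 Filter API endpoint, placeholder credentials, and an arbitrary local port 9009 (any free port works, as long as the Spark reader uses the same one):

```python
import json
import socket

import requests
from requests_oauthlib import OAuth1

# Step 1: create a request using your Twitter developer credentials
# (the four values below are placeholders).
auth = OAuth1("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")

# Step 2: connect to a Twitter endpoint and send the request,
# tracking the keyword 'corona'.
url = "https://stream.twitter.com/1.1/statuses/filter.json"
response = requests.post(url, auth=auth, params={"track": "corona"}, stream=True)

# Step 3: send the Twitter stream to a localhost socket, one JSON
# record per line, so that Spark can read it as a text stream.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("localhost", 9009))
server.listen(1)
conn, _ = server.accept()  # blocks until the Spark job connects

for line in response.iter_lines():
    if line:  # the stream sends blank keep-alive lines; skip them
        tweet = json.loads(line)
        record = json.dumps({"text": tweet.get("text", "")})
        conn.send((record + "\n").encode("utf-8"))
```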
Now, let’s move on to the coding lab where you will track live tweets.
Make sure you have requests_oauthlib installed; you can install it with the following command:
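```
pip install requests_oauthlib
```

(pip treats `requests_oauthlib` and `requests-oauthlib` as the same package name.)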
In the coding lab given above, you tracked live tweets containing the word ‘corona’. You also sent the response to your localhost over TCP in JSON format. Then, you read the tweets from the socket stream to take a look at all the tweets containing the word ‘corona’, along with the links to those tweets.
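For reference, the Spark side of that pipeline could look like the sketch below, assuming the producer writes one JSON record per line to localhost:9009 (the port, schema, and application name are illustrative assumptions):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("TwitterStreamReader").getOrCreate()

# Each line arriving on the socket is a JSON string such as {"text": "..."}.
schema = StructType([StructField("text", StringType(), True)])

# Read the raw lines from the localhost socket as a streaming DataFrame.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9009)
         .load())

# Parse the JSON payload and keep only tweets containing the word 'corona'.
tweets = (lines
          .select(from_json(col("value"), schema).alias("tweet"))
          .select("tweet.text")
          .filter(col("text").contains("corona")))

# Print matching tweets to the console as they arrive.
query = (tweets.writeStream
         .outputMode("append")
         .format("console")
         .start())
query.awaitTermination()
```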
The code files used in this segment are given below.
In the next segment, let’s summarise all that you learnt in this session.