Before you get started with Spark Streaming itself, you first need to understand what the term 'data stream' means. Spark Streaming is an API of Apache Spark built specifically to process real-time streaming data, so a good starting point is to understand the various industry scenarios where real-time data streams need to be processed. Let’s watch the upcoming video to understand this better.
In the video, you learnt what streaming data is and about a few of its properties.
A stream of data is a continuous inflow of data, for example, a live video feed or Amazon’s log data. A stream has no discrete start or end; it keeps flowing indefinitely. Because the inflow is continuous, the volume of incoming data is usually large. Data streams are typically processed in near real-time.
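To make the idea of a continuous, unbounded inflow concrete, here is a minimal sketch in plain Python (not Spark). It simulates an endless source with a generator and slices it into fixed-size micro-batches, which is conceptually how Spark Streaming processes a stream as a series of small batches. The record shape and batch size are illustrative assumptions, not part of any real API.

```python
import itertools

def sensor_stream():
    """Hypothetical unbounded source: yields log-like records forever."""
    for i in itertools.count():
        yield {"event_id": i, "bytes": (i * 37) % 500}

def micro_batches(stream, batch_size):
    """Slice a continuous stream into fixed-size micro-batches,
    analogous to how Spark Streaming divides a stream into small batches."""
    while True:
        yield list(itertools.islice(stream, batch_size))

stream = sensor_stream()
batches = micro_batches(stream, batch_size=5)

first = next(batches)  # process one micro-batch at a time
total_bytes = sum(r["bytes"] for r in first)
print(len(first), total_bytes)  # → 5 370
```

Note that the source never terminates; the consumer simply keeps pulling the next micro-batch for as long as it runs, which mirrors the "no discrete start or end" property described above.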
Next in the video, you learnt that streams are used for the following important reasons:
- They help capture real-time reactions, which can be used for various analytical purposes.
- Since processing happens in near real-time, streams can be used for fraud detection or for initiating prompt responses as the situation demands.
- They can be used to scale hardware up or down according to the incoming load.
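As a toy illustration of the fraud-detection use case above, the sketch below applies a simple threshold rule to a batch of incoming transactions. The record fields and the `limit` value are hypothetical; a real system would use far richer rules or models, but the point is that flagging happens as data arrives, enabling a prompt response.

```python
def flag_suspicious(transactions, limit=1000.0):
    """Toy near-real-time rule: flag any transaction above `limit`.
    (`limit` and the record shape are illustrative assumptions.)"""
    return [t for t in transactions if t["amount"] > limit]

incoming = [
    {"id": 1, "amount": 40.0},
    {"id": 2, "amount": 2500.0},   # would trigger a prompt response
    {"id": 3, "amount": 120.0},
]
alerts = flag_suspicious(incoming)
print([t["id"] for t in alerts])   # → [2]
```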
Next in the video, you learnt about the streaming data architecture. It is a framework that is built for ingesting and processing streams of data. It comprises multiple components, such as:
- Stream Consumer
- Data Persistence
- Processing/Transformations
- Analytics/BI
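The four components above can be sketched as a tiny pipeline in plain Python. This is only a conceptual illustration under assumed record shapes; in practice each stage would be a dedicated system (for example, a message broker for consumption, a data lake for persistence, Spark for transformations, and a BI tool for analytics).

```python
import json

# Raw records as they might arrive from a stream (illustrative data)
raw_events = ['{"user": "a", "clicks": 3}', '{"user": "b", "clicks": 7}']

def consume(source):
    """1. Stream Consumer: ingest and parse raw records."""
    for line in source:
        yield json.loads(line)

storage = []
def persist(records):
    """2. Data Persistence: durably store every raw record."""
    for r in records:
        storage.append(r)
        yield r

def transform(records):
    """3. Processing/Transformations: keep only high-activity users
    (the clicks >= 5 rule is an illustrative assumption)."""
    for r in records:
        if r["clicks"] >= 5:
            yield r

def analyze(records):
    """4. Analytics/BI: aggregate for reporting."""
    return sum(r["clicks"] for r in records)

total = analyze(transform(persist(consume(raw_events))))
print(total, len(storage))  # → 7 2
```

Chaining the stages as generators mirrors how data flows through a streaming architecture: every record is persisted, but only transformed records reach the analytics stage.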
Additional Readings:
Streaming Data – You can refer to this article to learn more about streaming data and its benefits, and to go through a few examples.