In the previous segment, you looked at the Uber use case and understood the importance of data pipeline automation.
In the next video, Ajay will discuss some possible solutions to data pipeline automation.
In the earlier video, you looked at three possible solutions to data pipeline automation: manual orchestration, cron jobs and Apache Oozie.
To summarise, we discussed the following:
- Manual orchestration is very inefficient and impractical in an industrial setting.
- Cron jobs are easy to learn and use, but they cannot handle complex data pipelines and lack some essential features that we expect from a data orchestration tool.
- Apache Oozie can handle large-scale data pipelines, but it has a steep learning curve and is becoming increasingly outdated.
As none of these solutions is good enough for our needs, we discussed our expectations from a new data pipeline automation tool.
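To make the gap concrete, here is a minimal Python sketch of two things that a purely time-based scheduler such as cron does not give you by itself: running tasks in dependency order and retrying a failed task. The function `run_pipeline`, the task names and the `max_retries` parameter are hypothetical names chosen for illustration; this is not how Airflow or Oozie is implemented, only a rough picture of the behaviour we expect from an orchestration tool.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order, retrying each failed task.

    tasks:        dict mapping a task name to a zero-argument callable
    dependencies: dict mapping a task name to the set of tasks it depends on
    Returns the task names in the order in which they completed.
    (Hypothetical helper for illustration only.)
    """
    # Every task gets an entry, even if it has no upstream dependencies.
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    completed = []
    # static_order() yields each task only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == max_retries:
                    raise RuntimeError(
                        f"task {name!r} failed after {attempt + 1} attempts"
                    ) from exc
    return completed


# Example: a three-step pipeline in which 'transform' fails once.
calls = {"transform": 0}


def extract():
    pass


def transform():
    calls["transform"] += 1
    if calls["transform"] == 1:
        raise ValueError("simulated temporary failure")


def load():
    pass


order = run_pipeline(
    tasks={"load": load, "transform": transform, "extract": extract},
    dependencies={"transform": {"extract"}, "load": {"transform"}},
)
print(order)
```

With cron, the same three steps would typically be scheduled at fixed times with a guessed gap between them, so `load` could start even if `transform` had failed or was still running. In the sketch, the order comes from the declared dependencies, and a temporary failure is retried before the next task starts.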
In the next segment, we will formally introduce Apache Airflow and understand how it helps us solve these problems.
Additional Reading
Airflow vs Oozie: Comparison between Apache Airflow and Apache Oozie.
Refer to this link if you want to read about solutions other than the ones mentioned earlier.