In this segment, you will finally run the DAG that we created.
We recommend that you follow along with the demonstrations in your own EC2 instance.
Note:
You will need to configure the connections for the Hive, Sqoop and Spark tasks. The details of these configurations are covered in the following video.
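As a preview, the sketch below shows one way to register such connections programmatically through Airflow's metadata session. The conn_id, host and port values are placeholders that assume Hive, Sqoop and Spark run locally on your EC2 instance; the same connections can also be created through the Airflow UI under Admin > Connections.

```python
# A minimal sketch of registering Airflow connections programmatically.
# The conn_id, host and port values below are placeholders; adjust them
# to match your own setup, or create the same entries via the Airflow UI.
from airflow import settings
from airflow.models import Connection

session = settings.Session()

for conn in [
    Connection(conn_id="hive_cli_default", conn_type="hive_cli",
               host="localhost", port=10000),
    Connection(conn_id="sqoop_default", conn_type="sqoop",
               host="localhost"),
    Connection(conn_id="spark_default", conn_type="spark",
               host="local[*]"),
]:
    # Skip connections that already exist so the script stays idempotent.
    if not session.query(Connection).filter(
            Connection.conn_id == conn.conn_id).first():
        session.add(conn)

session.commit()
```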
In the upcoming video, Ajay will collate everything we have created so far and execute our DAG.
The following steps are demonstrated in the video:
- Load the data into MySQL using the mysql.dump file
- Activate the Python virtual environment
- Place your etl_dag.py (the actual DAG program) in the airflow/dags folder (a minimal sketch appears after this list)
- Configure the connections for Hive, Sqoop, Spark, etc.
- Start the DAG using the toggle switch in the Airflow UI
- Validate the KPIs generated by running some Hive queries
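For orientation, here is a stripped-down sketch of what an etl_dag.py wiring these tasks together might look like, assuming Airflow 1.10-style imports (in Airflow 2.x these operators live in separate provider packages). All table names, paths and queries are placeholders, not the actual code from the demonstration; that code is attached below.

```python
# A stripped-down sketch of an etl_dag.py chaining Sqoop, Spark and Hive
# tasks. All table names, paths and queries are placeholders; the actual
# file used in the demonstration is attached below.
from datetime import datetime

from airflow import DAG
from airflow.operators.hive_operator import HiveOperator
from airflow.contrib.operators.sqoop_operator import SqoopOperator
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

with DAG(
    dag_id="etl_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Ingest a MySQL table into HDFS using the Sqoop connection.
    ingest = SqoopOperator(
        task_id="sqoop_ingest",
        conn_id="sqoop_default",
        cmd_type="import",
        table="orders",                      # placeholder table name
        target_dir="/user/hadoop/orders",    # placeholder HDFS path
    )

    # Transform the raw data with a Spark job.
    transform = SparkSubmitOperator(
        task_id="spark_transform",
        conn_id="spark_default",
        application="/home/ec2-user/jobs/transform.py",  # placeholder path
    )

    # Load the transformed output into a Hive table for the KPI queries.
    load = HiveOperator(
        task_id="hive_load",
        hive_cli_conn_id="hive_cli_default",
        hql="LOAD DATA INPATH '/user/hadoop/orders_out' "
            "INTO TABLE kpi_orders;",        # placeholder HQL
    )

    ingest >> transform >> load
```

Once this file sits in the airflow/dags folder, the DAG appears in the Airflow UI, where the toggle switch unpauses it for scheduling.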
You can find the code and other resources used in the demonstration attached below:
The document attached below details the steps followed in the demonstration.
This marks the end of our final demonstration.
Next, we will look at some best practices for Apache Airflow.