In this segment, you will finally run the DAG that we created.
We recommend that you follow along with the demonstrations in your own EC2 instance.
Note:
You will need to configure the connections for the Hive, Sqoop and Spark tasks. The details of these configurations are covered in the following video.
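As a preview, the sketch below shows one way to register such connections programmatically through Airflow's metadata session. The conn_id, host and port values are placeholders that assume Hive, Sqoop and Spark run locally on your EC2 instance; the same connections can also be created through the Airflow UI under Admin > Connections.

```python
# A minimal sketch of registering Airflow connections programmatically.
# The conn_id, host and port values below are placeholders; adjust them
# to match your own setup, or create the same entries via the Airflow UI.
from airflow import settings
from airflow.models import Connection

session = settings.Session()

for conn in [
    Connection(conn_id="hive_cli_default", conn_type="hive_cli",
               host="localhost", port=10000),
    Connection(conn_id="sqoop_default", conn_type="sqoop",
               host="localhost"),
    Connection(conn_id="spark_default", conn_type="spark",
               host="local[*]"),
]:
    # Skip connections that already exist so the script stays idempotent.
    if not session.query(Connection).filter(
            Connection.conn_id == conn.conn_id).first():
        session.add(conn)

session.commit()
```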
In the upcoming video, Ajay will collate everything we have created so far and execute our DAG.
The following steps are demonstrated in the video:
- Load the data into MySQL using the mysql.dump file
- Activate the Python virtual environment
- Place your etl_dag.py (the actual DAG program) in the airflow/dags folder (a minimal sketch appears after this list)
- Configure the connections for Hive, Sqoop, Spark, etc.
- Start the DAG using the toggle switch in the Airflow UI
- Validate the KPIs generated by running some Hive queries
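For orientation, here is a stripped-down sketch of what an etl_dag.py wiring these tasks together might look like, assuming Airflow 1.10-style imports (in Airflow 2.x these operators live in separate provider packages). All table names, paths and queries are placeholders, not the actual code from the demonstration; that code is attached below.

```python
# A stripped-down sketch of an etl_dag.py chaining Sqoop, Spark and Hive
# tasks. All table names, paths and queries are placeholders; the actual
# file used in the demonstration is attached below.
from datetime import datetime

from airflow import DAG
from airflow.operators.hive_operator import HiveOperator
from airflow.contrib.operators.sqoop_operator import SqoopOperator
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

with DAG(
    dag_id="etl_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Ingest a MySQL table into HDFS using the Sqoop connection.
    ingest = SqoopOperator(
        task_id="sqoop_ingest",
        conn_id="sqoop_default",
        cmd_type="import",
        table="orders",                      # placeholder table name
        target_dir="/user/hadoop/orders",    # placeholder HDFS path
    )

    # Transform the raw data with a Spark job.
    transform = SparkSubmitOperator(
        task_id="spark_transform",
        conn_id="spark_default",
        application="/home/ec2-user/jobs/transform.py",  # placeholder path
    )

    # Load the transformed output into a Hive table for the KPI queries.
    load = HiveOperator(
        task_id="hive_load",
        hive_cli_conn_id="hive_cli_default",
        hql="LOAD DATA INPATH '/user/hadoop/orders_out' "
            "INTO TABLE kpi_orders;",        # placeholder HQL
    )

    ingest >> transform >> load
```

Once this file sits in the airflow/dags folder, the DAG appears in the Airflow UI, where the toggle switch unpauses it for scheduling.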
You can find the code and other resources used in the demonstration attached below:
The document attached below details the steps followed in the demonstration.
This marks the end of our final demonstration.
Next, we will look at some best practices for Apache Airflow.