Now that you have a general overview of the DAG, let's take a look at its actual code.
We recommend that you follow along with the demonstrations in your own EC2 instance.
Since the code is quite long, the demo is divided into two parts.
In the upcoming video, Ajay will demonstrate the first part.
In this video, you saw the creation of the DAG object and understood the definition of the following tasks:
- extract_booking_table
- extract_trip_table
- create_raw_booking_location
- create_filtered_booking_location
- create_raw_trip_location
- create_filtered_trip_location
- create_result_trip_throughput_location
- create_result_car_with_most_trips_location
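As a rough illustration of how tasks like these are wired together, the sketch below mimics Airflow's `>>` chaining syntax with a toy `Task` class. This is not the actual etl_dag.py code: the dependency order shown is an assumption for illustration, and the real file uses Airflow operators inside a DAG object.

```python
# Toy stand-in for Airflow's task-chaining syntax (`a >> b` means
# "a runs before b"). The real etl_dag.py uses Airflow operators;
# the ordering below is a hypothetical example, not the course code.

class Task:
    """Records downstream dependencies, Airflow-style."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        self.downstream.append(other)
        return other  # returning `other` lets chains continue: a >> b >> c

extract_booking_table = Task("extract_booking_table")
create_raw_booking_location = Task("create_raw_booking_location")
create_filtered_booking_location = Task("create_filtered_booking_location")

# Hypothetical ordering: extract the table first, then create the
# raw and filtered HDFS locations for it.
extract_booking_table >> create_raw_booking_location >> create_filtered_booking_location

print([t.task_id for t in extract_booking_table.downstream])
```

In the real DAG, each `Task` would be an Airflow operator (for example, a Bash or Hive operator), but the `>>` dependency syntax works the same way.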
In the next video, Amit will demonstrate the second part of our DAG construction.
In this video, you understood the definition of the following tasks:
- create_hive_database
- create_booking_raw_table
- create_trip_raw_table
- add_partitions_for_booking_table
- add_partitions_for_trip_table
- create_filter_booking_table
- create_filter_trip_table
- filter_booking_table
- filter_trip_table
- create_car_with_most_trips_table
- generate_car_with_most_trips
- create_trip_throughput_table
- generate_trip_throughput
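To make the Hive-side tasks more concrete, here is a hedged sketch of the kind of HiveQL that tasks such as create_booking_raw_table and add_partitions_for_booking_table typically run. The column names, partition key, and HDFS paths below are assumptions for illustration, not the actual course schema.

```python
# Hypothetical HiveQL for a partitioned external raw table and a
# partition-add statement. Schema, paths, and partition key are
# illustrative assumptions, not taken from etl_dag.py.

CREATE_BOOKING_RAW = """
CREATE EXTERNAL TABLE IF NOT EXISTS bookings_raw (
    booking_id  STRING,
    customer_id STRING,
    booking_ts  STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/bookings_raw'
"""

ADD_BOOKING_PARTITION = """
ALTER TABLE bookings_raw
ADD IF NOT EXISTS PARTITION (dt='{ds}')
LOCATION '/user/hive/warehouse/bookings_raw/dt={ds}'
"""

# In the DAG, each statement would be handed to a Hive operator or hook;
# here we only format the partition statement for an example run date.
print(ADD_BOOKING_PARTITION.format(ds="2023-01-01"))
```

In an Airflow task, the run date placeholder would normally come from a templated variable rather than being hard-coded.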
The etl_dag.py file is attached below for your reference.
We will run this file in an upcoming segment.
In the next segment, you will take a look at the four Spark applications used in this DAG.