In this segment, we will delve into one of the most important concepts in Airflow – DAGs.
DAGs, or Directed Acyclic Graphs, are what Airflow uses to implement its data pipelines.
In the following video, Ajay will discuss the topic in detail.
In the previous video, we learnt about Directed Acyclic Graphs (DAGs), which form the core structure around which Airflow designs its data pipelines.
The following diagram shows what a very simple DAG looks like in the Airflow UI:
The properties of a DAG are as follows:
- It is a graph structure (i.e., a collection of vertices and edges).
- It is directed (i.e., every edge has a direction that indicates which task depends on which).
- It is acyclic (i.e., it contains no cycles, so no task can end up depending on itself).
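To make the acyclic property concrete, here is a minimal sketch in plain Python (it does not use Airflow). It relies on the standard-library `graphlib` module, available from Python 3.9, and the task names are hypothetical ones chosen only for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(dependencies):
    """Return True if the dependency mapping contains no cycle.

    `dependencies` maps each task name to the set of tasks it depends on.
    """
    try:
        # prepare() raises CycleError if the graph contains a cycle.
        TopologicalSorter(dependencies).prepare()
    except CycleError:
        return False
    return True


# Hypothetical task names, used only for illustration.
valid_pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# Here "extract" waits on "load", which closes a loop:
# extract -> transform -> load -> extract.
cyclic_pipeline = {
    "extract": {"load"},
    "transform": {"extract"},
    "load": {"transform"},
}

print(is_acyclic(valid_pipeline))   # True
print(is_acyclic(cyclic_pipeline))  # False
```

The second mapping can never be scheduled because no task is free to run first, which is why a pipeline must not contain cycles.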
As seen in the DAG definition, Airflow DAGs are written in Python.
The following is the sample code:
As the sample code shows, some of the important attributes of the DAG are as follows:
- ID – 'my_dag'
- Description – 'Sample DAG'
- Schedule – '0 12 * * *' (a cron expression meaning every day at 12:00)
- Start Date – datetime(2016, 1, 1)
- Configs/default arguments – dag_default_configs
These are the most important attributes of a DAG. You can specify many more to suit your requirements.
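Finally, we discussed cron expressions. A cron expression consists of five space-separated fields: minute, hour, day of month, month, and day of week. The following is a guide to the format: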
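To make the field order concrete, here is a minimal sketch in plain Python that labels each field of the schedule used earlier, '0 12 * * *'. The helper `describe_cron` is a hypothetical name introduced for illustration. It only splits and labels the fields and does not validate their values or compute run times.

```python
# Field names in the order they appear in a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each field of a five-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


schedule = describe_cron("0 12 * * *")
for name, value in schedule.items():
    print(f"{name:>12}: {value}")
```

Read field by field, the expression means minute 0 of hour 12, on every day of the month, in every month, on every day of the week. In other words, the DAG runs once a day at 12:00.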
In the next video, we will learn about the components of a DAG and see the DAG for the Uber use case that we saw in one of the previous segments.
In the previous video, we looked at the components that make up a DAG:
- Task – It defines a unit of work within a DAG.
- Task dependencies – They define the order in which the tasks in a DAG are executed.
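The relationship between tasks and task dependencies can be illustrated with a small toy model in plain Python. Airflow lets you declare dependencies with the `>>` operator. The `Task` class below is a hypothetical stand-in that mimics only that syntax and is not Airflow's implementation. The task names are also assumptions.

```python
from graphlib import TopologicalSorter


class Task:
    """Toy stand-in for an Airflow task; not Airflow's real operator class."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()  # tasks that must finish before this one

    def __rshift__(self, other):
        # `a >> b` records that b depends on a, and returns b so chains work.
        other.upstream.add(self)
        return other


def execution_order(tasks):
    """Return task ids in an order that respects every dependency."""
    graph = {t.task_id: {u.task_id for u in t.upstream} for t in tasks}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical tasks, each representing one unit of work.
extract = Task("extract")
transform = Task("transform")
load = Task("load")

# Dependencies: extract runs first, then transform, then load.
extract >> transform >> load

print(execution_order([load, extract, transform]))
# ['extract', 'transform', 'load']
```

The order in which the tasks are listed does not matter. The dependencies alone determine the order of execution.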
We also converted the Uber data pipeline from the following diagram into a DAG:
In the next segment, we will understand the internal architecture of Airflow and study its components in detail.
Additional Reading
DAG documentation – Refer to this link if you want to know more about DAG attributes/parameters.