Hi, I guess you aren't getting bored of Airflow stuff yet.

In this blog, we are going to see how we can make sure our DAGs look good. With the DAG integrity check, we can ensure that our DAGs are executable and free of errors at a basic level.

The objective

At the unit test step, we just want to guarantee that our DAGs are imported correctly: no syntax errors, no library import errors. We are not proving that the pipeline is perfect at this point; we do that in integration tests or end-to-end tests.

What should we do now?

DagBag

DagBag is a class in Airflow's models. It collects the DAGs from a folder and stores their metadata. For more info, see the official Airflow documentation.

We can use this class to verify that our DAGs are imported properly, like the code stub below.
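A minimal version could look like this (the dags/ folder path is an assumption; point it at wherever your DAG files live):

```python
from airflow.models import DagBag

dagbag = DagBag(dag_folder="dags/", include_examples=False)

print(dagbag.dags)           # dict of dag_id -> DAG object
print(dagbag.import_errors)  # dict of file path -> error message
```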

After importing DagBag and initiating the class object as dagbag at line 3, we can print its attributes .dags and .import_errors to see the list of DAGs and the list of errors, if any.

This is similar to the commands we used in the last blog. Follow the link below if you want to re-read it.

Let’s try: Airflow 2
Airflow 2 comes with lots of improvements. Why not spend some time getting to know it for easier batch job development?

Combine with unittest

We use unittest together with DagBag to test whether the target DAG is present.
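A minimal sketch of such a test could look like the following (the dags/ folder and the DAG id "my_dag" are placeholders, not taken from the linked repository):

```python
import unittest

from airflow.models import DagBag


class TestDagIntegrity(unittest.TestCase):
    def setUp(self):
        # collect every DAG file under dags/, skipping Airflow's example DAGs
        self.dagbag = DagBag(dag_folder="dags/", include_examples=False)

    def test_no_import_errors(self):
        # any syntax or library import error ends up in .import_errors
        self.assertEqual(
            len(self.dagbag.import_errors),
            0,
            f"DAG import errors: {self.dagbag.import_errors}",
        )

    def test_target_dag_is_loaded(self):
        # "my_dag" is a placeholder; replace it with your DAG id
        self.assertIn("my_dag", self.dagbag.dags)


if __name__ == "__main__":
    unittest.main()
```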

Please follow this link for the complete script.

airflow-docker/dag_integrity.py at main · bluebirz/airflow-docker
Docker-compose for local airflow development.

Now we run this command to validate the DAGs and see whether a DAG is found in the DagBag.

python tests/dag_integrity.py

If the DAGs are good, we should see a message like this.
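The exact output depends on the tests in your script, but with the two tests from the sketch above a successful unittest run prints something like:

```
..
----------------------------------------------------------------------
Ran 2 tests in 1.234s

OK
```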

Otherwise, it will show an error like this.
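For example, if the DAG id cannot be found (again using the hypothetical test names from the sketch above), unittest reports a failure along these lines:

```
.F
======================================================================
FAIL: test_target_dag_is_loaded (__main__.TestDagIntegrity)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/dag_integrity.py", line 21, in test_target_dag_is_loaded
    self.assertIn("my_dag", self.dagbag.dags)
AssertionError: 'my_dag' not found in {}

----------------------------------------------------------------------
Ran 2 tests in 1.234s

FAILED (failures=1)
```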

Further applications

This is a great unit test to run before deploying our apps to a server, either preproduction or production.

We can add the command to our CI/CD stages using whichever tool we favor: GitHub Actions, Bitbucket Pipelines, Google Cloud Build, and others. A rough GitHub Actions sketch is shown below.
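The workflow name, Python version, and install step here are assumptions; in a real pipeline you would likely install Airflow with its constraints file or reuse your project's Docker image:

```yaml
# Hypothetical GitHub Actions workflow: run the DAG integrity check on every push.
name: dag-integrity

on: [push]

jobs:
  test-dags:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.9"
      - name: Install Airflow
        run: pip install apache-airflow
      - name: Run DAG integrity check
        run: python tests/dag_integrity.py
```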

Have a great day with no bugs.