Let's try: Airflow 2
Airflow 2 is the new major version of Airflow. You can go back and read the tutorial I wrote about Airflow 1 below.
What's new?
From reading the changelog, I can summarize the big points that matter to us as users here.
New UI
Of course, it comes with a new, cleaner user interface and a more understandable history page.
TaskFlow API
Now we can add the @task decorator on top of a function and Airflow will treat it as a Python task. This feature is new to me as well; it makes the code cleaner and easier to read. A short sketch is below, and I will write more about it later.
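To give a rough idea of the pattern, here is a minimal sketch; the DAG id, schedule, and values are made up for illustration, not taken from any real project.

from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule_interval=None, start_date=datetime(2021, 1, 1), catchup=False)
def taskflow_demo():
    @task
    def extract():
        # a plain Python function becomes a task thanks to the decorator
        return {"value": 42}

    @task
    def show(payload):
        print(payload["value"])

    # calling the functions wires the dependency: extract -> show
    show(extract())

taskflow_demo()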
And more.
A little gimmick here: the twisting fan.
You can visit the official page here to read all the changes.
Installation in Docker
I did have time back then to build a container of Airflow 1 on my MacBook, but I found the official image at that time wasn't good enough. Information on the website was lacking and there were a lot of configurations to deal with, so I ended up using Puckel's image instead.
Now the official Docker Compose setup for Airflow 2 has been released here, so there is no need to look for another, more reliable one.
I have assembled the steps described on the Airflow documentation page into a single repo below. You can clone it and try it yourself.
Details in the repo
- The installation starts from docker-compose.yaml. If you want to get more familiar with this, you can visit my latest blog below.
- The original docker-compose.yaml relies on the default image, but I want the ability to add extra Python packages, so I created a simple Dockerfile (a sketch is shown right after this list). You can add your own packages to requirements.txt and constraints.txt if needed.
- Disable the sample DAGs at line 59 of docker-compose.yaml:
AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
- Prepare all the necessary directories: /dags, /logs, and /plugins. We mostly use /dags for our work.
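For reference, the custom Dockerfile mentioned above can be as small as this sketch; the image tag and file names here are assumptions, so match them to the Airflow version in your docker-compose.yaml and the files in your repo.

# Dockerfile - minimal sketch for adding extra Python packages on top of the official image
FROM apache/airflow:2.1.2
COPY requirements.txt constraints.txt /
RUN pip install --no-cache-dir -r /requirements.txt -c /constraints.txt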
Let's get hands-on
0. Prepare all dependencies
Make sure all the dependencies of our work are listed in requirements.txt and constraints.txt (if any) before going to the next step.
For example, I put the package pysftp in requirements.txt, so when we build the image, the package will be installed in the worker instance and be ready to use.
You can also find the available packages at https://pypi.org.
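As a quick sanity check that the package really gets baked into the image, here is a minimal sketch of a TaskFlow DAG that imports pysftp; the hostname, credentials, and file paths are placeholders I made up.

# dags/sftp_check.py - sketch only; host, credentials, and paths are placeholders
from datetime import datetime
import pysftp
from airflow.decorators import dag, task

@dag(schedule_interval=None, start_date=datetime(2021, 1, 1), catchup=False)
def sftp_check():
    @task
    def download_report():
        # connect to a (made-up) SFTP server and fetch one file
        with pysftp.Connection("sftp.example.com", username="demo", password="demo") as conn:
            conn.get("/remote/report.csv", "/tmp/report.csv")

    download_report()

sftp_check()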
1. docker-compose up
Now it is ready to go. Run this to get the ball rolling.
docker-compose up
Seconds after this, it will show the necessary images being downloaded.
Next, it creates the scheduler, which schedules our jobs, and the worker, which executes them.
And the last one is the webserver. Once we see this, it means we are ready to view the Airflow web page.
2. Logging in
Open a browser and go to http://localhost:8080. Use airflow / airflow as the username/password to log in to this page.
It should succeed, and now we can see the first page of Airflow.
3. Try adding a DAG
Back to the editor. I take the example DAG from https://airflow.apache.org/docs/apache-airflow/stable/tutorial/fundamentals.html and save it in the /dags folder, like this.
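Here is an abridged sketch of that tutorial DAG; see the linked page for the full version.

# dags/tutorial.py - abridged from the Airflow tutorial linked above
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    "tutorial",
    description="A simple tutorial DAG",
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
) as dag:
    t1 = BashOperator(task_id="print_date", bash_command="date")
    t2 = BashOperator(task_id="sleep", bash_command="sleep 5", retries=3)
    t1 >> t2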
It won't show up in the DAGs list instantly. We can make it show up by doing the following:
1. Access the worker instance with this command:
docker exec -it <airflow-worker-container-name> /bin/bash
The airflow-worker-container-name can be retrieved by running docker ps -a and choosing the one with worker in its name.
2. List the DAGs with this command:
airflow dags list
This makes Airflow parse the DAG files again, and they will show up in the web UI if everything is successful.
4. Break a DAG to see errors
In some cases, the web UI shows errors, for example when we write invalid syntax or imports in a DAG file.
We can check the failing files with this command:
airflow dags list-import-errors
It prints a table showing the error message for every broken DAG file.
5. Let the DAG run
Say every DAG is good. When we click one, we can see that the DAG history UI has improved. It's more modern, clearer, and neater.
The DAG history is much easier to read. It shows some basic stats about the DAG runs on the side.
And the graph view is not much different, yet still better, right?
6. docker-compose down
Stop the running terminal (Control + C on Mac) and bring the containers down with one of these commands.
# stop all containers
docker-compose down
# stop all containers and remove everything
docker-compose down --volumes --rmi all
This is just an introduction to developing with Airflow 2 on Docker. I don't recommend this setup for production, but it is great for local development.
If you are looking for a good way to work with Airflow jobs, you can give this a try. I hope it will be useful for you.