Airflow Quickstart

15 Nov 2022 - rich

A quickstart guide to help you bootstrap an Airflow standalone instance on your local machine.

Note: Successful installation requires a Python 3 environment. Starting with Airflow 2.3.0, Airflow is tested with Python 3.7, 3.8, 3.9, and 3.10. Python 3.11 is not yet supported.
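A quick way to confirm which interpreter pip will install against (just a sanity check, not part of the install steps):

python --version   # should print a 3.7-3.10 version for Airflow 2.5.1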

Only pip installation is currently officially supported.

While there have been successes with tools like Poetry or pip-tools, they do not share the same workflow as pip, especially when it comes to constraints vs. requirements management. Installing via Poetry or pip-tools is not currently supported.

If you wish to install Airflow using those tools, you should use the constraint files and convert them to the format and workflow your tool requires, as sketched below.
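As a minimal, unsupported sketch of what that conversion might look like with pip-tools (the file names requirements.in and constraints.txt are arbitrary choices, not anything Airflow mandates):

# download the constraints file for your Airflow/Python combination
curl -o constraints.txt "https://raw.githubusercontent.com/apache/airflow/constraints-2.5.1/constraints-3.7.txt"
# reference it as a constraint from requirements.in, then compile and sync
printf '%s\n' '-c constraints.txt' 'apache-airflow==2.5.1' > requirements.in
pip-compile requirements.in    # writes a fully pinned requirements.txt
pip-sync requirements.txt      # installs exactly the compiled set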

Installing Airflow is painless if you follow the instructions below. Airflow uses constraint files to enable reproducible installations, so using pip with constraint files is recommended.

# Airflow needs a home. ~/airflow is the default, but you can put it
# somewhere else if you prefer (optional)
export AIRFLOW_HOME=~/airflow

# Install Airflow using the constraints file
AIRFLOW_VERSION=2.5.1

PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
# For example: 3.7
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.5.1/constraints-3.7.txt
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

# The Standalone command will initialise the database, make a user,
# and start all components for you.
airflow standalone

# Visit localhost:8080 in the browser and use the admin account details
# shown on the terminal to login.
# Enable the example_bash_operator dag in the home page

Upon running these commands, Airflow will create the $AIRFLOW_HOME folder and create the airflow.cfg file with defaults that will get you going fast. You can override defaults using environment variables; see the Configuration Reference. You can inspect the file either in $AIRFLOW_HOME/airflow.cfg or through the UI in the Admin->Configuration menu. The PID file for the webserver will be stored in $AIRFLOW_HOME/airflow-webserver.pid, or in /run/airflow/webserver.pid if started by systemd.

Out of the box, Airflow uses a SQLite database, which you should outgrow fairly quickly since no parallelization is possible with this database backend. It works in conjunction with the SequentialExecutor, which will only run task instances sequentially. While this is very limiting, it allows you to get up and running quickly and take a tour of the UI and the command line utilities.
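Configuration keys map to AIRFLOW__{SECTION}__{KEY} environment variables. A short sketch (the PostgreSQL connection string here is purely illustrative, not something the quickstart sets up):

# point Airflow at a different metadata database and executor
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:airflow@localhost/airflow"
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
# inspect the effective value of any option
airflow config get-value core executor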

As you grow and deploy Airflow to production, you will also want to move away from the standalone command we use here to running the components separately. You can read more in Production Deployment.

Here are a few commands that will trigger a few task instances. You should be able to see the status of the jobs change in the example_bash_operator DAG as you run the commands below.
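These use the standard airflow tasks test and airflow dags backfill commands; the dates are illustrative logical dates, not values you must reuse:

# run your first task instance
airflow tasks test example_bash_operator runme_0 2015-01-01
# run a backfill over 2 days
airflow dags backfill example_bash_operator \
    --start-date 2015-01-01 \
    --end-date 2015-01-02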

If you want to run the individual parts of Airflow manually rather than using the all-in-one standalone command, you can instead run:

airflow db init

airflow users create \
    --username admin \
    --firstname Peter \
    --lastname Parker \
    --role Admin \
    --email spiderman@superhero.org

# run the webserver and the scheduler in separate terminals
airflow webserver --port 8080
airflow scheduler
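Once both are up, you can sanity-check the instance from another terminal. A minimal check, assuming the default port of 8080 (the webserver's /health endpoint reports metadatabase and scheduler status as JSON):

curl http://localhost:8080/health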