Deployment Overview of Apache Airflow on Server

Prerequisites and Basic Requirements

The deployment requires a Debian-based operating system, specifically Debian Bookworm or Ubuntu, and root privileges to install packages and configure services. The following system packages are required:

  • apt-utils
  • ca-certificates
  • curl
  • dumb-init
  • freetds-bin
  • krb5-user
  • libgeos-dev
  • ldap-utils
  • libsasl2-2
  • libsasl2-modules
  • libxmlsec1
  • locales
  • libffi-dev
  • libldap-2.5-0
  • libssl-dev
  • netcat-openbsd
  • lsb-release
  • openssh-client
  • python3-selinux
  • rsync
  • sasl2-bin
  • sqlite3
  • sudo
  • unixodbc
  • pipx
  • python3-pip
  • postgresql
  • findutils
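
As a rough sketch, the packages above could be installed in one pass with apt-get. The APPLY guard variable is an assumption added for illustration, so that the script only prints the command unless it is explicitly enabled; it is not part of the original deployment.

```shell
#!/bin/sh
# Package list copied from the dependency list above.
PACKAGES="apt-utils ca-certificates curl dumb-init freetds-bin krb5-user \
libgeos-dev ldap-utils libsasl2-2 libsasl2-modules libxmlsec1 locales \
libffi-dev libldap-2.5-0 libssl-dev netcat-openbsd lsb-release \
openssh-client python3-selinux rsync sasl2-bin sqlite3 sudo unixodbc \
pipx python3-pip postgresql findutils"

# APPLY is a hypothetical guard: by default the command is only printed.
if [ "${APPLY:-0}" = "1" ]; then
    apt-get update
    # Word splitting of $PACKAGES is intentional here.
    apt-get install -y $PACKAGES
else
    echo "apt-get install -y $PACKAGES"
fi
```

Running the script as root with APPLY=1 performs the installation; without it, the command is printed for review.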

The Airflow webserver interface listens on port 8080.

File and Directory Structure

Configuration and data files are organized in the following locations:

  • Systemd Service Files: Located in /etc/systemd/system/.
    • airflow-webserver.service
    • airflow-scheduler.service
  • Nginx Configuration: Located in /root/nginx/.
    • compose.yml
  • Nginx Data and Secrets:
    • Let's Encrypt secrets are stored in the nginx_secrets volume mounted at /etc/letsencrypt.
    • User configuration files are stored in /data/nginx/user_conf.d.
  • Environment Variables: Nginx environment variables are defined in /data/nginx/nginx-certbot.env.
  • Pipx Installation: If using Debian Bookworm, the pipx home directory is set to /opt/pipx and binaries are installed to /usr/local/bin.

Application Installation Process

Apache Airflow is installed using Python package managers. The installation method depends on the operating system version:

  • Debian Bookworm: Airflow is installed using pipx with the celery extra. The command executed is:

    /usr/bin/pipx install "apache-airflow[celery]=={{ airflow_version }}" --include-deps
    
    The environment variables PIPX_HOME and PIPX_BIN_DIR are configured to /opt/pipx and /usr/local/bin respectively.

  • Ubuntu or Debian Bullseye: Airflow is installed using pip with specific constraints and flags to handle system packages:

    pip install "apache-airflow[celery]=={{ airflow_version }}" \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-{{ airflow_version }}/constraints-3.10.txt" \
    --ignore-installed --break-system-packages
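
The choice between the two install paths can be sketched as a small helper that prints the command for a given OS codename. The function name airflow_install_cmd and the version 2.9.3 are assumptions for illustration; the real deployment fills in {{ airflow_version }} from its own configuration.

```shell
#!/bin/sh
# airflow_install_cmd CODENAME VERSION
# Hypothetical helper: prints the install command for the given OS
# codename and Airflow version without running it.
airflow_install_cmd() {
    codename="$1"
    version="$2"
    if [ "$codename" = "bookworm" ]; then
        # pipx path, with PIPX_HOME and PIPX_BIN_DIR set as described above.
        echo "PIPX_HOME=/opt/pipx PIPX_BIN_DIR=/usr/local/bin /usr/bin/pipx install \"apache-airflow[celery]==${version}\" --include-deps"
    else
        # pip path, with the constraints file pinned to the same version.
        echo "pip install \"apache-airflow[celery]==${version}\" --constraint \"https://raw.githubusercontent.com/apache/airflow/constraints-${version}/constraints-3.10.txt\" --ignore-installed --break-system-packages"
    fi
}

# Read the codename from /etc/os-release when the file is available.
codename="unknown"
if [ -r /etc/os-release ]; then
    codename="$(. /etc/os-release; echo "${VERSION_CODENAME:-unknown}")"
fi

# 2.9.3 is only an example version.
airflow_install_cmd "$codename" "2.9.3"
```

Printing the command rather than executing it makes it easy to review which path a given host would take.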
    

After installation, the Airflow binary location is detected using the find command across /usr/local/bin, /usr/bin, and /root/.local/bin. The database is initialized by running:

airflow db migrate
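
The binary detection step described above might look roughly like the following. The helper name find_airflow_bin is hypothetical, and matching symlinks as well as regular files is an assumption made because pipx exposes installed binaries as symlinks.

```shell
#!/bin/sh
# find_airflow_bin [DIR...]
# Hypothetical helper: prints the first file or symlink named "airflow"
# found in the given directories (default: the three locations above).
find_airflow_bin() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/local/bin /usr/bin /root/.local/bin
    fi
    # Errors for missing directories are discarded; only the first hit is kept.
    find "$@" -name airflow \( -type f -o -type l \) 2>/dev/null | head -n 1
}

# Usage: run the migration with whichever binary was found.
# airflow_bin="$(find_airflow_bin)"
# [ -n "$airflow_bin" ] && "$airflow_bin" db migrate
```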

An administrative user is created with the following parameters:

  • Username: admin
  • First Name: admin
  • Last Name: admin
  • Role: Admin
  • Email: [email protected]
  • Password: Set to the SSH password used for the server connection.
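
A minimal sketch of the user creation step, using the airflow users create command. The wrapper name create_admin_user and the AIRFLOW_BIN override are assumptions; the email and password are passed as arguments because their actual values are not given here.

```shell
#!/bin/sh
# AIRFLOW_BIN can point at the detected binary; defaults to "airflow" on PATH.
AIRFLOW_BIN="${AIRFLOW_BIN:-airflow}"

# create_admin_user EMAIL PASSWORD
# Hypothetical wrapper around "airflow users create" with the fixed
# username, names, and role listed above.
create_admin_user() {
    "$AIRFLOW_BIN" users create \
        --username admin \
        --firstname admin \
        --lastname admin \
        --role Admin \
        --email "$1" \
        --password "$2"
}

# Usage (both values are placeholders supplied by the operator):
# create_admin_user "$ADMIN_EMAIL" "$ADMIN_PASSWORD"
```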

Docker Containers and Their Deployment

Docker is installed on the server to run the reverse proxy and handle SSL certificate management. Docker Compose is used to run the Nginx container.

The compose.yml file is generated in /root/nginx/ and defines the following service:

  • Service Name: nginx
  • Image: jonasal/nginx-certbot:latest
  • Restart Policy: unless-stopped
  • Network Mode: host
  • Environment:
    • CERTBOT_EMAIL: Set to [email protected]
    • Additional variables are loaded from /data/nginx/nginx-certbot.env
  • Volumes:
    • nginx_secrets (external volume) mounted to /etc/letsencrypt
    • /data/nginx/user_conf.d mounted to /etc/nginx/user_conf.d
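
Put together, the service definition above corresponds to a compose.yml along these lines. This is a sketch reconstructed from the listed settings, not the generated file itself; the helper name write_compose_file is hypothetical and the email is passed in as a placeholder.

```shell
#!/bin/sh
# write_compose_file DIR EMAIL
# Hypothetical helper: writes a compose.yml matching the service
# description above into DIR.
write_compose_file() {
    mkdir -p "$1"
    cat > "$1/compose.yml" <<EOF
services:
  nginx:
    image: jonasal/nginx-certbot:latest
    restart: unless-stopped
    network_mode: host
    environment:
      - CERTBOT_EMAIL=$2
    env_file:
      - /data/nginx/nginx-certbot.env
    volumes:
      - nginx_secrets:/etc/letsencrypt
      - /data/nginx/user_conf.d:/etc/nginx/user_conf.d

volumes:
  nginx_secrets:
    external: true
EOF
}

# Usage (the email value is a placeholder):
# write_compose_file /root/nginx "$CERTBOT_EMAIL"
```

Because nginx_secrets is declared as an external volume, it has to exist before the container starts, for example by running docker volume create nginx_secrets once.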

The container is started by running the following command from the /root/nginx directory:

docker compose up -d

Proxy Servers

The server utilizes Nginx as a reverse proxy with integrated Let's Encrypt certificate management via the jonasal/nginx-certbot container.

  • Domain Configuration: Custom domains and SSL certificates are managed through the Nginx container.
  • Certificate Management: Certificates are requested and renewed automatically by the container using the email [email protected].
  • Configuration Path: Custom Nginx configurations are placed in /data/nginx/user_conf.d and mounted into the container.
  • Secrets Storage: SSL certificates and keys are stored in the external Docker volume nginx_secrets at /etc/letsencrypt.
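
To illustrate what a file in /data/nginx/user_conf.d might contain, the sketch below writes a server block that proxies a domain to the Airflow webserver on port 8080. The helper name, the file name airflow.conf, the example domain, and the use of the domain as the certificate directory name are all assumptions; the jonasal/nginx-certbot documentation should be consulted for its exact conventions.

```shell
#!/bin/sh
# write_airflow_proxy_conf DIR DOMAIN
# Hypothetical helper: writes DIR/airflow.conf, a server block that
# proxies HTTPS traffic for DOMAIN to the Airflow webserver.
write_airflow_proxy_conf() {
    mkdir -p "$1"
    cat > "$1/airflow.conf" <<EOF
server {
    listen 443 ssl;
    server_name $2;

    # Certificate paths under the mounted /etc/letsencrypt volume.
    ssl_certificate     /etc/letsencrypt/live/$2/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/$2/privkey.pem;

    location / {
        # Host networking lets the container reach Airflow on localhost.
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-Proto \$scheme;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    }
}
EOF
}

# Usage (airflow.example.com is a placeholder domain):
# write_airflow_proxy_conf /data/nginx/user_conf.d airflow.example.com
```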

Starting, Stopping, and Updating

Apache Airflow services are managed via systemd. The following services are configured:

  • airflow-webserver: Runs the web interface on port 8080.
  • airflow-scheduler: Runs the background scheduler.
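
The contents of the unit files are not shown in this document. As an illustration only, a minimal webserver unit might resemble the one written by the sketch below; every directive in it (the user, restart policy, and ExecStart line) is an assumption, and write_webserver_unit is a hypothetical helper.

```shell
#!/bin/sh
# write_webserver_unit UNIT_PATH AIRFLOW_BIN
# Hypothetical helper: writes a minimal unit file for the webserver.
# All directives are assumptions, not the deployed unit.
write_webserver_unit() {
    cat > "$1" <<EOF
[Unit]
Description=Apache Airflow webserver
After=network.target

[Service]
User=root
ExecStart=$2 webserver --port 8080
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage, followed by the reload that systemd needs after unit changes:
# write_webserver_unit /etc/systemd/system/airflow-webserver.service /usr/local/bin/airflow
# systemctl daemon-reload
```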

To manage these services, use the following commands:

  • Start Services:
    systemctl start airflow-webserver
    systemctl start airflow-scheduler
    
  • Stop Services:
    systemctl stop airflow-webserver
    systemctl stop airflow-scheduler
    
  • Enable Services on Boot:
    systemctl enable airflow-webserver
    systemctl enable airflow-scheduler
    
  • Reload Systemd Daemon:
    systemctl daemon-reload
    
  • Check Service Status:
    systemctl status airflow-webserver
    systemctl status airflow-scheduler
    

The Docker-based Nginx proxy is managed via Docker Compose commands in the /root/nginx directory:

  • Start/Restart: docker compose up -d
  • Stop: docker compose down
