
Deployment Overview of Apache Airflow on a Server

Prerequisites and Basic Requirements

The deployment requires a Linux server running either Debian (Bookworm) or Ubuntu with the following specifications:

  • Root privileges or sudo access.

  • A domain name associated with the hostkey.in zone.

  • The server must have internet access to download packages and certificates.

  • The following system packages are installed as dependencies:

      • apt-utils, ca-certificates, curl, dumb-init, freetds-bin, krb5-user.

      • libgeos-dev, ldap-utils, libsasl2-2, libsasl2-modules, libxmlsec1.

      • locales, libffi-dev, libldap-2.5-0, libssl-dev, netcat-openbsd.

      • lsb-release, openssh-client, python3-selinux, rsync, sasl2-bin.

      • sqlite3, sudo, unixodbc, pipx, python3-pip, postgresql, findutils.

  • Python version 3.10 is required.

  • Apache Airflow version 2.10.1 is installed with the celery extra.
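On a Debian or Ubuntu host, the dependency packages listed above can be installed in a single apt invocation. A minimal sketch (run as root; requires network access):

```shell
# Install the system dependencies listed above (Debian/Ubuntu, run as root).
apt-get update
apt-get install -y \
    apt-utils ca-certificates curl dumb-init freetds-bin krb5-user \
    libgeos-dev ldap-utils libsasl2-2 libsasl2-modules libxmlsec1 \
    locales libffi-dev libldap-2.5-0 libssl-dev netcat-openbsd \
    lsb-release openssh-client python3-selinux rsync sasl2-bin \
    sqlite3 sudo unixodbc pipx python3-pip postgresql findutils
```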

Application Access URL

The application is accessible via the hostkey.in domain on port 443. The fully qualified domain name (FQDN) follows the format airflow<Server ID>.hostkey.in:443. The external path is set to /.
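For example, with a hypothetical server ID of 12345 (a placeholder, not a value from this deployment), the access URL can be assembled like so:

```shell
# Assemble the Airflow access URL from a (hypothetical) server ID.
SERVER_ID=12345
FQDN="airflow${SERVER_ID}.hostkey.in"
echo "https://${FQDN}:443/"
# Prints: https://airflow12345.hostkey.in:443/
```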

File and Directory Structure

The following directories and files are utilized by the deployment:

  • /root/nginx: Contains the Nginx configuration and Docker Compose files.

  • /root/nginx/compose.yml: The Docker Compose configuration for the proxy and SSL.

  • /data/nginx/user_conf.d: Directory for user-defined Nginx configurations.

  • /data/nginx/nginx-certbot.env: Environment file for the Certbot/Nginx container.

  • /etc/systemd/system/airflow-webserver.service: Systemd unit file for the webserver.

  • /etc/systemd/system/airflow-scheduler.service: Systemd unit file for the scheduler.

  • /opt/pipx: Default home for pipx on Debian Bookworm installations.

  • /usr/local/bin: Location for installed binaries on non-Bookworm systems.

Application Installation Process

Apache Airflow is installed using Python package managers depending on the operating system release:

  • On Debian Bookworm: Installed via pipx with the command /usr/bin/pipx install "apache-airflow[celery]==2.10.1" --include-deps.

  • On other Debian/Ubuntu versions: Installed via pip with the command pip install "apache-airflow[celery]==2.10.1" --constraint ... --ignore-installed --break-system-packages.

  • The database is initialized using the airflow db migrate command.

  • An administrative user is created with the following credentials:

      • Username: admin

      • First Name: admin

      • Last Name: admin

      • Role: Admin

      • Email: [email protected]

      • Password: Set to the SSH password used during the deployment process.
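The initialization and user-creation steps above map onto the standard Airflow 2.x CLI. A sketch, with the password placeholder standing in for the SSH password set during deployment:

```shell
# Initialize the metadata database, then create the admin user.
airflow db migrate
airflow users create \
    --username admin \
    --firstname admin \
    --lastname admin \
    --role Admin \
    --email [email protected] \
    --password '<SSH password>'   # placeholder; the deployment uses the SSH password
```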

Databases

The application uses PostgreSQL as its backend database:

  • The postgresql package is installed locally on the server.

  • Airflow connects to the local PostgreSQL instance to store metadata and workflow state.

  • No external database connection strings or remote storage locations are configured in the provided files.
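The database bootstrap itself is not shown in the provided files. A typical local setup for Airflow on PostgreSQL looks like the following; the database name, role, and password here are assumptions for illustration, not values from this deployment:

```shell
# Hypothetical bootstrap of a local PostgreSQL database for Airflow.
sudo -u postgres psql <<'SQL'
CREATE DATABASE airflow;
CREATE USER airflow WITH PASSWORD 'airflow';  -- placeholder credentials
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
SQL
```

Airflow would then be pointed at this instance via its connection setting, e.g. AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost/airflow.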

Docker Containers and Their Deployment

Docker is used to deploy the reverse proxy and SSL management components. The deployment is managed via a Docker Compose file located at /root/nginx/compose.yml.

Nginx and Certbot Service

  • Image: jonasal/nginx-certbot:latest

  • Network Mode: host

  • Restart Policy: unless-stopped

  • Environment Variables:

      • CERTBOT_EMAIL: Set to [email protected]

      • Configuration loaded from /data/nginx/nginx-certbot.env.

  • Volumes:

      • nginx_secrets: Mapped to /etc/letsencrypt for SSL certificate storage.

      • /data/nginx/user_conf.d: Mapped to /etc/nginx/user_conf.d for custom Nginx configurations.

The container is started using the command docker compose up -d executed from the /root/nginx directory.
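Based on the settings listed above, /root/nginx/compose.yml would look roughly like this; the exact file contents are not shown in the source, so treat this as a sketch:

```yaml
services:
  nginx-certbot:
    image: jonasal/nginx-certbot:latest
    network_mode: host
    restart: unless-stopped
    environment:
      - CERTBOT_EMAIL=[email protected]
    env_file:
      - /data/nginx/nginx-certbot.env
    volumes:
      - nginx_secrets:/etc/letsencrypt
      - /data/nginx/user_conf.d:/etc/nginx/user_conf.d

volumes:
  nginx_secrets:
```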

Proxy Servers

Nginx is utilized as a reverse proxy to handle SSL termination and route traffic to the internal Airflow application.

  • Protocol: HTTPS

  • External Port: 443

  • Internal Port: 8080

  • Internal Path: Empty (root path)

  • External Path: /

  • SSL Management: Handled automatically by the nginx-certbot container using Let's Encrypt.
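A minimal server block in /data/nginx/user_conf.d matching these settings could look like the following. The file name and certificate directory name are assumptions; the nginx-certbot container issues Let's Encrypt certificates based on the server_name it finds in such blocks:

```nginx
# Hypothetical /data/nginx/user_conf.d/airflow.conf
server {
    listen 443 ssl;
    server_name airflow<Server ID>.hostkey.in;

    # Certificates are provisioned by the nginx-certbot container.
    ssl_certificate     /etc/letsencrypt/live/airflow/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/airflow/privkey.pem;

    location / {
        # Forward external HTTPS traffic to the internal Airflow webserver.
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```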

Starting, Stopping, and Updating

Airflow components are managed as systemd services. The following commands are used to control the services:

Airflow Webserver

  • Start: systemctl start airflow-webserver

  • Stop: systemctl stop airflow-webserver

  • Restart: systemctl restart airflow-webserver

  • Enable on Boot: systemctl enable airflow-webserver

Airflow Scheduler

  • Start: systemctl start airflow-scheduler

  • Stop: systemctl stop airflow-scheduler

  • Restart: systemctl restart airflow-scheduler

  • Enable on Boot: systemctl enable airflow-scheduler
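The unit files referenced earlier (/etc/systemd/system/airflow-webserver.service and airflow-scheduler.service) are not shown in the source. A typical webserver unit might look like the sketch below; the AIRFLOW_HOME value and ExecStart path are assumptions (on Debian Bookworm the binary would live under the pipx home rather than /usr/local/bin), and the scheduler unit would be analogous with airflow scheduler as the ExecStart command:

```ini
# Hypothetical /etc/systemd/system/airflow-webserver.service
[Unit]
Description=Apache Airflow webserver
After=network.target postgresql.service

[Service]
User=root
Environment=AIRFLOW_HOME=/root/airflow
ExecStart=/usr/local/bin/airflow webserver --port 8080
Restart=on-failure

[Install]
WantedBy=multi-user.target
```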

Docker Proxy Services

  • Start: Execute docker compose up -d from /root/nginx.

  • Stop: Execute docker compose down from /root/nginx.

  • Update: Pull the latest image with docker compose pull and restart with docker compose up -d.

Available Ports for Connection

The following ports are exposed for external and internal communication:

  • 443: HTTPS (External access via Nginx proxy).

  • 8080: HTTP (Internal access to Airflow webserver, bound to localhost or internal network depending on service configuration).
