Deployment Overview of Apache Airflow on Server¶
Prerequisites and Basic Requirements¶
The deployment requires a Linux server running Debian (Bookworm) or Ubuntu that meets the following requirements:
- Root privileges or `sudo` access.
- A domain name in the `hostkey.in` zone.
- Internet access, so the server can download packages and certificates.
- The following system packages, installed as dependencies:
  - apt-utils, ca-certificates, curl, dumb-init, freetds-bin, krb5-user
  - libgeos-dev, ldap-utils, libsasl2-2, libsasl2-modules, libxmlsec1
  - locales, libffi-dev, libldap-2.5-0, libssl-dev, netcat-openbsd
  - lsb-release, openssh-client, python3-selinux, rsync, sasl2-bin
  - sqlite3, sudo, unixodbc, pipx, python3-pip, postgresql, findutils
- Python 3.10.
- Apache Airflow 2.10.1, installed with the `celery` extra.
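A quick preflight check can confirm the platform before installing. The following Python sketch does three things:
- It parses `/etc/os-release` in the standard os-release `KEY=value` format.
- It reports whether the host is Debian or Ubuntu, and whether it is Debian Bookworm, which decides the installation method.
- It reports whether the script runs as root.

The function names are hypothetical and are not part of the deployment. The package list and the Python version are not checked here.

```python
import os
import shlex


def parse_os_release(text: str) -> dict[str, str]:
    """Parse KEY=value lines from os-release contents into a dict."""
    info: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be single- or double-quoted; shlex strips the quotes.
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


def check_platform(text: str) -> tuple[bool, bool]:
    """Return (supported, is_bookworm) for the given os-release contents."""
    info = parse_os_release(text)
    distro = info.get("ID", "")
    supported = distro in {"debian", "ubuntu"}
    is_bookworm = distro == "debian" and info.get("VERSION_CODENAME") == "bookworm"
    return supported, is_bookworm


def main(path: str = "/etc/os-release") -> None:
    # Hypothetical entry point; not part of the deployment itself.
    with open(path, encoding="utf-8") as fh:
        supported, bookworm = check_platform(fh.read())
    print("Debian/Ubuntu:", supported)
    print("Debian Bookworm (pipx install path):", bookworm)
    print("running as root:", os.geteuid() == 0)


if __name__ == "__main__" and os.path.exists("/etc/os-release"):
    main()
```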
Application Access URL¶
The application is accessible through the `hostkey.in` domain on port 443. The fully qualified domain name (FQDN) follows the format `airflow<Server ID>.hostkey.in`, so the application is reached at `airflow<Server ID>.hostkey.in:443`. The external path is `/`.
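As an illustration of the naming scheme, this hypothetical helper builds the public URL from a server ID. It assumes the server ID contains only letters, digits, and hyphens, because the ID becomes part of a DNS label.

```python
import re

# Assumption: the server ID is a valid DNS label fragment.
_LABEL = re.compile(r"[A-Za-z0-9-]+")


def airflow_url(server_id: str, zone: str = "hostkey.in", path: str = "/") -> str:
    """Build https://airflow<Server ID>.<zone>:443<path> (hypothetical helper)."""
    if not _LABEL.fullmatch(server_id):
        raise ValueError(f"invalid server ID: {server_id!r}")
    return f"https://airflow{server_id}.{zone}:443{path}"
```

For example, `airflow_url("12345")` returns `"https://airflow12345.hostkey.in:443/"`.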
File and Directory Structure¶
The following directories and files are used by the deployment:
- `/root/nginx`: Contains the Nginx configuration and Docker Compose files.
- `/root/nginx/compose.yml`: The Docker Compose configuration for the proxy and SSL.
- `/data/nginx/user_conf.d`: Directory for user-defined Nginx configurations.
- `/data/nginx/nginx-certbot.env`: Environment file for the Certbot/Nginx container.
- `/etc/systemd/system/airflow-webserver.service`: Systemd unit file for the webserver.
- `/etc/systemd/system/airflow-scheduler.service`: Systemd unit file for the scheduler.
- `/opt/pipx`: Default home for `pipx` on Debian Bookworm installations.
- `/usr/local/bin`: Location of installed binaries on non-Bookworm systems.
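To check that a server matches the layout above, a small Python sketch can report which of the listed paths are missing.
- The `is_bookworm` flag selects between `/opt/pipx` and `/usr/local/bin`.
- The `root` parameter is an assumption added so the check can also run against a mounted image or a test directory.

```python
from pathlib import Path

# Paths from the list above that apply to every installation.
COMMON_PATHS = [
    "/root/nginx",
    "/root/nginx/compose.yml",
    "/data/nginx/user_conf.d",
    "/data/nginx/nginx-certbot.env",
    "/etc/systemd/system/airflow-webserver.service",
    "/etc/systemd/system/airflow-scheduler.service",
]


def expected_paths(is_bookworm: bool) -> list[str]:
    """Add the release-specific location: /opt/pipx on Bookworm, else /usr/local/bin."""
    return COMMON_PATHS + ["/opt/pipx" if is_bookworm else "/usr/local/bin"]


def missing_paths(is_bookworm: bool, root: str = "/") -> list[str]:
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    base = Path(root)
    return [p for p in expected_paths(is_bookworm) if not (base / p.lstrip("/")).exists()]
```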
Application Installation Process¶
Apache Airflow is installed with a Python package manager that depends on the operating system release:
- On Debian Bookworm: installed via `pipx` with the command `/usr/bin/pipx install "apache-airflow[celery]==2.10.1" --include-deps`.
- On other Debian/Ubuntu versions: installed via `pip` with the command `pip install "apache-airflow[celery]==2.10.1" --constraint ... --ignore-installed --break-system-packages`.
- The database is initialized with the `airflow db migrate` command.
- An administrative user is created with the following credentials:
  - Username: `admin`
  - First Name: `admin`
  - Last Name: `admin`
  - Role: `Admin`
  - Email: [email protected]
  - Password: the SSH password used during the deployment process.
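The installation steps can be expressed as the command sequence below. This is a sketch, not the deployment's actual script:
- The constraint file is a parameter, because the source elides it.
- The admin email is a parameter, because the real value is not shown.
- `airflow` is assumed to be on `PATH` after installation.

The `airflow users create` flags are the standard Airflow 2.x CLI options.

```python
import shlex

# Package spec taken from the installation steps above.
AIRFLOW_SPEC = "apache-airflow[celery]==2.10.1"


def install_commands(is_bookworm: bool, constraint: str,
                     admin_email: str, password: str) -> list[list[str]]:
    """Return argv lists for install, DB migration and admin-user creation.

    `constraint` and `admin_email` are placeholders: the source elides the
    constraint file and does not show the real email address.
    """
    if is_bookworm:
        install = ["/usr/bin/pipx", "install", AIRFLOW_SPEC, "--include-deps"]
    else:
        install = ["pip", "install", AIRFLOW_SPEC, "--constraint", constraint,
                   "--ignore-installed", "--break-system-packages"]
    migrate = ["airflow", "db", "migrate"]
    create_admin = [
        "airflow", "users", "create",
        "--username", "admin", "--firstname", "admin", "--lastname", "admin",
        "--role", "Admin", "--email", admin_email,
        "--password", password,  # the SSH password, per the list above
    ]
    return [install, migrate, create_admin]


def render(commands: list[list[str]]) -> str:
    """Join the commands into shell lines, masking the --password value."""
    lines = []
    for argv in commands:
        shown = list(argv)
        if "--password" in shown:
            shown[shown.index("--password") + 1] = "********"
        lines.append(shlex.join(shown))
    return "\n".join(lines)
```

Passing `--password` on the command line makes it visible in the process list. If the flag is omitted, `airflow users create` prompts for the password interactively.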
Databases¶
The application uses PostgreSQL as its backend database:
- The `postgresql` package is installed locally on the server.
- Airflow connects to the local PostgreSQL instance to store metadata and workflow state.
- No external database connection strings or remote storage locations are configured in the provided files.
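No connection string appears in the provided files, so any URL checked below is hypothetical. In Airflow 2.10, the metadata database URL is set in one of two ways:
- the `sql_alchemy_conn` option in the `[database]` section;
- the `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN` environment variable.

`airflow config get-value database sql_alchemy_conn` prints the active value. This sketch checks that such a URL points at PostgreSQL on the local host. It treats an empty host as local, on the assumption that libpq then uses its default Unix-domain socket.

```python
from urllib.parse import urlsplit

# An empty host is treated as local: libpq then connects through the
# default Unix-domain socket (an assumption about the deployment).
LOCAL_HOSTS = {"", "localhost", "127.0.0.1", "::1"}


def is_local_postgres(conn: str) -> bool:
    """Return True if a SQLAlchemy URL targets PostgreSQL on this host."""
    parts = urlsplit(conn)
    # SQLAlchemy URLs may include a driver suffix, e.g. postgresql+psycopg2.
    dialect = parts.scheme.split("+", 1)[0]
    return dialect == "postgresql" and (parts.hostname or "") in LOCAL_HOSTS
```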
Docker Containers and Their Deployment¶
Docker is used to deploy the reverse proxy and SSL management components. The deployment is managed via a Docker Compose file located at /root/nginx/compose.yml.
Nginx and Certbot Service¶
- Image: `jonasal/nginx-certbot:latest`
- Network Mode: `host`
- Restart Policy: `unless-stopped`
- Environment Variables:
  - `CERTBOT_EMAIL`: set to [email protected]
  - Configuration loaded from `/data/nginx/nginx-certbot.env`.
- Volumes:
  - `nginx_secrets`: mapped to `/etc/letsencrypt` for SSL certificate storage.
  - `/data/nginx/user_conf.d`: mapped to `/etc/nginx/user_conf.d` for custom Nginx configurations.

The container is started by running `docker compose up -d` from the `/root/nginx` directory.
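For reference, the service described above corresponds roughly to the Compose structure below, built here as a Python dictionary. This is a reconstruction from the listed settings, not the deployed `compose.yml`:
- The service name `nginx` is an assumption.
- `admin@example.com` is a placeholder for the real `CERTBOT_EMAIL` value.

Compose should accept the JSON this prints, because JSON is a subset of YAML 1.2.

```python
import json


def compose_config(certbot_email: str) -> dict:
    """Reconstructed Compose structure; service name 'nginx' is hypothetical."""
    return {
        "services": {
            "nginx": {
                "image": "jonasal/nginx-certbot:latest",
                "network_mode": "host",
                "restart": "unless-stopped",
                "environment": {"CERTBOT_EMAIL": certbot_email},
                "env_file": ["/data/nginx/nginx-certbot.env"],
                "volumes": [
                    "nginx_secrets:/etc/letsencrypt",
                    "/data/nginx/user_conf.d:/etc/nginx/user_conf.d",
                ],
            }
        },
        # Named volumes such as nginx_secrets must be declared at top level.
        "volumes": {"nginx_secrets": {}},
    }


if __name__ == "__main__":
    # Placeholder email; substitute the real CERTBOT_EMAIL value.
    print(json.dumps(compose_config("admin@example.com"), indent=2))
```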
Proxy Servers¶
Nginx acts as a reverse proxy that terminates SSL and routes traffic to the internal Airflow application.
- Protocol: HTTPS
- External Port: 443
- Internal Port: 8080
- Internal Path: empty (root path)
- External Path: `/`
- SSL Management: handled automatically by the `nginx-certbot` container using Let's Encrypt.
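The proxy's Nginx configuration goes in `/data/nginx/user_conf.d`. The sketch below renders a minimal server block for it, under two assumptions:
- The Airflow webserver listens on `127.0.0.1:8080`.
- It follows the convention documented for `jonasal/nginx-certbot`: `ssl_certificate_key` points at `/etc/letsencrypt/live/<name>/privkey.pem`, and `server_name` lists the domains to certify.

The deployed file may contain additional directives. Because the container uses host networking, `127.0.0.1` inside it is the host's loopback interface.

```python
def nginx_server_block(fqdn: str, upstream_port: int = 8080) -> str:
    """Render a reverse-proxy server block for user_conf.d (hypothetical helper)."""
    live = f"/etc/letsencrypt/live/{fqdn}"
    return "\n".join([
        "server {",
        "    listen 443 ssl;",
        f"    server_name {fqdn};",
        f"    ssl_certificate     {live}/fullchain.pem;",
        f"    ssl_certificate_key {live}/privkey.pem;",
        "",
        "    location / {",
        # proxy_pass without a URI forwards the request path unchanged,
        # matching the empty internal path listed above.
        f"        proxy_pass http://127.0.0.1:{upstream_port};",
        "        proxy_set_header Host $host;",
        "        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;",
        "        proxy_set_header X-Forwarded-Proto $scheme;",
        "    }",
        "}",
        "",
    ])
```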
Starting, Stopping, and Updating¶
Airflow components are managed as systemd services. The following commands are used to control the services:
Airflow Webserver¶
- Start: `systemctl start airflow-webserver`
- Stop: `systemctl stop airflow-webserver`
- Restart: `systemctl restart airflow-webserver`
- Enable on boot: `systemctl enable airflow-webserver`
Airflow Scheduler¶
- Start: `systemctl start airflow-scheduler`
- Stop: `systemctl stop airflow-scheduler`
- Restart: `systemctl restart airflow-scheduler`
- Enable on boot: `systemctl enable airflow-scheduler`
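When both Airflow units need the same action, a small wrapper avoids repeating the commands. This Python sketch runs one of the listed `systemctl` verbs against both units and returns the exit codes. The injectable `run` parameter is an addition, not part of the deployment, so the wrapper can be exercised without systemd.

```python
import subprocess
from typing import Callable, Sequence

SERVICES = ("airflow-webserver", "airflow-scheduler")
# Only the verbs listed above are accepted.
ACTIONS = {"start", "stop", "restart", "enable"}


def systemctl(action: str, services: Sequence[str] = SERVICES,
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> list[int]:
    """Apply one systemctl action to each Airflow unit; return the exit codes."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    codes = []
    for unit in services:
        # check=False so a failure on one unit does not skip the others.
        result = run(["systemctl", action, unit], check=False)
        codes.append(result.returncode)
    return codes
```

For example, `systemctl("restart")` restarts the webserver and then the scheduler, and returns `[0, 0]` when both succeed.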
Docker Proxy Services¶
- Start: run `docker compose up -d` from `/root/nginx`.
- Stop: run `docker compose down` from `/root/nginx`.
- Update: pull the latest image with `docker compose pull`, then restart with `docker compose up -d`.
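The update procedure can be scripted as a pull followed by `up -d`, both run from `/root/nginx`. In this sketch the `dry_run` flag is an assumption, not part of the deployment. It returns the planned commands without executing them.

```python
import subprocess

COMPOSE_DIR = "/root/nginx"


def update_proxy(compose_dir: str = COMPOSE_DIR, dry_run: bool = False) -> list[list[str]]:
    """Pull the latest nginx-certbot image and recreate the container."""
    steps = [
        ["docker", "compose", "pull"],
        ["docker", "compose", "up", "-d"],
    ]
    if not dry_run:
        for argv in steps:
            # check=True aborts before `up -d` if the pull fails,
            # leaving the running container untouched.
            subprocess.run(argv, cwd=compose_dir, check=True)
    return steps
```

`docker compose up -d` recreates the container only when its image or configuration has changed. Running it after a pull is therefore enough to apply a new image.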
Available Ports for Connection¶
The following ports are used for external and internal communication:
- 443: HTTPS (external access via the Nginx proxy).
- 8080: HTTP (internal access to the Airflow webserver, bound to localhost or the internal network depending on the service configuration).
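A TCP connect test is enough to confirm the port layout, and the sketch below uses only the standard `socket` module. Expected results:
- On the server, `port_open("127.0.0.1", 8080)` should return `True` while the webserver is running.
- From an external machine, port 443 on the FQDN should be reachable.
- If the webserver is bound to localhost, port 8080 should not be reachable from outside.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```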