Deployment Overview of Apache Spark on Server

Prerequisites and Basic Requirements

The server must meet the following requirements to successfully host the Apache Spark application:

  • Operating System: Debian 12 (Bookworm) or Debian 11 (Bullseye).

  • Privileges: Root or sudo access is required to install packages, configure environment variables, and manage Docker containers.

  • Java Environment: The system requires OpenJDK. The version is selected automatically based on the distribution:

    • Debian 11 (Bullseye): Java 11 (/usr/lib/jvm/java-11-openjdk-amd64)

    • Debian 12 (Bookworm): Java 17 (/usr/lib/jvm/java-17-openjdk-amd64)

  • Network: The server must have outbound internet access to download Apache Spark binaries and Docker images.
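The distribution-based Java selection above can be sketched as a small shell helper. The codename-to-path mapping follows the list above; `java_home_for` is an illustrative name, not part of any installer:

```shell
# Map a Debian release codename to the expected OpenJDK path.
# java_home_for is a hypothetical helper, shown for illustration only.
java_home_for() {
    case "$1" in
        bullseye) echo "/usr/lib/jvm/java-11-openjdk-amd64" ;;
        bookworm) echo "/usr/lib/jvm/java-17-openjdk-amd64" ;;
        *)        return 1 ;;  # unsupported release
    esac
}

# On the target host the codename can be read from /etc/os-release:
#   . /etc/os-release && java_home_for "$VERSION_CODENAME"
java_home_for bookworm   # → /usr/lib/jvm/java-17-openjdk-amd64
```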

FQDN of the Final Panel

The application is accessible via the hostkey.in domain using the following Fully Qualified Domain Name (FQDN) structure:

spark<Server ID>.hostkey.in:443

Where <Server ID> is the specific identifier assigned to the server instance. The application listens on port 443 (HTTPS).
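For example, assuming a hypothetical server ID of 12345, the base URL can be assembled as follows (port 443 is implied by the https scheme):

```shell
# Build the panel URL from a hypothetical server ID (12345 is an
# example value, not a real instance).
SERVER_ID=12345
BASE_URL="https://spark${SERVER_ID}.hostkey.in"

echo "$BASE_URL"   # → https://spark12345.hostkey.in
```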

File and Directory Structure

The following directories and paths contain the application binaries, configurations, and data:

  • Application Binary: Extracted to /root/spark-3.5.3-bin-hadoop3

  • Nginx Configuration Directory: /root/nginx/

  • Docker Compose File: /root/nginx/compose.yml

  • Nginx User Configuration: /data/nginx/user_conf.d/spark<Server ID>.hostkey.in.conf

  • SSL Certificates: /etc/letsencrypt/live/spark<Server ID>.hostkey.in/

  • Nginx Environment File: /data/nginx/nginx-certbot.env

Application Installation Process

Apache Spark is installed by downloading a specific binary distribution and extracting it on the host server. The process involves:

  1. Updating and upgrading APT packages.

  2. Installing the default-jdk package.

  3. Configuring environment variables in /etc/environment:

    • JAVA_HOME: Set to the appropriate OpenJDK path.

    • SPARK_LOCAL_IP: Set to 127.0.0.1.

  4. Downloading Apache Spark version 3.5.3 (binary for Hadoop 3) from the Apache archive.

  5. Extracting the archive to the root directory.

  6. Removing the original archive file.

  7. Rebooting the system to apply environment variable changes.
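The steps above can be collected into one provisioning sketch. This is not the vendor's installer: the function must be run as root on the target host, the Java 17 path applies to Debian 12 (use the java-11 path on Debian 11), and the archive URL is an assumption based on the standard Apache archive layout:

```shell
# Hedged sketch of the documented install steps (steps 1-7).
SPARK_VERSION=3.5.3
SPARK_PKG="spark-${SPARK_VERSION}-bin-hadoop3"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PKG}.tgz"

install_spark() {
    apt-get update && apt-get upgrade -y   # step 1
    apt-get install -y default-jdk         # step 2

    # Step 3: persist environment variables.
    {
        echo "JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64"
        echo "SPARK_LOCAL_IP=127.0.0.1"
    } >> /etc/environment

    # Steps 4-6: download, extract to /root, remove the archive.
    cd /root
    wget "$SPARK_URL"
    tar -xzf "${SPARK_PKG}.tgz"
    rm "${SPARK_PKG}.tgz"

    reboot                                 # step 7
}

echo "$SPARK_URL"   # → https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3.tgz
```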

Docker Containers and Their Deployment

The application utilizes Docker for the web server and SSL certificate management. The deployment is managed via a Docker Compose file located at /root/nginx/compose.yml.

Container Details:

  • Image: jonasal/nginx-certbot:latest

  • Restart Policy: unless-stopped

  • Network Mode: host

  • Volumes:

    • nginx_secrets mounted to /etc/letsencrypt (external volume).

    • /data/nginx/user_conf.d mounted to /etc/nginx/user_conf.d.

Deployment Command: To start the container stack, execute the following command from the /root/nginx directory:

docker compose up -d
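Based on the container details above, /root/nginx/compose.yml would look roughly like the following sketch. The service name and the env_file wiring are assumptions; the image, restart policy, network mode, and volume mounts are as documented:

```yaml
services:
  nginx-certbot:                         # service name is an assumption
    image: jonasal/nginx-certbot:latest
    restart: unless-stopped
    network_mode: host
    env_file:
      - /data/nginx/nginx-certbot.env    # documented environment file
    volumes:
      - nginx_secrets:/etc/letsencrypt
      - /data/nginx/user_conf.d:/etc/nginx/user_conf.d

volumes:
  nginx_secrets:
    external: true                       # pre-existing named volume
```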

Proxy Servers

The application is fronted by an Nginx proxy container that handles SSL termination and routing to the internal Spark services.

Proxy Configuration:

  • Software: Nginx (via Docker container jonasal/nginx-certbot).

  • SSL: Enabled using Let's Encrypt certificates managed by Certbot.

  • Port: Listens on port 443 (HTTPS).

  • Domain: Configured to respond to spark<Server ID>.hostkey.in.

  • Routing:

    • Root path / forwards to the Spark UI on internal port 4040.

    • /master forwards to internal port 8080.

    • /worker forwards to internal port 8081.

    • /history forwards to internal port 18080.

  • WebSocket Support: Configured with Upgrade headers and X-Scheme passing.
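Put together, the server block in /data/nginx/user_conf.d/spark&lt;Server ID&gt;.hostkey.in.conf would look roughly like the sketch below, shown for a hypothetical server ID of 12345. Directive details beyond the documented routing (certificate paths, header placement) are assumptions:

```nginx
# Sketch of the documented proxy configuration (hypothetical ID 12345).
server {
    listen 443 ssl;
    server_name spark12345.hostkey.in;

    ssl_certificate     /etc/letsencrypt/live/spark12345.hostkey.in/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/spark12345.hostkey.in/privkey.pem;

    # WebSocket upgrade headers and scheme passing, as described above.
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Scheme $scheme;

    location / {
        proxy_pass http://127.0.0.1:4040;    # Spark UI
    }
    location /master {
        proxy_pass http://127.0.0.1:8080;    # Spark Master UI
    }
    location /worker {
        proxy_pass http://127.0.0.1:8081;    # Spark Worker UI
    }
    location /history {
        proxy_pass http://127.0.0.1:18080;   # Spark History Server
    }
}
```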

Permission Settings

The file permissions for the configuration and data directories are set as follows:

  • Nginx Config Directory (/root/nginx): owner root, group root, mode 0644

  • User Conf File (/data/nginx/user_conf.d/*.conf): owner root, group root, mode 0644

  • Docker Compose File (/root/nginx/compose.yml): owner root, group root, mode 0644
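The documented ownership and mode are applied with chown/chmod. The snippet below demonstrates the pattern on a throwaway file so it can be run unprivileged; on the server the same chmod targets the real paths listed above, and chown root:root requires root:

```shell
# Demonstrate the documented mode on a temporary file (unprivileged demo).
tmpdir=$(mktemp -d)
touch "$tmpdir/compose.yml"

chmod 0644 "$tmpdir/compose.yml"

mode=$(stat -c '%a' "$tmpdir/compose.yml")
echo "$mode"   # → 644
rm -rf "$tmpdir"
```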

Location of Configuration Files and Data

Key configuration files and their locations are:

  • Global Environment Variables: /etc/environment

  • Docker Compose Definition: /root/nginx/compose.yml

  • Nginx Server Block Configuration: /data/nginx/user_conf.d/spark<Server ID>.hostkey.in.conf

  • Nginx Environment Variables: /data/nginx/nginx-certbot.env

Available Ports for Connection

The following ports are configured for the system:

Port   Protocol  Description
443    TCP       HTTPS (Nginx proxy entry point)
4040   TCP       Internal Spark UI (proxied via /)
8080   TCP       Internal Spark Master UI (proxied via /master)
8081   TCP       Internal Spark Worker UI (proxied via /worker)
18080  TCP       Internal Spark History Server (proxied via /history)

Note: The internal ports (4040, 8080, 8081, 18080) are not directly exposed to the internet; they are accessed through the Nginx proxy on port 443.

Starting, Stopping, and Updating

The primary service managing the proxy and SSL is the Docker container. Use the following commands to manage its lifecycle:

  • Start the service:

    cd /root/nginx && docker compose up -d
    

  • Stop the service:

    cd /root/nginx && docker compose down
    

  • Restart the service:

    cd /root/nginx && docker compose restart
    

  • Update the container image:

    cd /root/nginx && docker compose pull && docker compose up -d
    
