Deployment Overview of Apache Spark on Server¶
Prerequisites and Basic Requirements¶
The server must meet the following requirements to successfully host the Apache Spark application:
- Operating System: Debian 12 (Bookworm) or Debian 11 (Bullseye).
- Privileges: Root or sudo access is required to install packages, configure environment variables, and manage Docker containers.
- Java Environment: OpenJDK is required. The version is selected automatically based on the distribution:
    - Debian 11 (Bullseye): Java 11 (`/usr/lib/jvm/java-11-openjdk-amd64`)
    - Debian 12 (Bookworm): Java 17 (`/usr/lib/jvm/java-17-openjdk-amd64`)
- Network: The server must have outbound internet access to download Apache Spark binaries and Docker images.
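The codename-to-Java mapping above can be sketched as a small shell helper. This is an illustration only: the function names `java_home_for_codename` and `detect_codename` are hypothetical, and reading `VERSION_CODENAME` from `/etc/os-release` is an assumption about how the detection might be done, not a description of the actual installer.

```shell
#!/bin/sh
# Map a distribution codename to the OpenJDK path listed above.
java_home_for_codename() {
    case "$1" in
        bookworm) echo "/usr/lib/jvm/java-17-openjdk-amd64" ;;
        bullseye) echo "/usr/lib/jvm/java-11-openjdk-amd64" ;;
        *) echo "unsupported codename: $1" >&2; return 1 ;;
    esac
}

# Read VERSION_CODENAME from an os-release style file
# (default: /etc/os-release). Hypothetical helper.
detect_codename() {
    os_release="${1:-/etc/os-release}"
    # Source in a subshell so the variables do not leak into the caller.
    ( . "$os_release" && printf '%s\n' "${VERSION_CODENAME:-}" )
}

# Usage on the server:
#   java_home_for_codename "$(detect_codename)"
```

An unknown codename returns a non-zero status, so a calling script can stop before writing an incorrect `JAVA_HOME`.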
FQDN of the Final Panel¶
The application is accessible via the hostkey.in domain using the following Fully Qualified Domain Name (FQDN) structure:
spark<Server ID>.hostkey.in:443
Where <Server ID> is the specific identifier assigned to the server instance. The application listens on port 443 (HTTPS).
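As a quick illustration of the naming scheme, the hostname and URL can be derived from the server ID. The helper names and the ID `12345` are hypothetical and used only for the example.

```shell
# Build the panel hostname and URL from a server ID.
spark_fqdn() { printf 'spark%s.hostkey.in\n' "$1"; }
spark_url()  { printf 'https://%s:443/\n' "$(spark_fqdn "$1")"; }

# 12345 is a hypothetical server ID used only for illustration.
spark_url 12345   # prints https://spark12345.hostkey.in:443/

# A reachability check could then look like:
#   curl -sSI "$(spark_url 12345)"
```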
File and Directory Structure¶
The following directories and paths contain the application binaries, configurations, and data:
- Application Binary: Extracted to `/root/spark-3.5.3-bin-hadoop3` (inferred from the download task).
- Nginx Configuration Directory: `/root/nginx/`
- Docker Compose File: `/root/nginx/compose.yml`
- Nginx User Configuration: `/data/nginx/user_conf.d/spark<Server ID>.hostkey.in.conf`
- SSL Certificates: `/etc/letsencrypt/live/spark<Server ID>.hostkey.in/`
- Nginx Environment File: `/data/nginx/nginx-certbot.env`
Application Installation Process¶
Apache Spark is installed by downloading a specific binary distribution and extracting it on the host server. The process involves:
- Updating and upgrading APT packages.
- Installing the `default-jdk` package.
- Configuring environment variables in `/etc/environment`:
    - `JAVA_HOME`: Set to the appropriate OpenJDK path.
    - `SPARK_LOCAL_IP`: Set to `127.0.0.1`.
- Downloading Apache Spark version 3.5.3 (binary for Hadoop 3) from the Apache archive.
- Extracting the archive to `/root`.
- Removing the original archive file.
- Rebooting the system to apply environment variable changes.
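The steps above can be sketched as a shell script. This is a minimal sketch under stated assumptions, not the actual provisioning code: the `run` wrapper and `install_spark` function are hypothetical, the download URL assumes the usual `archive.apache.org/dist/spark/` layout, and the Java 17 path assumes a Bookworm host. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually apply them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

SPARK_VERSION="3.5.3"
SPARK_PKG="spark-${SPARK_VERSION}-bin-hadoop3"
# Assumed archive layout; verify the URL before use.
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PKG}.tgz"
# Assumed Bookworm host; use the Java 11 path on Bullseye.
JAVA_HOME_PATH="/usr/lib/jvm/java-17-openjdk-amd64"

install_spark() {
    run apt-get update
    run apt-get -y upgrade
    run apt-get -y install default-jdk
    # Append both variables to /etc/environment (not idempotent).
    run sh -c "printf 'JAVA_HOME=%s\nSPARK_LOCAL_IP=127.0.0.1\n' '$JAVA_HOME_PATH' >> /etc/environment"
    run wget -O "/root/${SPARK_PKG}.tgz" "$SPARK_URL"
    run tar -xzf "/root/${SPARK_PKG}.tgz" -C /root
    run rm -f "/root/${SPARK_PKG}.tgz"
    run reboot
}

install_spark
```

Printing the commands first makes it easy to review the sequence before anything touches `/etc/environment` or triggers the reboot.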
Docker Containers and Their Deployment¶
The application utilizes Docker for the web server and SSL certificate management. The deployment is managed via a Docker Compose file located at /root/nginx/compose.yml.
Container Details:
- Image: `jonasal/nginx-certbot:latest`
- Restart Policy: `unless-stopped`
- Network Mode: `host`
- Volumes:
    - `nginx_secrets` mounted to `/etc/letsencrypt` (external volume).
    - `/data/nginx/user_conf.d` mounted to `/etc/nginx/user_conf.d`.

Deployment Command: The container stack is started with Docker Compose from the `/root/nginx` directory.
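As a sketch of what the compose file and start command might look like, the helper below prints a definition built from the container details listed above. The service name `nginx`, the `env_file` entry, and the function name `render_compose` are assumptions; the actual `/root/nginx/compose.yml` may differ. The start command assumes the Docker Compose v2 plugin (`docker compose`).

```shell
# Print a compose file matching the container details listed above.
# The service name "nginx" and the env_file entry are assumptions.
render_compose() {
    cat <<'EOF'
services:
  nginx:
    image: jonasal/nginx-certbot:latest
    restart: unless-stopped
    network_mode: host
    env_file:
      - /data/nginx/nginx-certbot.env
    volumes:
      - nginx_secrets:/etc/letsencrypt
      - /data/nginx/user_conf.d:/etc/nginx/user_conf.d

volumes:
  nginx_secrets:
    external: true
EOF
}

render_compose

# On the server, the stack would then typically be started with:
#   cd /root/nginx && docker compose up -d
```

Because `nginx_secrets` is declared external, it has to exist before the stack starts, for example via `docker volume create nginx_secrets`.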
Proxy Servers¶
The application is fronted by an Nginx proxy container that handles SSL termination and routing to the internal Spark services.
Proxy Configuration:
- Software: Nginx (via Docker container `jonasal/nginx-certbot`).
- SSL: Enabled using Let's Encrypt certificates managed by Certbot.
- Port: Listens on port 443 (HTTPS).
- Domain: Configured to respond to `spark<Server ID>.hostkey.in`.
- Routing:
    - Root path `/` forwards to the Spark UI on internal port `4040`.
    - `/master` forwards to internal port `8080`.
    - `/worker` forwards to internal port `8081`.
    - `/history` forwards to internal port `18080`.
- WebSocket Support: Configured with `Upgrade` headers and `X-Scheme` passing.
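The routing described above could correspond to a server block along the following lines. This is a hedged reconstruction, not the deployed file: the function name `render_server_block` is hypothetical, the upstream address `127.0.0.1` is assumed from `SPARK_LOCAL_IP`, the certificate file names follow the usual Certbot layout, and any path rewriting the real configuration performs is omitted.

```shell
# Print a server block for the given FQDN (illustrative only).
render_server_block() {
    fqdn="$1"
    cat <<EOF
server {
    listen 443 ssl;
    server_name ${fqdn};

    ssl_certificate     /etc/letsencrypt/live/${fqdn}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/${fqdn}/privkey.pem;

    # WebSocket upgrade and scheme headers, inherited by every location.
    proxy_http_version 1.1;
    proxy_set_header Upgrade \$http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Scheme \$scheme;

    location /         { proxy_pass http://127.0.0.1:4040; }
    location /master   { proxy_pass http://127.0.0.1:8080; }
    location /worker   { proxy_pass http://127.0.0.1:8081; }
    location /history  { proxy_pass http://127.0.0.1:18080; }
}
EOF
}

# Example with a hypothetical server ID:
#   render_server_block spark12345.hostkey.in
```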
Permission Settings¶
The file permissions for the configuration and data directories are set as follows:
- Nginx Config Directory (`/root/nginx`):
    - Owner: `root`
    - Group: `root`
    - Mode: `0644`
- User Conf File (`/data/nginx/user_conf.d/*.conf`):
    - Owner: `root`
    - Group: `root`
    - Mode: `0644`
- Docker Compose File (`/root/nginx/compose.yml`):
    - Owner: `root`
    - Group: `root`
    - Mode: `0644`
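To verify or reapply these settings, a few helpers such as the following could be used. The function names are hypothetical, and `stat -c` assumes GNU coreutils, as shipped with Debian.

```shell
# Print "owner:group mode" for a path (GNU stat syntax).
perm_summary() { stat -c '%U:%G %a' "$1"; }

# Apply the ownership and mode listed above to one path (needs root).
apply_perms() {
    chown root:root "$1" && chmod 0644 "$1"
}

# Report whether a path matches the documented root:root 0644 setting.
check_perms() {
    if [ "$(perm_summary "$1")" = "root:root 644" ]; then
        echo "ok   $1"
    else
        echo "diff $1 ($(perm_summary "$1"))"
    fi
}

# Usage on the server:
#   check_perms /root/nginx/compose.yml
#   for f in /data/nginx/user_conf.d/*.conf; do check_perms "$f"; done
```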
Location of Configuration Files and Data¶
Key configuration files and their locations are:
- Global Environment Variables: `/etc/environment`
- Docker Compose Definition: `/root/nginx/compose.yml`
- Nginx Server Block Configuration: `/data/nginx/user_conf.d/spark<Server ID>.hostkey.in.conf`
- Nginx Environment Variables: `/data/nginx/nginx-certbot.env`
Available Ports for Connection¶
The following ports are configured for the system:
| Port | Protocol | Description |
|---|---|---|
| 443 | TCP | HTTPS (Nginx Proxy Entry Point) |
| 4040 | TCP | Internal Spark UI (proxied via /) |
| 8080 | TCP | Internal Spark Master UI (proxied via /master) |
| 8081 | TCP | Internal Spark Worker UI (proxied via /worker) |
| 18080 | TCP | Internal Spark History Server (proxied via /history) |
Note: The internal ports (4040, 8080, 8081, 18080) are not directly exposed to the internet; they are accessed through the Nginx proxy on port 443.
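One way to check which of these ports are actually listening is to filter the output of `ss -ltn`. The helper below is a sketch with a hypothetical name, `port_listening`; it reads `ss`-style output on standard input so it can also be tried against saved output.

```shell
# Read `ss -ltn` style output on stdin and succeed if any
# listening socket has a local address ending in :<port>.
port_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Usage on the server:
#   for p in 443 4040 8080 8081 18080; do
#       if ss -ltn | port_listening "$p"; then
#           echo "$p listening"
#       else
#           echo "$p not listening"
#       fi
#   done
```

A port reported as not listening is not necessarily a fault: each Spark service only opens its port while that service is running.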
Starting, Stopping, and Updating¶
The primary service managing the proxy and SSL is the Docker container. Its lifecycle is managed with Docker Compose from the `/root/nginx` directory and covers four operations:
- Starting the service
- Stopping the service
- Restarting the service
- Updating the container image
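The exact commands are not reproduced on this page. Assuming the Docker Compose v2 plugin and the compose file in `/root/nginx`, they would typically look like the output of the helper below. The name `proxy_cmd` and the `COMPOSE_DIR` variable are hypothetical, and `docker compose stop` could equally be `docker compose down` if the containers should also be removed.

```shell
COMPOSE_DIR="${COMPOSE_DIR:-/root/nginx}"

# Print the Docker Compose command line for a lifecycle action.
proxy_cmd() {
    case "$1" in
        start)   printf 'cd %s && docker compose up -d\n' "$COMPOSE_DIR" ;;
        stop)    printf 'cd %s && docker compose stop\n' "$COMPOSE_DIR" ;;
        restart) printf 'cd %s && docker compose restart\n' "$COMPOSE_DIR" ;;
        update)  printf 'cd %s && docker compose pull && docker compose up -d\n' "$COMPOSE_DIR" ;;
        *) echo "usage: proxy_cmd start|stop|restart|update" >&2; return 1 ;;
    esac
}

# Show the command for each action.
for action in start stop restart update; do
    proxy_cmd "$action"
done

# To execute one of them on the server:
#   eval "$(proxy_cmd restart)"
```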