Deployment Overview of gpt-oss on Server

Prerequisites and Basic Requirements

The deployment requires an Ubuntu server with root privileges. Docker must be installed and configured for GPU access so that the model can use CUDA acceleration. The following components are required:

Operating System: Ubuntu
Privileges: Root access (sudo)
Network: Access to the internet for downloading models and certificates
Hardware: NVIDIA GPU with CUDA support
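
The requirements above can be checked before deployment with a short preflight script. This is a minimal sketch: the function names (have_cmd, report, preflight) and the choice of checks, such as looking for nvidia-smi as evidence of a GPU driver, are assumptions for illustration and are not part of the original deployment.

```shell
#!/bin/sh
# Preflight sketch: report whether each prerequisite appears to be met.
# The specific checks are assumptions, not taken from the deployment itself.

# Return 0 if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print "ok" or "missing" for a labelled check.
# Usage: report LABEL COMMAND [ARGS...]
report() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok       $label"
    else
        echo "missing  $label"
    fi
}

# Return 0 when running as root.
is_root() { [ "$(id -u)" -eq 0 ]; }

# Return 0 when /etc/os-release identifies the system as Ubuntu.
is_ubuntu() { grep -q '^ID=ubuntu' /etc/os-release; }

preflight() {
    report "Ubuntu" is_ubuntu
    report "root privileges" is_root
    report "docker" have_cmd docker
    report "nvidia-smi (GPU driver)" have_cmd nvidia-smi
    report "curl (downloads)" have_cmd curl
}

preflight
```

The script only reports; it does not install anything or change the system.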

FQDN of the Final Panel

The panel is reachable over HTTPS (port 443) at a Fully Qualified Domain Name (FQDN) of the following form:

gpt-oss<Server ID>.hostkey.in:443

Replace <Server ID> with the specific identifier assigned to the server instance.
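
As an illustration of the substitution, the following sketch builds the host name and URL from a server ID. The function names and the ID 12345 are hypothetical.

```shell
# Build the panel host name from a server ID.
panel_fqdn() {
    printf 'gpt-oss%s.hostkey.in\n' "$1"
}

# Build the full HTTPS URL, including the port given in the format above.
panel_url() {
    printf 'https://%s:443\n' "$(panel_fqdn "$1")"
}

panel_url 12345   # prints https://gpt-oss12345.hostkey.in:443
```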

File and Directory Structure

The deployment uses the following paths for configuration, data, and certificates:

/root/nginx/: Contains the Docker Compose configuration for the proxy and SSL management.
/data/nginx/user_conf.d/: Stores custom Nginx configuration files for the specific domain.
/data/nginx/nginx-certbot.env: Environment file for the Nginx-Certbot service.
/etc/systemd/system/ollama.service: Systemd service file for the Ollama backend.
/usr/share/ollama/.ollama/models/: Storage location for the downloaded AI models.
/var/lib/docker/volumes/open-webui/: Persistent storage volume for the Open WebUI application data.
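
To confirm that a server matches this layout, the paths can be checked in a loop. This is a sketch only; check_paths and its optional prefix argument are hypothetical helpers, and the prefix exists so the function can be pointed at a copy of the tree instead of the live system.

```shell
# Report which of the documented paths exist.
# Usage: check_paths [PREFIX]   (PREFIX defaults to empty, i.e. the live root)
check_paths() {
    prefix=${1:-}
    for p in \
        /root/nginx \
        /data/nginx/user_conf.d \
        /data/nginx/nginx-certbot.env \
        /etc/systemd/system/ollama.service \
        /usr/share/ollama/.ollama/models \
        /var/lib/docker/volumes/open-webui
    do
        if [ -e "$prefix$p" ]; then
            echo "present  $p"
        else
            echo "absent   $p"
        fi
    done
}

check_paths
```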

Application Installation Process

The application consists of a backend AI engine (Ollama) and a frontend interface (Open WebUI). The installation involves the following steps:

Install Ollama: The Ollama package is installed using the official installation script.
Configure Ollama Service: The ollama.service file is modified to set the following environment variables:
- OLLAMA_HOST=0.0.0.0
- OLLAMA_ORIGINS=*
- OLLAMA_FLASH_ATTENTION=1
Download Model: The gpt-oss:20b model is pulled and stored locally.
Deploy Open WebUI: The frontend is deployed as a Docker container using the ghcr.io/open-webui/open-webui:cuda image.
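
The Ollama part of these steps can be sketched as a script. Several details here are assumptions: the sketch writes a systemd drop-in (override.conf) instead of editing ollama.service in place, it uses OLLAMA_FLASH_ATTENTION, which is the variable name Ollama documents for flash attention, and the run helper and DRY_RUN switch are hypothetical conveniences. By default the script only prints the commands it would execute.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root) to actually run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Print a systemd drop-in that sets the three environment variables.
ollama_dropin() {
    cat <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
Environment="OLLAMA_FLASH_ATTENTION=1"
EOF
}

install_ollama() {
    # 1. Install Ollama with the official installation script.
    run sh -c 'curl -fsSL https://ollama.com/install.sh | sh'

    # 2. Configure the service through a drop-in, then reload and restart.
    dropin_dir=/etc/systemd/system/ollama.service.d
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ write $dropin_dir/override.conf:"
        ollama_dropin | sed 's/^/    /'
    else
        mkdir -p "$dropin_dir"
        ollama_dropin > "$dropin_dir/override.conf"
    fi
    run systemctl daemon-reload
    run systemctl restart ollama

    # 3. Download the model.
    run ollama pull gpt-oss:20b
}

install_ollama
```

A drop-in keeps the packaged unit file untouched, so a later reinstall of Ollama does not overwrite the settings. Editing the unit file directly, as the deployment does, has the same runtime effect.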

Docker Containers and Their Deployment

Two primary Docker containers are deployed to run the application stack:

Open WebUI Container:
- Image: ghcr.io/open-webui/open-webui:cuda
- Name: open-webui
- Ports: Maps host port 8080 to container port 8080.
- Environment Variables:
  - ENV=dev
  - OLLAMA_BASE_URLS=http://host.docker.internal:11434
- Volumes: Mounts the open-webui volume to /app/backend/data.
- GPU: Configured with --gpus all for CUDA support.
- Restart Policy: Set to always.
Nginx-Certbot Container:
- Image: jonasal/nginx-certbot:latest
- Network Mode: Host
- Volumes:
  - nginx_secrets mounted to /etc/letsencrypt.
  - /data/nginx/user_conf.d mounted to /etc/nginx/user_conf.d.
- Environment: Sets the Certbot contact email address (CERTBOT_EMAIL) and loads further variables from /data/nginx/nginx-certbot.env.
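
The two container definitions above correspond roughly to the following sketch. It is not the deployment's actual script: the run helper and DRY_RUN switch are hypothetical, the Compose service name nginx and the email admin@example.com are placeholders, and the --add-host flag is an assumption added because host.docker.internal does not resolve inside a container on Linux unless it is mapped to the host gateway.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start Open WebUI with the options listed above.
start_open_webui() {
    # --add-host is an assumption (see lead-in); all other flags come from the list.
    run docker run -d \
        --name open-webui \
        --gpus all \
        --restart always \
        -p 8080:8080 \
        --add-host=host.docker.internal:host-gateway \
        -e ENV=dev \
        -e OLLAMA_BASE_URLS=http://host.docker.internal:11434 \
        -v open-webui:/app/backend/data \
        ghcr.io/open-webui/open-webui:cuda
}

# Print a Compose file for the Nginx-Certbot container.
# Service name and email address are placeholders.
nginx_compose() {
    cat <<'EOF'
services:
  nginx:
    image: jonasal/nginx-certbot:latest
    network_mode: host
    environment:
      - CERTBOT_EMAIL=admin@example.com
    env_file:
      - /data/nginx/nginx-certbot.env
    volumes:
      - nginx_secrets:/etc/letsencrypt
      - /data/nginx/user_conf.d:/etc/nginx/user_conf.d

volumes:
  nginx_secrets:
EOF
}

start_open_webui
nginx_compose
```

Because the proxy container uses host networking, it reaches Open WebUI at 127.0.0.1:8080 without any published ports of its own.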

Proxy Servers

The application uses Nginx as a reverse proxy with SSL termination managed by Certbot.

Proxy Configuration: The Nginx configuration file located at /data/nginx/user_conf.d/gpt-oss<Server ID>.hostkey.in.conf directs traffic to the internal application.
Proxy Pass: Incoming HTTPS traffic is forwarded to the Open WebUI container with the directive proxy_pass http://127.0.0.1:8080;
SSL: Managed automatically by the nginx-certbot container to ensure HTTPS connectivity on port 443.
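
A server block of the kind described here can be generated from the FQDN. The sketch below is an assumption about what such a file might contain, not a copy of the deployed configuration: only the proxy_pass target comes from the text above, while the certificate paths follow the /etc/letsencrypt/live/<name>/ convention used by the nginx-certbot image, and the forwarding and WebSocket headers are common additions for web applications. The function name and the example host name are hypothetical.

```shell
# Print an Nginx server block for the given FQDN.
# Usage: render_proxy_conf FQDN
render_proxy_conf() {
    fqdn=$1
    cat <<EOF
server {
    listen 443 ssl;
    server_name ${fqdn};

    # Certificate paths are an assumption based on the Let's Encrypt layout.
    ssl_certificate     /etc/letsencrypt/live/${fqdn}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/${fqdn}/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-Proto \$scheme;

        # WebSocket upgrade headers (assumed, for streaming responses).
        proxy_http_version 1.1;
        proxy_set_header Upgrade \$http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
EOF
}

render_proxy_conf gpt-oss12345.hostkey.in
```

Redirecting the output to a file under /data/nginx/user_conf.d/ makes it visible to the proxy container at /etc/nginx/user_conf.d/.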

Available Ports for Connection

The following ports are configured for the application:

Port 443: External HTTPS access via the Nginx proxy.
Port 8080: Internal HTTP access for the Open WebUI container.
Port 11434: Internal access for the Ollama service (accessible via host.docker.internal from within the container).
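
Whether these ports are accepting connections can be checked from the server itself. This is a minimal sketch, assuming bash and the coreutils timeout command are installed; port_open and check_ports are hypothetical helper names. It tests only that a TCP connection can be opened, not that the service behind the port is healthy.

```shell
# Return 0 if a TCP connection to HOST:PORT succeeds within 2 seconds.
# Relies on bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Probe the three documented ports on the local host.
check_ports() {
    for port in 443 8080 11434; do
        if port_open 127.0.0.1 "$port"; then
            echo "open    $port"
        else
            echo "closed  $port"
        fi
    done
}

check_ports
```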

Starting, Stopping, and Updating

Service management is handled through Docker and Systemd commands:

Open WebUI Container:
- Start: docker start open-webui
- Stop: docker stop open-webui
- Restart: docker restart open-webui
- Update: Pull the latest image and recreate the container.
Ollama Service:
- Start: systemctl start ollama
- Stop: systemctl stop ollama
- Restart: systemctl restart ollama
- Enable on Boot: systemctl enable ollama
Nginx Proxy:
- Start/Update: Run docker compose up -d from the /root/nginx directory.
- Stop: Run docker compose down from the /root/nginx directory.
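
The Open WebUI update step (pull the image, then recreate the container) can be sketched as follows. The run helper, the DRY_RUN switch, and the script name run-open-webui.sh are hypothetical; the function expects to be given whatever command originally created the container, so that the new one gets the same options.

```shell
# Dry-run by default: commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Replace the Open WebUI container with one built from the newest image.
# Usage: update_open_webui COMMAND [ARGS...]
#   COMMAND is the command that recreates the container, for example the
#   original "docker run ..." line or a script that wraps it.
update_open_webui() {
    run docker pull ghcr.io/open-webui/open-webui:cuda
    run docker stop open-webui
    run docker rm open-webui
    run "$@"
}

# run-open-webui.sh is a hypothetical script holding the original docker run command.
update_open_webui sh ./run-open-webui.sh
```

Removing the container does not remove the open-webui volume, so application data mounted at /app/backend/data carries over to the recreated container.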