Deployment Overview of gpt-oss on Server

Prerequisites and Basic Requirements

The deployment targets a Linux server running Ubuntu, with root privileges. Docker must be installed and configured for GPU acceleration so that the model can run on CUDA. The following components are required:

  • Operating System: Ubuntu

  • Privileges: Root access (sudo)

  • Network: Access to the internet for downloading models and certificates

  • Hardware: GPU support for CUDA acceleration
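
The requirements above can be checked before installation with a short script. This is a minimal sketch, not part of the documented deployment: the function names is_ubuntu and check_prereqs are hypothetical, and the presence of nvidia-smi is used only as a rough indicator that an NVIDIA driver is installed.

```shell
#!/bin/sh
# Succeeds if the os-release file (default: /etc/os-release) identifies Ubuntu.
is_ubuntu() {
    grep -q '^ID=ubuntu$' "${1:-/etc/os-release}"
}

# Check each prerequisite in turn and stop at the first one that is missing.
check_prereqs() {
    is_ubuntu || { echo "operating system is not Ubuntu" >&2; return 1; }
    [ "$(id -u)" -eq 0 ] || { echo "run as root (sudo)" >&2; return 1; }
    command -v docker >/dev/null || { echo "docker not found" >&2; return 1; }
    # nvidia-smi ships with the NVIDIA driver; finding it suggests a usable GPU.
    command -v nvidia-smi >/dev/null || { echo "nvidia-smi not found" >&2; return 1; }
    echo "prerequisites look satisfied"
}
```

Calling check_prereqs prints a single line on success or names the first missing requirement on standard error.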

FQDN of the Final Panel

The application is accessible via the following Fully Qualified Domain Name (FQDN) format:

  • gpt-oss<Server ID>.hostkey.in:443

Replace <Server ID> with the specific identifier assigned to the server instance.
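
As an illustration, the panel URL can be assembled from the server ID in shell. The helper name panel_url and the sample ID 12345 are placeholders rather than values from the deployment.

```shell
#!/bin/sh
# Build the panel URL from a server ID (hypothetical helper).
panel_url() {
    server_id=$1
    if [ -z "$server_id" ]; then
        echo "usage: panel_url <server-id>" >&2
        return 1
    fi
    # Port 443 is the HTTPS default, so it is left out of the URL.
    printf 'https://gpt-oss%s.hostkey.in\n' "$server_id"
}

# Example: panel_url 12345
```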

File and Directory Structure

The deployment utilizes the following directory structure for configuration, data, and certificates:

  • /root/nginx/: Contains the Docker Compose configuration for the proxy and SSL management.

  • /data/nginx/user_conf.d/: Stores custom Nginx configuration files for the specific domain.

  • /data/nginx/nginx-certbot.env: Environment file for the Nginx-Certbot service.

  • /etc/systemd/system/ollama.service: Systemd service file for the Ollama backend.

  • /usr/share/ollama/.ollama/models/: Storage location for the downloaded AI models.

  • /var/lib/docker/volumes/open-webui/: Persistent storage volume for the Open WebUI application data.
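
A quick way to confirm this layout on a running server is to test each path for existence. The sketch below assumes the paths exactly as listed above; deployment_paths and check_layout are hypothetical helper names.

```shell
#!/bin/sh
# Print the paths the deployment is expected to use, one per line.
deployment_paths() {
    cat <<'EOF'
/root/nginx
/data/nginx/user_conf.d
/data/nginx/nginx-certbot.env
/etc/systemd/system/ollama.service
/usr/share/ollama/.ollama/models
/var/lib/docker/volumes/open-webui
EOF
}

# Report whether each expected path exists on this machine.
check_layout() {
    deployment_paths | while read -r path; do
        if [ -e "$path" ]; then
            echo "present  $path"
        else
            echo "missing  $path"
        fi
    done
}
```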

Application Installation Process

The application consists of a backend AI engine (Ollama) and a frontend interface (Open WebUI). The installation involves the following steps:

  1. Install Ollama: The Ollama package is installed using the official installation script.

  2. Configure Ollama Service: The ollama.service file is modified to set the following environment variables:

    • OLLAMA_HOST=0.0.0.0

    • OLLAMA_ORIGINS=*

    • OLLAMA_FLASH_ATTENTION=1

  3. Download Model: The gpt-oss:20b model is pulled and stored locally.

  4. Deploy Open WebUI: The frontend is deployed as a Docker container using the ghcr.io/open-webui/open-webui:cuda image.
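
Steps 1 to 3 might look roughly like the following. This is a sketch under stated assumptions: it applies the variables through a systemd drop-in file rather than editing ollama.service directly, the function names are hypothetical, and the third variable is spelled OLLAMA_FLASH_ATTENTION, which is the name Ollama documents for this setting. Nothing runs until install_stack is called.

```shell
#!/bin/sh
# Print a systemd drop-in that sets the three environment variables.
ollama_override() {
    cat <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
Environment="OLLAMA_FLASH_ATTENTION=1"
EOF
}

install_stack() {
    # Step 1: run the official Ollama installation script.
    curl -fsSL https://ollama.com/install.sh | sh

    # Step 2: write the drop-in (an assumption; the deployment itself
    # modifies ollama.service) and reload systemd so it takes effect.
    mkdir -p /etc/systemd/system/ollama.service.d
    ollama_override > /etc/systemd/system/ollama.service.d/override.conf
    systemctl daemon-reload
    systemctl restart ollama

    # Step 3: download the model into the local model store.
    ollama pull gpt-oss:20b
}
```

A drop-in keeps the settings separate from the unit file shipped by the installer, so a later reinstall of Ollama is less likely to overwrite them.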

Docker Containers and Their Deployment

Two primary Docker containers are deployed to run the application stack:

  1. Open WebUI Container:

    • Image: ghcr.io/open-webui/open-webui:cuda

    • Name: open-webui

    • Ports: Maps host port 8080 to container port 8080.

    • Environment Variables:

      • ENV=dev

      • OLLAMA_BASE_URLS=http://host.docker.internal:11434

    • Volumes: Mounts the open-webui volume to /app/backend/data.

    • GPU: Configured with --gpus all for CUDA support.

    • Restart Policy: Set to always.

  2. Nginx-Certbot Container:

    • Image: jonasal/nginx-certbot:latest

    • Network Mode: Host

    • Volumes:

      • nginx_secrets mounted to /etc/letsencrypt.

      • /data/nginx/user_conf.d mounted to /etc/nginx/user_conf.d.

    • Environment: Sets a Certbot contact email address and loads additional variables from /data/nginx/nginx-certbot.env.
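
The Open WebUI settings listed above correspond to a docker run invocation along these lines. It is a sketch, not the exact command used by the deployment: the --add-host flag is an assumption, added because host.docker.internal does not resolve inside containers on Linux by default, and the run helper is a hypothetical wrapper that prints the command when DRY_RUN=1 is set.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create and start the Open WebUI container with the documented settings.
start_open_webui() {
    run docker run -d \
        --name open-webui \
        --restart always \
        --gpus all \
        -p 8080:8080 \
        --add-host=host.docker.internal:host-gateway \
        -e ENV=dev \
        -e OLLAMA_BASE_URLS=http://host.docker.internal:11434 \
        -v open-webui:/app/backend/data \
        ghcr.io/open-webui/open-webui:cuda
}
```

Setting DRY_RUN=1 before calling start_open_webui prints the full command for review without touching Docker.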

Proxy Servers

The application uses Nginx as a reverse proxy with SSL termination managed by Certbot.

  • Proxy Configuration: The Nginx configuration file located at /data/nginx/user_conf.d/gpt-oss<Server ID>.hostkey.in.conf directs traffic to the internal application.

  • Proxy Pass: Nginx forwards requests arriving on port 443 to the Open WebUI container with the directive proxy_pass http://127.0.0.1:8080;

  • SSL: Managed automatically by the nginx-certbot container to ensure HTTPS connectivity on port 443.
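
For orientation, a server block matching this description could be generated as shown below. Only the proxy_pass target and the file location come from the text above; the certificate paths, the forwarded headers, and the WebSocket upgrade lines are assumptions, and render_nginx_conf is a hypothetical helper. The actual file on the server may differ.

```shell
#!/bin/sh
# Print an Nginx server block for the given server ID (hypothetical helper).
render_nginx_conf() {
    fqdn="gpt-oss$1.hostkey.in"
    cat <<EOF
server {
    listen 443 ssl;
    server_name ${fqdn};

    # Assumed certificate location under the mounted /etc/letsencrypt volume.
    ssl_certificate     /etc/letsencrypt/live/${fqdn}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/${fqdn}/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-Proto \$scheme;
        # WebSocket upgrade headers (assumed to be needed by the web UI).
        proxy_http_version 1.1;
        proxy_set_header Upgrade \$http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
EOF
}

# Example: render_nginx_conf 12345 > /data/nginx/user_conf.d/gpt-oss12345.hostkey.in.conf
```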

Available Ports for Connection

The following ports are configured for the application:

  • Port 443: External HTTPS access via the Nginx proxy.

  • Port 8080: Internal HTTP access for the Open WebUI container.

  • Port 11434: Ollama API. Containers reach it via host.docker.internal. Because OLLAMA_HOST=0.0.0.0 binds the service to all interfaces, the port is also reachable from outside the host unless a firewall blocks it.
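
One way to verify that all three ports respond is to probe them with curl from the server itself. In this sketch, /api/tags is the Ollama endpoint that lists local models; probe_urls and check_ports are hypothetical helpers, and the five-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Print the URLs to probe for a given server ID, one per line.
probe_urls() {
    printf '%s\n' \
        "http://127.0.0.1:11434/api/tags" \
        "http://127.0.0.1:8080/" \
        "https://gpt-oss$1.hostkey.in/"
}

# Request each URL and report whether it answered successfully.
check_ports() {
    probe_urls "$1" | while read -r url; do
        if curl -fsS -o /dev/null --max-time 5 "$url"; then
            echo "ok    $url"
        else
            echo "FAIL  $url"
        fi
    done
}

# Example: check_ports 12345
```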

Starting, Stopping, and Updating

Service management is handled through Docker and Systemd commands:

  • Open WebUI Container:

    • Start: docker start open-webui

    • Stop: docker stop open-webui

    • Restart: docker restart open-webui

    • Update: Pull the latest image and recreate the container.

  • Ollama Service:

    • Start: systemctl start ollama

    • Stop: systemctl stop ollama

    • Restart: systemctl restart ollama

    • Enable on Boot: systemctl enable ollama

  • Nginx Proxy:

    • Start/Update: docker compose up -d executed from the /root/nginx directory.

    • Stop: docker compose down executed from the /root/nginx directory.
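
The Open WebUI update step ("pull the latest image and recreate the container") can be sketched as follows. The container settings repeat those listed in the container section; the --add-host flag is an assumption for resolving host.docker.internal on Linux, and run is a hypothetical wrapper that prints commands when DRY_RUN=1 is set.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

update_open_webui() {
    image=ghcr.io/open-webui/open-webui:cuda

    # Fetch the newer image first so a failed download leaves the
    # running container untouched.
    run docker pull "$image" || return 1

    # Remove the old container. Application data lives in the
    # open-webui volume and is not deleted by these commands.
    run docker stop open-webui
    run docker rm open-webui

    # Recreate the container from the new image with the same settings.
    run docker run -d --name open-webui --restart always --gpus all \
        -p 8080:8080 \
        --add-host=host.docker.internal:host-gateway \
        -e ENV=dev \
        -e OLLAMA_BASE_URLS=http://host.docker.internal:11434 \
        -v open-webui:/app/backend/data \
        "$image"
}
```

For the Nginx proxy, running docker compose pull before docker compose up -d in /root/nginx has the same effect of picking up a newer image.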
