gpt-oss-20b¶
Information
gpt-oss-20b is a medium-sized open-weight model from OpenAI, designed for efficient, low-latency operation in local deployments or specialized use cases. The model has 20 billion total parameters, with 3.6 billion active, enabling it to run on resource-constrained hardware such as devices with 16 GB of memory. It can be deployed locally, including on consumer-grade hardware.
Main Features of gpt-oss-20b¶
- Optimized architecture: The gpt-oss-20b model has 20 billion total parameters and activates only 3.6 billion, ensuring high performance while efficiently using resources.
- Extended agent capabilities: The model includes built-in support for function calling, web page viewing, Python code execution, and structured output generation, making it well suited to task solving and tool invocation.
- Reasoning with adjustable intensity levels: The model is a reliable task solver that supports chain-of-thought reasoning and offers three configurable levels of reasoning intensity.
- Performance and compatibility: The gpt-oss-20b model shows results comparable to OpenAI o3-mini on common benchmarks. Thanks to optimization, it can operate on edge devices with 16 GB of memory.
- Multilingual support: The model has multilingual functionality. For optimal results, it is recommended to explicitly specify the target language and cultural context for interaction.
- Data quantization: Support for the MXFP4 format ensures efficient operation of the model on resource-limited hardware, enhancing overall system performance.
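The adjustable reasoning intensity mentioned above is selected through the system prompt. The sketch below builds a request payload for a local Ollama server's `/api/chat` endpoint; the exact system-prompt phrasing (`Reasoning: high`) follows the gpt-oss convention, and the model tag `gpt-oss:20b` is an assumption based on Ollama's naming scheme.

```python
import json

def build_chat_request(user_message: str, reasoning: str = "medium") -> str:
    """Return a JSON payload for Ollama's /api/chat with a reasoning level
    (low, medium, or high) set via the system prompt."""
    assert reasoning in ("low", "medium", "high")
    payload = {
        "model": "gpt-oss:20b",  # assumed Ollama model tag
        "messages": [
            {"role": "system", "content": f"Reasoning: {reasoning}"},
            {"role": "user", "content": user_message},
        ],
        "stream": False,
    }
    return json.dumps(payload)

# Build (but do not send) a high-effort request body.
request_body = build_chat_request("Explain binary search.", reasoning="high")
print(request_body)
```

The resulting JSON can be POSTed to `http://localhost:11434/api/chat` on the deployed server; swap the reasoning level per task to trade latency for answer depth.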
Deployment Features¶
Technical specifications of the build:
- Ubuntu 22.04 with the kernel updated to version 6;
- The latest NVIDIA drivers;
- CUDA Toolkit;
- Ollama for managing models;
- OpenWebUI for web interface.
Installation features:
- Installation time is 25-45 minutes including OS setup;
- The Ollama server loads and runs the gpt-oss-20b model in GPU memory/RAM;
- Open WebUI is deployed as a web application connected to the Ollama server;
- Users interact with the model through the Open WebUI web interface for programming and agent tasks;
- All computations and code processing occur locally on the server;
- Administrators can configure the model for specific development tasks using OpenWebUI tools;
- Support for various quantization levels to optimize memory usage.
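To see why quantization level matters for memory, the back-of-the-envelope arithmetic below estimates weight memory at several precisions. The bits-per-parameter figures are approximations (MXFP4 stores 4-bit values with a shared 8-bit scale per 32-element block, i.e. roughly 4.25 bits per parameter), and the estimate ignores the KV cache and activations.

```python
PARAMS = 20_000_000_000  # total parameters in gpt-oss-20b

def weight_memory_gib(bits_per_param: float, params: int = PARAMS) -> float:
    """Approximate weight memory in GiB (weights only, no KV cache)."""
    return params * bits_per_param / 8 / 1024**3

# Rough comparison of common precisions.
for name, bits in [("FP16", 16.0), ("Q8", 8.0), ("MXFP4", 4.25)]:
    print(f"{name:6s} ~{weight_memory_gib(bits):.1f} GiB")
```

At MXFP4 the weights come to roughly 10 GiB, which is what lets the model fit on 16 GB devices, whereas FP16 weights alone would need over 37 GiB.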
Getting Started After Deploying gpt-oss-20b¶
After payment, a notification about server readiness will be sent to the email registered during the order. It will include the VPS IP address, login, and password for server access, as well as a link to the OpenWebUI control panel. Clients manage equipment through the Server Management Panel and API — Invapi.
- Credentials for OS server access (e.g., via SSH) will be sent in the received email.
- Link to the Ollama control panel with the Open WebUI web interface: found in the webpanel tag on the Info >> Tags tab of Invapi's control panel. The exact link, e.g., https://gpt-oss<Server_ID_from_Invapi>.hostkey.in, is provided in the email sent upon server delivery.
Upon first visiting the link from the webpanel tag, a welcome page will open. Click the Get started button to begin setup.
In the Get started with Open WebUI login window that opens, create an admin account name, email, and password for your chatbot, then press the Create Admin Account button:
Attention
After registering the first user, the system automatically assigns them an administrator role. To ensure security and control over the registration process, all subsequent registration requests must be approved in OpenWebUI from the administrator account.
Following successful registration, the main Open WebUI interface with access to gpt-oss-20b will open:
Note
Detailed information on using the Ollama control panel with Open WebUI can be found in the article AI Chatbot on Your Own Server.
Note
For optimal operation with the gpt-oss-20b model, it is recommended to use a GPU with at least 16 GB of video memory. For efficient processing of long code contexts and complex agent tasks, we recommend GPUs with 24 GB of video memory. Detailed information on the main Ollama and Open WebUI settings can be found in the Ollama developer documentation and in the Open WebUI developer documentation.
Recommendations for Use¶
To maximize the efficiency of the gpt-oss-20b model, it is recommended to:
- Use the model for reasoning tasks, including chain-of-thought processing. The model supports adjustable levels of reasoning: low, medium, and high, which are configured through a system prompt.
- Utilize the model's built-in agent capabilities such as function calling, Python code execution, and structured outputs.
- Employ the model for multi-stage development tasks leveraging its agent abilities.
- Integrate the model with existing development tools via API; the model supports fine-tuning and uses the OpenAI Harmony response format. It is designed for efficient, low-latency deployment, including locally.
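As a sketch of the built-in function-calling capability, the payload below targets the OpenAI-compatible endpoint that Ollama exposes (`/v1/chat/completions`). The tool name `get_weather` and its parameters are illustrative inventions for this example, not part of the model or the deployment.

```python
import json

# Illustrative tool definition in the OpenAI tools schema.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool name
        "description": "Return current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body the model would receive; the model may answer with a
# tool call that your integration then executes and feeds back.
payload = {
    "model": "gpt-oss:20b",  # assumed Ollama model tag
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": [tool],
}
print(json.dumps(payload, indent=2))
```

In a real integration, the payload would be POSTed to `http://localhost:11434/v1/chat/completions` on the deployed server, and any `tool_calls` in the response routed to your own functions.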
Ordering a Server with gpt-oss-20b Using API¶
To install this software using the API, follow these instructions.
Some of the content on this page was created or translated using AI.