
gpt-oss-120b


Information

gpt-oss-120b is a large-scale open-weight model from OpenAI, designed for high-performance tasks that require deep reasoning, multi-step planning, and complex tool use. The model contains roughly 120 billion parameters, of which approximately 5.1 billion are activated per token, balancing computational power and efficiency. Thanks to advanced quantization and optimization, gpt-oss-120b can be deployed on server hardware with 70 GB or more of video memory and supports scalable local or hybrid deployment.

Main Features of gpt-oss-120b

  • Scalable mixture-of-experts architecture: The model contains 120 billion parameters, but thanks to sparse activation only about 5.1 billion of them are activated per token. This significantly reduces memory and compute requirements without compromising quality.
  • Advanced agent capabilities: gpt-oss-120b supports an extended set of tools, including code execution, real-time web search, API calling, and generation of strictly structured outputs (JSON, XML, etc.). This makes it an ideal foundation for autonomous agents and complex automated systems.
  • Adaptive reasoning: The model implements a flexible system of reasoning levels—from quick direct responses to multi-step chains of thought (chain-of-thought) and decision trees. Users can control the "depth of thinking" depending on the complexity of the task.
  • High performance on benchmarks: gpt-oss-120b demonstrates results comparable to proprietary reasoning models such as OpenAI o4-mini, particularly in tasks requiring logic, mathematics, programming, and interdisciplinary synthesis of knowledge.
  • Extensive multilingual support: The model is trained on data from more than 50 languages and can effectively operate in multilingual and multicultural contexts. For best results, it is recommended to explicitly specify the language and cultural frameworks in the prompt.
  • Efficient quantization and compatibility: Support for the MXFP4 and INT4 formats significantly reduces memory usage and speeds up inference without substantial quality loss. The model is compatible with popular runtimes such as vLLM, llama.cpp (GGUF), and Hugging Face Transformers.
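
The structured-output capability above maps directly onto Ollama's REST API: a chat request can ask the server to constrain the reply to valid JSON via the `format` field. A minimal sketch that only builds the request payload; the model tag `gpt-oss:120b` and the schema wording in the system message are assumptions for this build:

```python
import json

def build_chat_request(prompt: str) -> dict:
    """Build an /api/chat payload asking gpt-oss-120b for strict JSON output."""
    return {
        "model": "gpt-oss:120b",  # assumed Ollama model tag
        "messages": [
            {"role": "system",
             "content": 'Reply only with JSON: {"answer": string, "confidence": number}.'},
            {"role": "user", "content": prompt},
        ],
        "format": "json",   # ask Ollama to constrain the reply to valid JSON
        "stream": False,    # return one complete response instead of a token stream
    }

payload = build_chat_request("What is the capital of France?")
print(json.dumps(payload, indent=2))
```

The resulting dictionary would be POSTed to `http://<server>:11434/api/chat`; the same pattern works for function calling by adding a `tools` array to the payload.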

Deployment Features

| ID  | Compatible OS    | VM | BM | VGPU | GPU | Min CPU (cores) | Min RAM (GB) | Min HDD/SSD (GB) |
|-----|------------------|----|----|------|-----|-----------------|--------------|------------------|
| 415 | Ubuntu 22.04 GPU | -  | -  | +    | +   | 16              | 128          | 240              |

Technical specifications of the build:

  • Ubuntu 22.04 with kernel updated to version 6;
  • Latest Nvidia drivers;
  • CUDA Toolkit;
  • Ollama for managing models;
  • OpenWebUI for web interface.

Installation features:

  • Installation time is 35-45 minutes including OS setup;
  • The Ollama server loads and runs the gpt-oss-120b model in GPU/RAM memory;
  • Open WebUI is deployed as a web application connected to the Ollama server;
  • Users interact with the model through the Open WebUI web interface for programming and agent tasks;
  • All computations and code processing occur locally on the server;
  • Administrators can configure the model for specific development tasks using OpenWebUI tools;
  • Support for various quantization levels to optimize memory usage.
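
Once the Ollama server is running, the simplest health check is listing the installed models via its REST API (`GET /api/tags`). A minimal offline sketch that parses such a response; the sample values below are illustrative, not taken from a real server:

```python
import json

# Hypothetical excerpt of an Ollama /api/tags response, for illustration only.
SAMPLE_TAGS = json.dumps({
    "models": [
        {"name": "gpt-oss:120b",
         "size": 65_000_000_000,
         "details": {"quantization_level": "MXFP4"}},
    ]
})

def model_available(tags_json: str, name: str) -> bool:
    """Check whether a model name appears in an Ollama /api/tags response."""
    models = json.loads(tags_json).get("models", [])
    return any(m.get("name") == name for m in models)

print(model_available(SAMPLE_TAGS, "gpt-oss:120b"))  # True
```

In practice the JSON would come from `http://<server>:11434/api/tags`; if the model is missing, it can be fetched with `ollama pull` before Open WebUI will list it.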

Getting Started After Deploying gpt-oss-120b

After payment, a notification that the server is ready will be sent to the email address registered during the order. It will include the server's IP address, login, and password, as well as a link to the Open WebUI control panel. Clients manage the equipment through the server management panel and API, Invapi.

  • Credentials for accessing the server OS (e.g., via SSH) are sent in that email.
  • The link to the Ollama control panel with the Open WebUI web interface is stored in the webpanel tag on the Info >> Tags tab of the Invapi control panel. The exact link, e.g., https://gpt-oss<Server_ID_from_Invapi>.hostkey.in, is also provided in the email sent upon server delivery.

Upon first visiting the webpanel tag link, a welcome page opens. Click the Get started button to begin setup.

The Get started with Open WebUI window will then open, where you need to choose an administrator account name, email, and password for your chatbot and press the Create Admin Account button:

Attention

After the first user registers, the system automatically assigns them the administrator role. To keep the registration process secure and under control, all subsequent registration requests must be approved from the administrator account in Open WebUI.

Following successful registration, the main Open WebUI interface with access to gpt-oss-120b will open:

Note

Detailed information on using the Ollama control panel with Open WebUI can be found in the article AI Chatbot on Your Own Server.

Note

For optimal operation of the gpt-oss-120b model, it is recommended to use a GPU with at least 70 GB of video memory. For efficient processing of long code contexts and complex agent tasks, we recommend GPUs with 80 GB of video memory. Detailed information on the main Ollama and Open WebUI settings can be found in the Ollama developer documentation and the Open WebUI developer documentation.
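
The 70 GB figure follows from simple arithmetic on the weights. A back-of-the-envelope sketch, assuming MXFP4 stores roughly 4.25 bits per parameter (4-bit values plus shared block scales, an approximation); the KV cache and activations at runtime add several more gigabytes on top, which is what pushes long-context workloads toward 80 GB cards:

```python
def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Memory needed for model weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9

# 120B parameters at ~4.25 bits/parameter under MXFP4 (approximation):
weights = weight_memory_gb(120e9, 4.25)
print(round(weights, 2))  # 63.75
```

Weights alone land just under 64 GB, leaving only a few gigabytes of headroom on a 70 GB card once the KV cache grows with context length.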

Recommendations for Use

To maximize the efficiency of the gpt-oss-120b model, it is recommended to:

  • Use the model for reasoning tasks, including chain-of-thought processing. The model supports adjustable levels of reasoning: low, medium, and high, which are configured through a system prompt.
  • Utilize the model's built-in agent capabilities such as function calling, Python code execution, and structured outputs.
  • Employ the model for multi-stage development tasks leveraging its agent abilities.
  • Integrate the model with existing development tools via the API, keeping in mind that it supports fine-tuning and uses the OpenAI Harmony response format. The model is designed for efficient, low-latency deployment, including locally.
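
The adjustable reasoning levels mentioned above are selected in the system prompt. A minimal sketch of composing the message list; the `Reasoning: <level>` phrasing follows the gpt-oss model card, but verify it against your Ollama version:

```python
def make_messages(task: str, level: str = "medium") -> list:
    """Compose a chat message list that sets the gpt-oss reasoning level."""
    if level not in ("low", "medium", "high"):
        raise ValueError(f"unknown reasoning level: {level!r}")
    return [
        # The system prompt selects the depth of chain-of-thought reasoning.
        {"role": "system", "content": f"Reasoning: {level}"},
        {"role": "user", "content": task},
    ]

msgs = make_messages("Plan a database migration step by step.", level="high")
print(msgs[0]["content"])  # Reasoning: high
```

The same list plugs into the `messages` field of an Ollama `/api/chat` request; "low" suits quick direct answers, while "high" trades latency for longer multi-step reasoning.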

Ordering a Server with gpt-oss-120b Using API

To install this software using the API, follow these instructions.


Some of the content on this page was created or translated using AI.
