Skip to content

Self-hosted AI Chatbot

In this article


Self-hosted AI Chatbot constitutes a localized solution, amalgamating numerous open-source components. The nucleus of this system is Ollama - an architectural framework designed for launching and managing large-scale language models (LLMs) on local computational resources. It facilitates the downloading and deployment of selected LLMs. For seamless interaction with the deployed model, Open Web UI employs a graphical interface; this web application enables users to dispatch textual inquiries and receive responses generated by the language models. The integration of these components engenders a fully autonomous, localized solution for deploying cutting-edge language models with open-source codebases while maintain daggers full control over data integrity and system performance.

Key Features

  • Web Interface: Open Web UI provides an intuitive web interface that centralizes control and extends interaction capabilities with local AI language models from the Ollama repository, significantly simplifying model usage for users of varying proficiency levels.
  • Integration with Numerous Language Models: Ollama grants access to a plethora of free language models, thereby providing enhanced natural language processing (NLP) capabilities at your disposal. Additionally, you may integrate your customized models.
  • Tasks: Users can engage in conversations, acquire answers to queries, analyze data sets, perform translations, and develop their own chatbots or AI-powered applications with the assistance of LLMs.
  • Open Source Code: Ollama is an open-source project, enabling users to tailor and modify the platform according to their specific requirements.
  • Web Scraper and Internal Document Search (RAG): Through OpenWebUI, you can search across various document types such as text files, PDFs, PowerPoint presentations, websites, and YouTube videos.


For more information on Ollama's main settings and Open WebUI documentation, refer to Ollama developer documentation and Open WebUI documentation.

Deployment Features

  • Installation is possible on Ubuntu 22.04.
  • The combined installation time for the OS and server falls between 15 to 30 minutes.
  • Ollama Server downloads and launches LLM in memory, streamlining deployment processes.
  • Open WebUI operates as a web application that connects with the Ollama Server.
  • Users engage with LLaMA 3 via the Open WebUI's web interface by sending queries to receive responses.
  • All computations and data processing are executed locally on the server, ensuring privacy and control over information flow. System administrators have the flexibility to customize the LLM for bespoke tasks through the functionalities provided within OpenWebUI.
  • The system requirements stipulate a minimum of 16 GB RAM to ensure optimal performance.

Upon completion of the installation process, users are required to access their server by navigating through the URL: https://<Server_IP_from_Invapi>:3000.

Getting Started with Your Deployed AI Chatbot

Upon the completion of your order and payment process, a notification will be sent to the email address provided during registration, confirming that the server is ready for operation. This communication includes the VPS IP address and login credentials necessary for connection purposes. Our company's equipment management team utilizes our control panels for servers and APIs — specifically, Invapi.

Once you click the webpanel tag link, a login window will appear where you should enter the credentials found in either the Info >> Tags section of your server control panel or within the emailed message upon server handover.

The access details for logging into Ollama's Open WebUI web interface are as follows:

  • Login URL for accessing the management panel with Open WebUI and a web interface: Via the webpanel tag. Specific address in the format https://<Server_IP_from_Invapi>:3000 as indicated in the confirmation email upon handover.

Following this link, you'll need to create an identifier (username) and password within Open WebUI for user authentication purposes.


Upon the registration of the first user, the system automatically assigns them an administrator role. To ensure security and control over the registration process, all subsequent registration requests must be approved by an administrator using their account credentials.

OpenWebUI Initial Screen

The initial screen presents a chat interface along with several example input prompts (queries) to demonstrate the system's capabilities. To initiate interaction with the chatbot, users must select their preferred language model from available options. In this case, LLaMA 3 is recommended, which boasts extensive knowledge and capabilities for generating responses to various queries.

After selecting a model, users can enter their first query in the input field, and the system will generate a response based on the analysis of the entered text. The example prompts presented on the initial screen showcase the diversity of topics and tasks that the chatbot can handle, helping users orient themselves with its capabilities.

Configuring Your OpenWebUI Workspace

To further customize your chat experience, navigate to the Workspace >> Modefiles > Create a modelfile section. Here, you'll find several options for customization:

  • Modelfiles - in this tab, you can select alternative language models or upload custom models for use in your chat.
  • Prompts - here, you can create, edit, and manage your own prompts (input queries) to optimize your interaction with the chatbot.
  • Documents - this option allows you to upload documents of various formats (PDF, text files, etc.) for subsequent analysis and discussion with the chatbot.
  • Playground - this area is dedicated to experimentation, where you can test various settings and features in a safe environment before applying them to your main chat.

Adding and Removing Models

Ollama offers the ability to install and use a wide range of language models, not just the default one. Before installing new models, ensure that your server configuration meets the requirements of the chosen model regarding memory usage and computational resources.

Installing a Model via the OpenWebUI Interface

To install models through the OpenWebUI interface, follow these steps:

  1. Define the name of the model to be installed, as described in this article. The name will be located in the ollama run <model_name> command.

  2. Click on the model name in the top-left corner of the OpenWebUI chatbot window and insert the name into the Search a model field.

  3. Select the option Pull from

  4. After a successful download and installation, the model will appear in the dropdown list and become available for selection.

Installing a Model via Command Line

To install new models, you need to connect to your server using SSH and execute the corresponding command as described in this article.

Removing a Model

To remove models from the OpenWebUI interface, navigate to the settings by clicking on the Settings icon in the top-right corner of the window and selecting Model. In the dropdown list, select the model you want to delete and click the icon next to it.

To remove a model using the command line (as root):

ollama rm <model_name>

Adding Documents

The Documents option allows you to upload documents in various formats, such as PDFs, text files, Word documents, PowerPoint presentations, and others, for subsequent analysis and discussion with the chatbot. This is particularly useful for studying complex documents, preparing for presentations or meetings, analyzing data and reports, checking written works for grammar errors, style, and logic, working with legal and financial documents, as well as research activities in various fields. The chatbot can help you understand the document's content, formulate a summary, highlight key points, answer questions, provide additional information, and offer recommendations.

To manage settings for working with documents, navigate to Workspace > Documents > Documents settings. This sub-menu contains 4 sections:

  • General: In this section, you can specify general settings for working with documents. For example, you can set the directory where document files are stored for scanning and processing. The default is /data/docs. The Scan button is intended for scanning and processing documents from the specified directory. You can also perform search settings and adjust models for working with documents here.

  • Chunk Params: This section allows you to set parameters for chunking (breaking down) uploaded documents. Chunking is the process of dividing large documents into smaller parts for easier processing. Here, you can set the maximum chunk size in characters or words. Additionally, this section includes the option for PDF Extract Images (OCR), a technology that recognizes text on images. When enabled, the system will extract images from PDF files and apply OCR to recognize any text contained within these images.

  • Query Params: In this section, you can set parameters influencing queries to uploaded documents and how the chatbot responds. The "Top K" setting determines the number of best search results displayed. For example, if you set Top K = 5, the response will show 5 most relevant documents or text fragments. The "RAG Template" setting is for RAG (Retrieval Augmented Generation), a method that first extracts relevant parts of text from a document collection and then uses them to generate an answer with a language model. RAG Template sets the template for forming a query to the language model when using this method, allowing you to adapt the query format to the language model for obtaining better responses in specific usage scenarios.

  • Web Params: This section is dedicated to setting parameters related to web search and extracting information from internet resources. Here, you can adjust the SSL certificate verification check when loading web pages through HTTPS. By default, it is disabled (Off) for security reasons.

To add a new document, click the + button and select a file from your local device, then click the Save button. The uploaded document will appear in the general list:

After uploading documents, you can work with them in a chat mode. To do this, start your message in the chat line with # and select the desired document from the dropdown list. Then, an answer to your query will be formed based on the data from the selected document. This feature enables you to receive context-based answers grounded in uploaded information, which can be useful for various tasks such as searching for information, analyzing data, and making decisions based on documents:


You can use the symbol # to add web sites or YouTube videos to your query, allowing LLM to search for them as well.

Creating a Specialized Chatbot (Agent) with Integration to a Knowledge Base Based on User Documentation

Adding a New Model to OpenWebUI for Vector Database Work and RAG Creation

To work with a vector database of documents and Ollama LLM, which have better support of the languages other than English, go to Admin Panel >> Settings >> Documents and set the value of Embedding Model to sentence-transformers/all-MiniLM-L12-v2:

  1. Click the download icon next to the field and install this model.
  2. Set parameters for our RAG:
  3. Top K = 10. The system will consider the 10 most relevant documents.
  4. Chunk Size = 1024. Documents will be broken down into fragments of 1024 tokens for processing.
  5. Chunk Overlap = 100. There will be an overlap of 100 tokens between consecutive fragments.

You can try another model from this list


Any changes to the Embedding model will require you to reload documents into RAG.

Now you can go to the Workspace >> Documents section and upload our documentation. It is recommended to assign it a specific collection tag (for example, hostkey_en) to make it easier to connect to the model or during API requests.

Creating a Specialized Chatbot (Agent)

  1. Go to Workspace >> Models and click the plus icon.
  2. Set the chatbot's name and choose a base model (for example, lamma3).
  3. Specify the System Prompt, which defines the chatbot's behavior:

    You are HOSTKEY an IT Support Assistant Bot, focused on providing users with IT support based on the content from knowledge base. Stay in character and maintain your focus on IT support, avoiding unrelated activities such as creative writing or engaging in non-IT discussions. 
    If you cannot find relevant information in the knowledge base or if the user asks non-related questions that are not part of the knowledge base, do not attempt to answer and inform the user that you are unable to assist and print text "Visit for more information" at the end. 
    Provide short step-by-step instructions and external links
    Provide a link to relevant doc page about user question started with 'See more information here:'  
    Add text "Visit for more information" at the end. 
    Example of answer:
    User: How can I cancel my server?
    You can cancel your server at any time. To do this, you need to access the Invapi control panel and follow these steps:
    - Go to the "Billing" tab in the specific server management menu.
    - Click the [Cancel service] button.
    - Describe the reason for the cancellation and select its type.
    - Click the [Confirm termination] button.
    Please note that for immediate cancellation, we will not refund the hours of actual server usage, including the time to provision the server itself, order software, and process the cancellation request (up to 12 hours). The unused balance will be returned to your credit balance. Withdrawal of funds from the credit balance will be made in accordance with our refund policy.
    You can cancel the service cancellation request in the Billing tab using the [Revoke] button.
    Additionally, if you need to cancel a service that includes an OS/software license or has a customized/non-standard configuration, please contact us via a ticketing system for manual refund processing.
    See more information here:
  4. Connect the necessary document collection by clicking the Select Documents button in the Knowledge section and choosing the one you need by tag.

  5. Configure additional parameters in the Advanced Params section:

  6. Temperature = 0.3
  7. Context Length = 4089

  8. Click the Save & Update~ button to create a custom customer support chatbot model.

Tips for Working with RAG in OpenWebUI

  • Any manipulations with the Embedding Model will require deleting and uploading documents to the vector database again. Changing RAG parameters does not require this.
  • When adding or removing documents, be sure to update the custom model (if one exists) and the document collection. Otherwise, searching for them may not work correctly.
  • OpenWebUI recognizes pdf, csv, rst, xml, md, epub, doc, docx, xls, xlsx, ppt, pptx, txt formats, but it is recommended to upload documents in plain text.
  • Using hybrid search improves results but consumes many resources, and the response time may take 20–40 seconds even on a powerful GPU.

Ordering a Server with Ollama via API

To install this software using the API, follow these instructions.