
Software Questions


My kernel and drivers are not updating/installing on Ubuntu

A new kernel or drivers (kernel modules) may fail to install because the /boot partition fills up during kernel updates, which prevents new initial RAM disks (initrd) from being built. To check for this, run the command:

sudo apt --fix-broken install

If the output contains errors, check how full the /boot partition is by looking at the output of the command df -h /boot:

/dev/sda2       739M  287M  398M  42% /boot   

For initrd rebuilds to succeed, the available space (the column before the usage percentage) should be more than 200M; in the example above, 398M is free. If there is not enough free space, perform the following steps:

  1. Create a backup of the partition so you can quickly restore files if you accidentally delete necessary ones:

    sudo rsync -av /boot/ /boot.old/
    
  2. Look at the contents of the /boot partition and find all initrd images:

    ls /boot | grep 'initrd.img-'
    

    You should get output similar to this:

    initrd.img
    initrd.img-6.8.0-57-generic
    initrd.img-6.8.0-58-generic
    initrd.img-6.8.0-59-generic
    initrd.img-6.8.0-60-generic
    initrd.img-initrd.img
    initrd.img-initrd.img.old
    initrd.img.old  
    
  3. Delete the extra initrd images, LEAVING THE LAST TWO. In our case, that means deleting initrd.img-6.8.0-57-generic and initrd.img-6.8.0-58-generic (a scripted sketch of this cleanup is shown after the list).

    Attention

    The following commands can render your operating system unbootable, so pay close attention to the versions of the files you delete. The /boot partition must keep the files for the latest and the previous kernel versions! You can check which kernel you are currently running with the command uname -a. If something goes wrong, you can restore the contents of the /boot partition from the backup made in step 1 with the command sudo rsync -av /boot.old/ /boot/.

    Do this with the command:

    sudo rm -f /boot/initrd.img-6.8.0-57-generic
    

    Repeat it for each file.

    Do the same with the corresponding vmlinuz and System.map files (optional):

    sudo rm -f /boot/vmlinuz-6.8.0-57-generic
    sudo rm -f /boot/System.map-6.8.0-57-generic
    
  4. Remove packages related to old kernels, then re-run post-installation and the building of drivers and kernel modules:

    sudo apt autoremove
    sudo apt --fix-broken install
    
  5. Reboot the OS:

    sudo reboot
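
A scripted version of the cleanup in steps 2–3 is sketched below. It is only a sketch: it assumes the standard Ubuntu naming scheme initrd.img-<version>-generic and merely prints removal candidates by default; compare them with uname -r before uncommenting the deletion line.

# Sketch only: list versioned initrd images, keep the two newest,
# and print the older ones as removal candidates.
echo "Running kernel: $(uname -r)"
ls /boot | grep '^initrd.img-[0-9]' | sort -V | head -n -2 | while read -r img; do
    echo "Candidate for removal: /boot/$img"
    # Uncomment only after checking the output above:
    # sudo rm -f "/boot/$img"
done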
    

I get an error with Docker Compose

If you receive an error like docker: 'compose' is not a docker command or docker-compose: command not found when running docker compose, your operating system release is probably old enough that Docker Compose was not installed as a CLI plugin or is not on your PATH. To solve this problem, follow these steps:

  1. Install Docker Compose (if not installed):

    mkdir -p ~/.docker/cli-plugins/
    curl -SL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64 -o ~/.docker/cli-plugins/docker-compose
    chmod +x ~/.docker/cli-plugins/docker-compose
    

    Replace latest with a specific version from the official repository if necessary (a pinned-version sketch is shown after this list).

  2. Check the installation:

    docker compose version
    

    If the command executes successfully, Docker Compose is installed.

  3. If the standalone docker-compose command is still not found, make sure that ~/.docker/cli-plugins/ is included in the PATH environment variable so the binary can be called directly as docker-compose. Add this to ~/.bashrc or ~/.zshrc:

    export PATH=$PATH:~/.docker/cli-plugins/
    

    Then execute:

    source ~/.bashrc  # or source ~/.zshrc
    
  4. Check the installation again:

    docker-compose --version
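
If you prefer to pin an exact Compose release rather than downloading latest, the GitHub download URL also accepts a version tag. A minimal sketch (the version number below is only an example; take the one you actually need from the releases page):

# Example only: install a specific Compose release as a CLI plugin.
COMPOSE_VERSION="v2.27.0"   # illustrative version, check the releases page
mkdir -p ~/.docker/cli-plugins/
curl -SL "https://github.com/docker/compose/releases/download/${COMPOSE_VERSION}/docker-compose-linux-x86_64" \
  -o ~/.docker/cli-plugins/docker-compose
chmod +x ~/.docker/cli-plugins/docker-compose
docker compose version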
    

Multilingual Neural Models like DeepSeek R1 Respond in Chinese Instead of English

Most multilingual models, such as DeepSeek, may occasionally switch to their primary training language (Chinese, for example) even if the request was made in English. This happens because of model distillation and compression, or because the training data consists mostly of responses in one dominant language.

To minimize this behavior, explicitly specify the response language: add "Respond only in English" at the end of the prompt and include the same line in the system prompt. It also helps to use models such as Qwen3 or Gemma3, whose smaller-parameter versions behave more consistently in this respect than DeepSeek.

Additionally, you can verify that responses actually come back in English, either manually in tools like OpenWebUI or on your chat backend if you are working through an API.
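
As an illustration, here is one way to pass such a system prompt through the Ollama API (a sketch that assumes Ollama is listening on its default port 11434 and that a DeepSeek model such as deepseek-r1 has been pulled):

# Sketch: force English responses via a system message in Ollama's chat API.
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "stream": false,
  "messages": [
    {"role": "system", "content": "Respond only in English."},
    {"role": "user", "content": "Explain what an initrd is. Respond only in English."}
  ]
}'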

Neural Model in OpenWebUI or Ollama Takes a Long Time to Respond

If the model takes a long time to respond, it may be due to its size and the capacity of your server.

First, make sure the model fits entirely into the GPU's video memory. For example, the llama4:16x17b model is 67 GB in compressed form (q4) and requires 80–90 GB of video memory when fully unpacked. If your GPU is an NVIDIA A5000 or RTX 4090 with 24 GB of video memory, Ollama will offload part of the model's layers to the server's CPU, which overloads the virtual machine, leaves fewer cores for other work, and causes long response delays.

Working with such a model requires more powerful GPUs, such as an NVIDIA H100 with 80 GB of video memory or a combination of four RTX 4090s. RAM matters mainly for RAG tasks (working with knowledge bases and uploaded files) and should typically be at least 32 GB.

You can estimate how much video memory a model needs by multiplying its size by 2 if the model is compressed to q4, or by 1.5 if it is compressed to q8. For every additional 1000 tokens of context window beyond 8000, add about 1 GB of video memory.
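
As a rough illustration of this rule of thumb, here is a small helper function (the name and the formula are only the heuristic described above, not an exact measurement):

# Rough VRAM estimate based on the rule of thumb above (illustrative only).
# Usage: estimate_vram <model_size_gb> <q4|q8> <context_tokens>
estimate_vram() {
    local size_gb=$1 quant=$2 ctx=$3 factor extra
    if [ "$quant" = "q4" ]; then factor=2; else factor=1.5; fi
    extra=$(( ctx > 8000 ? (ctx - 8000) / 1000 : 0 ))
    awk -v s="$size_gb" -v f="$factor" -v e="$extra" 'BEGIN { printf "~%.1f GB of video memory\n", s * f + e }'
}
estimate_vram 9 q4 16000   # a 9 GB q4 model with a 16k context -> ~26.0 GB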

To check your GPU's load, log into the server via SSH and run ollama ps in the command line:

[ root ]$ ollama ps 
NAME                                  ID              SIZE      PROCESSOR    UNTIL
yxchia/multilingual-e5-base:latest    f5248cae7e12    1.1 GB    100% GPU     14 minutes from now
qwen3:14b                             bdbd181c33f2    14 GB     100% GPU     14 minutes from now

The output shows how much memory your model occupies and whether it fits entirely on the GPU; a partially offloaded model shows a split such as 40%/60% CPU/GPU in the PROCESSOR column.
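
If the NVIDIA driver utilities are installed on the server, you can also cross-check the actual memory usage reported by the card:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv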

Note

For GPUs with 24 GB of video memory, models larger than 14B or compressed beyond q8 are not recommended. The larger the model's parameter count and context window, the longer each response will take.

Information

Computational performance for 14B models on an NVIDIA A5000:

  • Cold start takes about 30–40 seconds before a response.
  • Response time is 10–15 seconds (without reasoning).
  • Response time is 20–30 seconds (with reasoning).

If RAG (Retrieval-Augmented Generation) or MCP is used, the response time increases by 5–10 seconds (for database search and tool requests).

Token generation speed is ~40–45 tokens per second. You can verify this by clicking on the icon at the bottom of the chat response line in OpenWebUI and checking the response_token/s parameter.
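
The same figure can also be checked from the command line: when a model is run with the --verbose flag, Ollama prints timing statistics after the response, including the generation rate.

ollama run qwen3:14b --verbose
# After the answer, look at the "eval rate" line, e.g. "eval rate: 42.31 tokens/s"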


