# Software Questions
## My kernel and drivers are not updating/installing on Ubuntu
A new kernel or drivers (kernel modules) may fail to install because the `/boot` partition fills up during kernel updates, which prevents new initial RAM disks (initrd) from being built. To check this, run the command below.
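The command itself is not preserved in this copy of the article; a reasonable check on Ubuntu, assuming you want to reproduce the initrd build error, is to trigger an initramfs rebuild and watch its output:

```bash
# Rebuild the initramfs for the current kernel; a full /boot typically
# surfaces here as "No space left on device" errors
sudo update-initramfs -u
```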
If you see errors in the output, check how full the `/boot` partition is with `df -h /boot`:
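A representative run (the device name and sizes are illustrative):

```bash
df -h /boot
# Filesystem      Size  Used Avail Use% Mounted on
# /dev/sda2       974M  829M   78M  92% /boot
```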
For initrd rebuilds to succeed, the `Avail` value for the `/boot` partition should be more than 200M. If there is no free space, perform the following steps (the commands for all five steps are consolidated in the sketch after this list):
1. Create a backup of the partition so you can quickly restore files if you accidentally delete necessary ones.
2. List the contents of the `/boot` partition and find all initrd images; you should get output similar to the listing in the sketch after this list.
3. Delete the extra initrd images, LEAVING THE LAST TWO. In our case we need to delete `initrd.img-6.8.0-57-generic` and `initrd.img-6.8.0-58-generic`.

    !!! warning "Attention"
        The following commands may render your operating system inoperable, so pay close attention to the versions of the files you delete. Files for the latest and second-latest kernel versions must remain in the `/boot` partition! You can check which kernel you are currently running with the command `uname -a`. If something goes wrong, you can restore the contents of `/boot` from the backup made in step 1 with the command `sudo rsync -av /boot.old/ /boot/`.

    Delete the images as shown in the sketch after this list, repeating the command for each file. Do the same with the corresponding `vmlinuz` and `System.map` files (optional).
4. Clean the system of packages related to old kernels, then re-run post-installation and the building of drivers and kernel modules.
5. Reboot the OS.
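The original command blocks for these steps were lost in this copy of the article. Below is a consolidated sketch of steps 1–5, assuming the kernel versions from the example above and standard Ubuntu tooling; adjust the version numbers to match your own `/boot` listing:

```bash
# Step 1: back up /boot (the target path is inferred from the restore
# command quoted in the warning above)
sudo rsync -av /boot/ /boot.old/

# Step 2: list all initrd images; the output below is illustrative
ls -1 /boot/initrd.img-*
# /boot/initrd.img-6.8.0-57-generic
# /boot/initrd.img-6.8.0-58-generic
# /boot/initrd.img-6.8.0-59-generic
# /boot/initrd.img-6.8.0-60-generic

# Step 3: delete the old images, keeping the last two;
# repeat for each extra file
sudo rm /boot/initrd.img-6.8.0-57-generic
sudo rm /boot/initrd.img-6.8.0-58-generic
# optionally, the matching vmlinuz and System.map files
sudo rm /boot/vmlinuz-6.8.0-57-generic /boot/System.map-6.8.0-57-generic
sudo rm /boot/vmlinuz-6.8.0-58-generic /boot/System.map-6.8.0-58-generic

# Step 4: purge old kernel packages and rebuild initrd and modules
sudo apt autoremove --purge
sudo update-initramfs -u -k all
sudo dpkg --configure -a

# Step 5: reboot
sudo reboot
```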
## I get an error with Docker Compose
If you receive an error like `docker: 'compose' is not a docker command` or `docker-compose: command not found` when running Docker Compose, your operating system image may be old enough that Docker Compose was not installed as a CLI plugin or added to `PATH`. To solve this problem, follow these steps:
1. Install Docker Compose (if it is not installed):

    ```bash
    mkdir -p ~/.docker/cli-plugins/
    curl -SL https://github.com/docker/compose/releases/latest/download/docker-compose-linux-x86_64 -o ~/.docker/cli-plugins/docker-compose
    chmod +x ~/.docker/cli-plugins/docker-compose
    ```
    Replace `latest` with a specific version from the official repository if necessary.

2. Check the installation, for example with `docker compose version` (see the sketch after this list). If the command executes successfully, Docker Compose is installed.
3. If the command is still not found, make sure that `~/.docker/cli-plugins/` is added to the `PATH` environment variable: add the export line to `~/.bashrc` or `~/.zshrc` and re-read the file, as shown in the sketch after this list.
4. Check the installation again.
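The exact commands for steps 2–4 are not preserved in this copy. A minimal sketch, assuming a Bash shell (use `~/.zshrc` instead of `~/.bashrc` for Zsh):

```bash
# Steps 2 and 4: verify that the Docker CLI picks up the plugin
docker compose version

# Step 3: add the plugin directory to PATH; append the export line
# to your shell profile, then re-read it
echo 'export PATH="$HOME/.docker/cli-plugins:$PATH"' >> ~/.bashrc
source ~/.bashrc
```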
## Multilingual Neural Models like DeepSeek R1 Respond in Chinese Instead of English
Most multilingual models, such as DeepSeek, may occasionally switch to their primary training language (Chinese, for example) even if the request was made in English. This happens because of model distillation and compression, or because the training data consists mostly of responses in one dominant language.

To minimize this behavior, explicitly specify the response language by adding "Respond only in English" at the end of the prompt and including the same line in the system prompt. It is also advisable to use models such as Qwen3 or Gemma3, which are more stable than DeepSeek in variants with fewer parameters.

Additionally, you can check that responses are in English using tools like OpenWebUI, or on your chat's backend if you are working through an API.
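For illustration, here is one way to pin the response language per request through Ollama's `/api/chat` endpoint (the model name is an example; any chat-capable model works the same way):

```bash
curl -s http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:14b",
  "messages": [
    {"role": "system", "content": "Respond only in English."},
    {"role": "user",   "content": "Summarize the main risks of the project."}
  ],
  "stream": false
}'
```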
## Neural Model in OpenWebUI or Ollama Takes a Long Time to Respond
If the model takes a long time to respond, the cause is usually its size relative to the capacity of your server.

First, make sure the model fits entirely into the GPU's video memory. For example, the model `llama4:16x17b` is 67 GB compressed (q4) and requires 80–90 GB of video memory when fully unpacked. If your GPU is an NVIDIA A5000 or RTX 4090 with 24 GB of video memory, Ollama will offload part of the model's layers to the server's CPU, overloading the virtual machine, tying up its cores, and causing long response delays.

Working with such a model requires more powerful GPUs, such as an NVIDIA H100 with 80 GB of video memory or a combination of four RTX 4090s. RAM matters mainly for RAG tasks (working with knowledge bases and uploaded files) and typically should be at least 32 GB.
You can estimate a model's footprint in video memory by multiplying its compressed size by 2 for q4 models or by 1.5 for q8. For every additional 1,000 tokens of context window beyond 8,000, add 1 GB of required video memory.
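As a quick back-of-the-envelope check, this rule of thumb can be scripted; the numbers below are placeholders illustrating the article's heuristics, not measured values:

```bash
# Estimate VRAM: compressed size * factor + 1 GB per 1000 context tokens over 8000
size_gb=14     # compressed model size, e.g. as reported by `ollama list`
factor=2       # 2 for q4 models, 1.5 for q8
ctx=16000      # planned context window, in tokens
awk -v s="$size_gb" -v f="$factor" -v c="$ctx" 'BEGIN {
  extra = (c > 8000) ? (c - 8000) / 1000 : 0
  printf "Estimated VRAM: ~%.1f GB\n", s * f + extra
}'
```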
To check your GPU's load, log into the server via SSH and run `ollama ps`:
```
[ root ]$ ollama ps
NAME                                  ID              SIZE    PROCESSOR    UNTIL
yxchia/multilingual-e5-base:latest    f5248cae7e12    1.1 GB  100% GPU     14 minutes from now
qwen3:14b                             bdbd181c33f2    14 GB   100% GPU     14 minutes from now
```
The output will show how much space your model occupies and whether it fits entirely in the GPU.
!!! "Note" For GPUs with 24 GB of video memory, models larger than 14B or compressed beyond q8 are not recommended. The larger the model's parameter count (volume) and context window size, the longer the response process will be.
!!! info "Information"
    Computational performance for 14B models on an NVIDIA A5000:

    - Cold start takes about 30–40 seconds before a response.
    - Response time is 10–15 seconds (without reasoning).
    - Response time is 20–30 seconds (with reasoning).

    If RAG (Retrieval-Augmented Generation) or MCP is used, response time increases by 5–10 seconds (for the database search and tool requests).

    Token generation speed is ~40–45 tokens per second. You can verify this by clicking the icon at the bottom of the chat response line in OpenWebUI and checking the `response_token/s` parameter.
Some of the content on this page was created or translated using AI.