
12.01.2026

NVIDIA RTX PRO 2000 Blackwell: What’s this “junior” GPU in the new family of professional graphics cards capable of?


By Alexander Kazantsev, Head of Documentation and Content at HOSTKEY

NVIDIA itself positions the RTX PRO 2000 Blackwell as a compact and energy-efficient solution for professional workstations that accelerates graphics and AI tasks. It promises maximum AI performance with FP4 support, DLSS 4, and optimizations for RTX Neural Shaders and Mega Geometry technologies. Additionally, it supports complex multi-application workflows such as generative AI, 8K video processing, real-time rendering, and CAD with ray tracing capabilities. In other words, this card is designed for use in content creation, 3D design, video editing, and CAD applications.

At HOSTKEY, we see the RTX PRO 2000 Blackwell as a replacement for the GTX 1080 Ti in servers where that card is no longer supported, as well as an alternative to the more expensive 16 GB A4000. But what exactly do you get for the price of €850?

Rent a server with RTX PRO 2000 Blackwell
Experience affordable GPU server performance for your workloads

Some Odd Specifications

Let’s start with the specifications. Since this card belongs to the Blackwell family, it is natural to compare it with the flagship of the line, the RTX PRO 6000 Blackwell.

Feature | NVIDIA RTX PRO 6000 Blackwell | NVIDIA RTX PRO 2000 Blackwell
--- | --- | ---
Architecture | Blackwell | Blackwell
CUDA Cores | 24,064 | 4,352
Tensor Cores | 752 (5th generation) | 136 (5th generation)
RT Cores | 188 (4th generation) | 34 (4th generation)
Memory | 96 GB GDDR7 ECC | 16 GB GDDR7 ECC
Bandwidth | 1,597–1,792 GB/s | 288 GB/s
TDP (Thermal Design Power) | 600 W | 70 W
Interface | PCIe 5.0 x16 | PCIe 5.0 x8
AI Performance (TOPS) | Up to 4,000 | 545
Form Factor | FHFL (Full Height Full Length), dual-slot, passive cooling | Compact SFF (Small Form Factor)

As you can see, this is not a mere two- or threefold cut: the junior card retains only about one-fifth of the flagship’s compute resources, and even less of its memory capacity and bandwidth.

So, how does this card compare to the consumer segment? Given its specifications, one might assume it would be roughly on par with a 5060 Ti with 16GB of memory. However, the results were somewhat disheartening:

RTX PRO 2000 Blackwell, RTX 5060 Ti 16GB, and RTX 5060 – all these are GPUs based on the entry-level Blackwell architecture. They all utilize PCIe 5.0 x8 and GDDR7 memory.

Specification Comparison

Specification | RTX PRO 2000 Blackwell | RTX 5060 Ti 16GB | RTX 5060 8GB
--- | --- | --- | ---
CUDA Cores | 4,352 | 4,608 | 3,840
Tensor Cores | 136 (5th generation) | 144 (5th generation) | 120 (5th generation)
RT Cores | 34 (4th generation) | 36 (4th generation) | 30 (4th generation)
Memory | 16 GB GDDR7 ECC | 16 GB GDDR7 | 8 GB GDDR7
Bandwidth | 288 GB/s | 448 GB/s | 448 GB/s
TDP | 70 W | 180 W | 145 W
AI TOPS | 545 | 759 | ~500
FP32 Performance | 16.97 TFLOPS | 23.7 TFLOPS | 19.2 TFLOPS
Form Factor | Compact SFF | 2.5-slot | 2–2.5-slot

Spec-wise, our RTX PRO 2000 sits between the RTX 5060 Ti and the RTX 5060: it edges out the 5060 in AI throughput (if only by a small margin) but trails it in FP32 performance. The card’s tight power limit is the likely culprit: it draws less than half the power of the 5060, its memory bandwidth is noticeably lower, and, like the 5060, it is limited to a PCIe x8 link (both use GDDR7 memory, with the PRO 2000 adding ECC).

Finally, let’s compare it to the A4000. Starting with price: the A4000 can currently be found for around €1300. That means the RTX PRO 2000 Blackwell is 35% cheaper than the A4000. But what about performance numbers?

Feature | NVIDIA RTX A4000 | NVIDIA RTX PRO 2000 Blackwell
--- | --- | ---
Architecture | Ampere | Blackwell
CUDA Cores | 6,144 | 4,352
Tensor Cores | 192 (3rd generation) | 136 (5th generation)
RT Cores | 48 (2nd generation) | 34 (4th generation)
Memory | 16 GB GDDR6 ECC | 16 GB GDDR7 ECC
Bandwidth | 448 GB/s | 288 GB/s
TDP (Thermal Design Power) | 140 W | 70 W
Interface | PCIe 4.0 x16 | PCIe 5.0 x8
AI TOPS (Tensor Operations Per Second) | Not specified | 545
Form Factor | Single-slot | Compact, SFF (Small Form Factor)

As can be seen, the A4000 offers higher memory bandwidth and more cores overall (albeit of the previous-generation Ampere architecture), yet it draws twice as much power as the RTX PRO 2000 Blackwell. It also gets 16 PCIe lanes for data transfer, though they are Gen4 rather than Gen5.

Testing is necessary

We will test the card in the very workloads NVIDIA recommends it for: large language model (LLM) inference, image and video generation, and Blender rendering.

For testing LLM inference, we will use our script based on Ollama.
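The full benchmark script isn’t reproduced here, but the core measurement can be sketched as follows. This is a minimal sketch assuming Ollama is already installed and serving on its default port 11434 and that jq is available; the model tag and prompt are placeholders:

#!/usr/bin/env bash
# Minimal sketch of a single Ollama throughput measurement (not our full benchmark).
set -euo pipefail

MODEL="deepseek-r1:14b"                                    # model tag to test
PROMPT="Explain PCIe lane bifurcation in two paragraphs."  # placeholder prompt

# One non-streaming generation; the response JSON carries Ollama's timing counters.
RESP=$(curl -s http://127.0.0.1:11434/api/generate \
  -d "{\"model\": \"${MODEL}\", \"prompt\": \"${PROMPT}\", \"stream\": false}")

# eval_count = generated tokens, eval_duration = generation time (ns),
# load_duration = time spent loading the model into VRAM (ns).
echo "$RESP" | jq -r '
  "tokens/sec:          \(.eval_count / (.eval_duration / 1e9))",
  "load time (s):       \(.load_duration / 1e9)",
  "generation time (s): \(.eval_duration / 1e9)"'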

Our server configuration is as follows: an AMD Ryzen 9 5900X (3.7 GHz, 12 cores), 64 GB of RAM, a 1 TB NVMe SSD, and a suitably sized PSU, paired with the RTX PRO 2000.

Although our test unit has a compact, short PCB, the reference design still makes it a dual-slot card roughly 6.6 inches long with blower (“turbine”) cooling.

The odd-looking black shroud shown in the photo, which actually makes our small card bigger, lets the blower draw air from inside the chassis and exhaust it outward, so hot air does not build up around the GPU and memory components.

A similar configuration is used for the A4000 machine, except that its processor is a Ryzen 9 5950X (3.4 GHz, 16 cores). The CPU takes no part in these tests and does not affect the comparison results.

We install Ubuntu 22.04 on both machines and use our script to install the necessary drivers and CUDA software. For the RTX PRO 2000 Blackwell, we end up with the following specifications:

As can be seen, at idle the card barely warms above room temperature. We will run the tests on several models: the older DeepSeek-R1:14B, the more recent gpt-oss:20B with its MoE design, and the brand-new multimodal Ministral 3 in the 14B and the faster 8B variants, including a check of its image-recognition capability. All models use Q4 quantization.
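To keep load times comparable, each model is pulled to local storage once before the timed runs. The tags below are simply the ones used in this article; availability and exact naming in the Ollama model library may differ, so treat them as assumptions:

# Pull the test models once so that load times are measured from local disk.
ollama pull deepseek-r1:14b
ollama pull gpt-oss:20b
ollama pull ministral-3:8b     # tag as used in this article; verify it exists in your registry
ollama pull ministral-3:14b    # tag as used in this article; verify it exists in your registry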

We will compile the final results into a table, including numbers for DeepSeek compared to those achieved on the RTX 6000 PRO Blackwell for comparison purposes.

GPU | Model | Tokens/sec (average) | Max Context Size | Load Time (sec, average) | Generation Time (sec, average) | Notes
--- | --- | --- | --- | --- | --- | ---
NVIDIA RTX 6000 PRO Blackwell (gen5) | deepseek-r1:14b | 114.02 | 128,000 | 1.74 | 22.71 | 
NVIDIA RTX A4000 (gen4) | deepseek-r1:14b | 35.81 | 24,000 | 11.72 | 74.37 | 
NVIDIA RTX PRO 2000 Blackwell | deepseek-r1:14b | 27.79 | 24,000 | 3.68 | 91.91 | 
NVIDIA RTX A4000 (gen4) | ministral-3:8b | 65.42 | 64,000 | 12.92 | 44.98 | Visual mode
NVIDIA RTX PRO 2000 Blackwell | ministral-3:8b | 48.21 | 68,000 | 3.17 | 63.97 | Visual mode
NVIDIA RTX A4000 (gen4) | ministral-3:14b | 42.28 | 36,000 | 13.99 | 86.12 | Visual mode
NVIDIA RTX PRO 2000 Blackwell | ministral-3:14b | 30.97 | 36,000 | 3.68 | 115.42 | Visual mode
NVIDIA RTX A4000 (gen4) | gpt-oss:20b | 84.06 | 120,000 | 14.89 | 30.85 | “Mixture of Experts” approach
NVIDIA RTX PRO 2000 Blackwell | gpt-oss:20b | 62.54 | 120,000 | 4.23 | 43.26 | “Mixture of Experts” approach

In summary, the RTX PRO 2000 Blackwell is slower at inference than the A4000. Its only advantage is startup time, i.e. the time needed to load a model into memory. The speed gap between the two cards, expressed as roughly how much slower the RTX PRO 2000 Blackwell is than the A4000 in tokens generated per second, is as follows:

  • Deepseek-r1:14b: ~28%
  • Ministral-3:8b: ~27%
  • GPT-oss:20b: ~26%

As can be seen, the average gap is roughly the same for classic dense inference models and for those built on a Mixture of Experts (MoE) architecture. Even though MoE activates only a small fraction of the model’s parameters at a time, which plays to the A4000’s strengths (its higher memory bandwidth of 448 GB/s of GDDR6 on 16 GB of memory and its large L2 cache), the RTX PRO 2000 Blackwell still holds up reasonably well on far less power, thanks to its newer Tensor Cores.

However, under heavy load, the A4000 temperature rises to 83°C and consumes over 130 watts of power (out of a possible 140 watts), while the RTX PRO 2000 Blackwell remains at around 59°C with a power consumption of 65 watts (out of a possible 70 watts).
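The temperature and power figures above come from watching the cards during the benchmark runs; one simple way to log them is a periodic nvidia-smi query (the field list and the 5-second interval are just one possible choice):

# Log GPU temperature, power draw, utilization and VRAM usage every 5 seconds.
nvidia-smi \
  --query-gpu=timestamp,name,temperature.gpu,power.draw,utilization.gpu,memory.used \
  --format=csv -l 5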

When comparing the RTX PRO 2000 Blackwell with its senior sibling, the RTX PRO 6000 Blackwell, in inference tasks, we get the following overall picture:

Image and Video Generation

Here, we will use ComfyUI to test image generation with Z-image Turbo, as well as video generation in the “text-to-video” and “image-to-video” modes using the Kandinsky 5 Lite model. To deploy ComfyUI on a server, you can follow these steps (for testing purposes we set everything up as root, without a virtual environment):

#!/usr/bin/env bash
set -euo pipefail


COMFY_DIR="/root/comfy/ComfyUI"
LISTEN_IP="${LISTEN_IP:-0.0.0.0}"


export DEBIAN_FRONTEND=noninteractive
apt-get update -y
apt-get install -y git python3 python3-pip ca-certificates


# install/update ComfyUI
if [[ -d "${COMFY_DIR}/.git" ]]; then
  git -C "$COMFY_DIR" pull --ff-only
else
  mkdir -p "$(dirname "$COMFY_DIR")"
  git clone https://github.com/comfyanonymous/ComfyUI.git "$COMFY_DIR"
fi


# python deps (system-wide, since venv is not wanted)
cd "$COMFY_DIR"
python3 -m pip install --upgrade pip wheel
python3 -m pip install -r requirements.txt


# run
exec python3 main.py --listen "${LISTEN_IP}"
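Once the script is running, ComfyUI listens on its default port 8188, so the web UI should be reachable at http://<server-ip>:8188. A quick reachability check from another machine might look like this (the address is a placeholder):

# Confirm that the ComfyUI web server answers on its default port.
SERVER_IP="203.0.113.10"             # placeholder: substitute your server's address
curl -I "http://${SERVER_IP}:8188/"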

For quick installation of the Z-image Turbo model on the server (after installing ComfyUI), you can use the following script:

#!/usr/bin/env bash
set -euo pipefail


COMFY_DIR="/root/comfy/ComfyUI"
MODELS_DIR="${COMFY_DIR}/models"


# URL -> relative path inside ComfyUI/models/
declare -A FILES=(
  ["https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/text_encoders/qwen_3_4b.safetensors"]="text_encoders/qwen_3_4b.safetensors"
  ["https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/vae/ae.safetensors"]="vae/ae.safetensors"
  ["https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors"]="diffusion_models/z_image_turbo_bf16.safetensors"
  ["https://huggingface.co/tarn59/pixel_art_style_lora_z_image_turbo/resolve/main/pixel_art_style_z_image_turbo.safetensors"]="loras/pixel_art_style_z_image_turbo.safetensors"
)


# sanity checks
if [[ ! -d "$COMFY_DIR" ]]; then
  echo "ERROR: ComfyUI dir not found: $COMFY_DIR"
  exit 1
fi


mkdir -p \
  "${MODELS_DIR}/text_encoders" \
  "${MODELS_DIR}/vae" \
  "${MODELS_DIR}/diffusion_models" \
  "${MODELS_DIR}/loras"


# downloader (aria2c preferred for resume + parallel chunks)
if command -v aria2c >/dev/null 2>&1; then
  DL="aria2c -c -x 8 -s 8 -k 1M --allow-overwrite=true --file-allocation=none"
else
  apt-get update -y
  apt-get install -y curl ca-certificates
  DL="curl -L --fail --retry 5 --retry-delay 2 -C - -o"
fi


for url in "${!FILES[@]}"; do
  rel="${FILES[$url]}"
  out="${MODELS_DIR}/${rel}"
  tmp="${out}.part"


  echo "==> ${rel}"
  mkdir -p "$(dirname "$out")"


  if command -v aria2c >/dev/null 2>&1; then
    # aria2c writes directly to target
    $DL -d "$(dirname "$out")" -o "$(basename "$out")" "$url"
  else
    $DL "$tmp" "$url"
    mv -f "$tmp" "$out"
  fi
done


echo
echo "Done. Files are in:"
echo "  ${MODELS_DIR}/text_encoders/"
echo "  ${MODELS_DIR}/vae/"
echo "  ${MODELS_DIR}/diffusion_models/"
echo "  ${MODELS_DIR}/loras/"

The prompt for generating an image is as follows:

Photorealistic documentary photo inside a beaver lodge on a quiet lake at night: three beavers acting like IT engineers are assembling a 19-inch server into a small rack. Each beaver wears a bright yellow construction hard hat with a clean, sharp, perfectly readable HOSTKEY logo printed on the front (exact spelling: “HOSTKEY”, all caps), centered, high-contrast, crisp lettering, not distorted. One beaver holds the rack rails, another plugs RJ‑45 patch cables into a network switch, the third checks the front-panel status LEDs. Warm tungsten lamp light, cozy wooden interior, wet realistic fur texture with tiny water droplets, realistic wood grain, subtle steam from damp fur, lake reflections visible through a small window. A neat pile of cable ties, a small screwdriver, and a laptop showing a terminal on a wooden table. Cinematic but realistic lighting, shallow depth of field, 35mm documentary photography, f/2.0, ISO 800, crisp sharp focus on the beavers, helmets, and the server rack, high detail, natural colors, realistic reflections, no cartoon look, no CGI look.

We obtain the following results. This is a cold start (i.e. the very first run) in Z-image Turbo; all parameters are visible in the screenshot.

It takes 26 seconds to generate an image of 1024x1024 pixels with 9 iterations. Subsequent generations or changes in image size will be slightly faster. For example, let’s change the aspect ratio to 1280x720.

Now, let’s generate a video using Kandinsky 5 Lite. They promised us that the power of the RTX PRO 2000 Blackwell would be sufficient even for such a task.

First, let’s try our prompt in the “text-to-video” mode. We deploy the model using the following script:

#!/usr/bin/env bash
set -euo pipefail


COMFY_DIR="/root/comfy/ComfyUI"
MODELS_DIR="${COMFY_DIR}/models"


declare -A FILES=(
  ["https://huggingface.co/Comfy-Org/HunyuanVideo_1.5_repackaged/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors"]="text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors"
  ["https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors?download=true"]="text_encoders/clip_l.safetensors"
  ["https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_vae_bf16.safetensors"]="vae/hunyuan_video_vae_bf16.safetensors"
  ["https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-sft-5s/resolve/main/model/kandinsky5lite_t2v_sft_5s.safetensors"]="diffusion_models/kandinsky5lite_t2v_sft_5s.safetensors"
)


if [[ ! -d "$COMFY_DIR" ]]; then
  echo "ERROR: ComfyUI dir not found: $COMFY_DIR"
  exit 1
fi


mkdir -p \
  "${MODELS_DIR}/text_encoders" \
  "${MODELS_DIR}/vae" \
  "${MODELS_DIR}/diffusion_models"


# Prefer aria2c for resume/large files
if command -v aria2c >/dev/null 2>&1; then
  DL_ARIA2=1
else
  apt-get update -y
  apt-get install -y aria2 ca-certificates
  DL_ARIA2=1
fi


for url in "${!FILES[@]}"; do
  rel="${FILES[$url]}"
  out="${MODELS_DIR}/${rel}"
  dir="$(dirname "$out")"
  name="$(basename "$out")"


  echo "==> ${rel}"
  mkdir -p "$dir"


  # -c resume, -x/-s connections, -k chunk size
  aria2c -c -x 8 -s 8 -k 1M --allow-overwrite=true --file-allocation=none \
    -d "$dir" -o "$name" "$url"
done


echo
echo "Done. Verify files exist:"
echo "  ${MODELS_DIR}/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors"
echo "  ${MODELS_DIR}/text_encoders/clip_l.safetensors"
echo "  ${MODELS_DIR}/vae/hunyuan_video_vae_bf16.safetensors"
echo "  ${MODELS_DIR}/diffusion_models/kandinsky5lite_t2v_sft_5s.safetensors"

We were able to heat up the graphics card to nearly 70 degrees Celsius and utilize all available processing power.

Final result and time taken.

Generating the 768x512 video took a little over 24 minutes (24 min 5 s).

Next, we’ll take the image that we generated earlier and try to bring it to life. We’ll add the missing parts necessary for it to function properly…

#!/usr/bin/env bash
set -euo pipefail


COMFY_DIR="/root/comfy/ComfyUI"
MODELS_DIR="${COMFY_DIR}/models"
URL="https://huggingface.co/kandinskylab/Kandinsky-5.0-I2V-Lite-5s/resolve/main/model/kandinsky5lite_i2v_5s.safetensors"
OUT="${MODELS_DIR}/diffusion_models/kandinsky5lite_i2v_5s.safetensors"


mkdir -p "$(dirname "$OUT")"


# aria2c preferred (resume + faster)
if command -v aria2c >/dev/null 2>&1; then
  aria2c -c -x 8 -s 8 -k 1M --allow-overwrite=true --file-allocation=none \
    -d "$(dirname "$OUT")" -o "$(basename "$OUT")" "$URL"
else
  apt-get update -y
  apt-get install -y curl ca-certificates
  curl -L --fail --retry 5 --retry-delay 2 -C - -o "$OUT" "$URL"
fi


echo "Installed: $OUT"

Then, we bring our image to life by using the following prompt:

Photorealistic image-to-video animation of the same scene: three wet, realistic beavers inside a wooden beaver lodge on a calm lake at night, wearing bright yellow construction hard hats with a clean, readable "HOSTKEY" logo. Keep the composition and identities consistent with the input image: same beavers, same server rack and cables, same laptop on the table, same warm tungsten lamp lighting and lake view through the windows. Subtle natural motion: the beavers blink and breathe, whiskers and wet fur gently move, slight head turns and small paw movements as they plug and adjust Ethernet cables, server LEDs softly flicker, tiny cable sway, faint steam drifting upward. Gentle handheld camera micro-movement with a very slow push-in (dolly-in), shallow depth of field, cinematic realism, high detail, realistic reflections on the helmets, no scene changes.

It’s still the same 24 minutes or so. On one hand, the results aren’t particularly impressive; on the other, the card does handle this mode, just not at high speed. For models that fit within 16 GB of memory, the card is perfectly usable. And considering that the same video workflow would not run at all on my A4000 due to architectural limitations, the card actually comes out looking quite good here.

Rendering in Blender

For testing, we use the Blender Open Data benchmark from https://opendata.blender.org/. We install the latest Blender via Snap (version 5 is already available), download the Linux benchmark launcher from the website, extract it, and run it.
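The sequence looks roughly like the sketch below. The launcher’s subcommand names and flags are quoted from memory and may differ between versions of benchmark-launcher-cli, so treat them as assumptions and check ./benchmark-launcher-cli --help after extracting the archive:

# Blender itself via Snap (version 5.x at the time of testing).
snap install blender --classic

# Extract the Linux CLI benchmark launcher downloaded from https://opendata.blender.org/
tar -xf benchmark-launcher-cli-*.tar.gz

# Fetch the three standard scenes and run them on the GPU (CUDA backend).
# Subcommands/flags below are assumptions: verify with ./benchmark-launcher-cli --help
./benchmark-launcher-cli scenes download --blender-version 4.5.0 monster junkshop classroom
./benchmark-launcher-cli benchmark monster junkshop classroom \
  --blender-version 4.5.0 --device-type CUDA --json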

In the benchmark, the card also reached its full performance in terms of power consumption, but the temperature of the card remained low, as did its memory usage.

In summary, we obtained three results across three different scenarios:

  • Monster: 1782.994259 samples per minute
  • Junkshop: 1010.288134 samples per minute
  • Classroom: 1008.595210 samples per minute

Adding these numbers together gives a combined score of 3801.877603 samples per minute, which can now be compared with other published results.

That total puts us just short of the NVIDIA RTX A4500 and below the RTX 4060 Ti.

Note that we ran our tests on Blender 5, while the published leaderboard numbers refer to Blender 4.5.0, and the results posted there by other users are lower than ours. In other words, our card scored noticeably better than other submissions; the newer Blender build, along with driver and CUDA optimizations, likely played a part in that.

To sum things up

We have a compact, low-power card built on NVIDIA’s latest architecture. It is somewhat slower than its predecessor, the A4000, in inference tasks, but it costs about a third less and consumes half as much power. It can therefore be recommended for lightly loaded neural-network workloads and for models up to roughly 14B/20B parameters (depending on the architecture). It is especially handy when models need to be swapped frequently: on the RTX PRO 2000 Blackwell a model loads in 3–4 seconds.

Personally, I liked this card; it installed fairly smoothly, and the performance losses are more than made up for by its low power consumption, lower temperatures, and compact size.

Rent a server with RTX PRO 2000 Blackwell
Experience affordable GPU server performance for your workloads
