12.01.2026

NVIDIA RTX PRO 2000 Blackwell: What’s this “junior” GPU in the new family of professional graphics cards capable of?

By Alexander Kazantsev, Head of Documentation and Content at HOSTKEY

NVIDIA itself positions the RTX PRO 2000 Blackwell as a compact and energy-efficient solution for professional workstations that accelerates graphics and AI tasks. It promises maximum AI performance with FP4 support, DLSS 4, and optimizations for RTX Neural Shaders and Mega Geometry technologies. Additionally, it supports complex multi-application workflows such as generative AI, 8K video processing, real-time rendering, and CAD with ray tracing capabilities. In other words, this card is designed for use in content creation, 3D design, video editing, and CAD applications.

At HOSTKEY, we consider the RTX PRO 2000 Blackwell as a replacement for servers that no longer support the 1080Ti graphics cards, as well as an alternative to the more expensive 16GB A4000. But what exactly do you get for the price of €850?

Rent a server with RTX PRO 2000 Blackwell

Experience affordable GPU server performance for your workloads

Select

Some Odd Specifications

Let’s start with the specifications. Since this card belongs to the Blackwell series, it can be compared to the RTX 6000 PRO Blackwell in terms of performance and features.

Feature	NVIDIA RTX PRO 6000 Blackwell	NVIDIA RTX PRO 2000 Blackwell
Architecture	Blackwell	Blackwell
CUDA Cores	24,064	4,352
Tensor Cores	752 (5th generation)	136 (5th generation)
RT Cores	188 (4th generation)	34 (4th generation)
Memory	96 GB GDDR7 ECC	16 GB GDDR7 ECC
Bandwidth	1,597–1,792 GB/s	288 GB/s
TDP (Thermal Design Power)	600 W	70 W
Interface	PCIe 5.0 x16	PCIe 5.0 x8
AI Performance (TOPS)	Up to 4,000	545
Form Factor	FHFL (Full Height Full Length), dual-slot, passive cooling	Compact SFF (Small Form Factor)

As you can see, the reduction in performance is not even threefold; in fact, we are dealing with a card that has been significantly downgraded, with its capabilities reduced to only about one-fifth of those of the older server model.

So, how does this card compare to the consumer segment? Given its specifications, one might assume it would be roughly on par with a 5060 Ti with 16GB of memory. However, the results were somewhat disheartening:

RTX PRO 2000 Blackwell, RTX 5060 Ti 16GB, and RTX 5060 – all these are GPUs based on the entry-level Blackwell architecture. They all utilize PCIe 5.0 x8 and GDDR7 memory.

Specification Comparison

Specification	RTX PRO 2000 Blackwell	RTX 5060 Ti 16GB	RTX 5060 8GB
CUDA Cores	4352	4608	3840
Tensor Cores	136 (5th generation)	144 (5th generation)	120 (5th generation)
RT Cores	34 (4th generation)	36 (4th generation)	30 (4th generation)
Memory	16GB GDDR7 ECC	16GB GDDR7	8GB GDDR7
Bandwidth	288GB/s	448GB/s	448GB/s
TDP	70W	180W	145W
AI TOPS	545	759	~500
FP32 Performance	16.97 TFLOPS	23.7 TFLOPS	19.2 TFLOPS
Form Factor	Compact SFF	2.5-slot	2-2.5-slot

In terms of specifications, our RTX PRO 2000 falls between the RTX 5060 Ti and RTX 5060; it outperforms the 5060 in AI tasks (although by a small margin), but lags behind in FP32 performance. However, it’s possible that the card’s power supply is restricting its performance, as its power consumption is less than half that of the 5060, and so is its bandwidth (although both use the same GDDR7 ECC memory, and the PCI-E bus is also limited in the same way as the 5060).

Finally, let’s compare it to the A4000. Starting with price: the A4000 can currently be found for around €1300. That means the RTX PRO 2000 Blackwell is 35% cheaper than the A4000. But what about performance numbers?

Feature	NVIDIA RTX A4000	NVIDIA RTX PRO 2000 Blackwell
Architecture	Ampere	Blackwell
CUDA Cores	6144	4352
Tensor Cores	192 (3rd generation)	136 (5th generation)
RT Cores	48 (2nd generation)	34 (4th generation)
Memory	16 GB GDDR6 ECC	16 GB GDDR7 ECC
Bandwidth	448 GB/s	288 GB/s
TDP (Thermal Design Power)	140 W	70 W
Interface	PCIe 4.0 x16	PCIe 5.0 x8
AI TOPS (Tensor Operations Per Second)	Not specified	545
Form Factor	Single-slot	Compact, SFF (Small Form Factor)

As can be seen, the A4000 boasts better memory bandwidth, as well as a larger total number of cores (although they are from the previous generation of Ampere architecture), yet it consumes twice as much power as the RTX PRO 2000 Blackwell. Additionally, it utilizes 16 PCI-E lanes for data transfer, but these are Gen4 rather than Gen5.

Testing is necessary

We will be testing the card by comparing its performance in the same applications recommended by NVIDIA: large language models (LLMs), image and video generation, as well as checking its performance in Blender rendering.

For testing LLM inference, we will use our script based on Ollama.

Our server configuration is as follows: AMD Ryzen 9 5900X (3.7GHz, 12 cores), 64GB of RAM, 1TB NVMe SSD, and a PSU paired with the RTX PRO 2000.

Although our test unit features a compact, short PCB, the reference design still results in a two-slot card that is approximately 6.6 inches in length and utilizes turbine cooling.

The “strange” black casing shown in the photo, which actually increases the size of our small card, allows air to be drawn from inside the enclosure and then blown outward, preventing hot air from accumulating around the processor and memory components.

A similar configuration will be used for the A4000; however, the processor we used was a Ryzen 5950X running at 16x3.4 GHz. This specific processor does not participate in our tests and therefore does not affect the comparison results.

We install Ubuntu 22.04 on both machines and use our script to install the necessary drivers and CUDA software. For the RTX PRO 2000 Blackwell, we end up with the following specifications:

As can be seen, without any load, the performance of the model is merely at “room temperature” (i.e., very low). We will conduct testing on several models, including the older DeepSeek-R1:14B, the slightly more recent gpt-oss:20B with MOE, and the brand-new multimodal mixstral3:14B (which is also faster, at 8B), as well as its image recognition capability. All models utilize Q4 quantization.

We will compile the final results into a table, including numbers for DeepSeek compared to those achieved on the RTX 6000 PRO Blackwell for comparison purposes.

GPU	Model	Tokens/sec (average)	Max Context Size	Load Time (sec, average)	Generation Time (sec, average)	Notes
NVIDIA RTX 6000 PRO Blackwell (gen5)	deepseek-r1:14b	114.02	128,000	1.74	22.71
NVIDIA RTX A4000 (gen4)	deepseek-r1:14b	35.81	24,000	11.72	74.37
NVIDIA RTX PRO 2000 Blackwell	deepseek-r1:14b	27.79	24,000	3.68	91.91
NVIDIA RTX A4000 (gen4)	ministral-3:8b	65.42	64,000	12.92	44.98	Visual mode
NVIDIA RTX PRO 2000 Blackwell	ministral-3:8b	48.21	68,000	3.17	63.97	Visual mode
NVIDIA RTX A4000 (gen4)	ministral-3:14b	42.28	36,000	13.99	86.12	Visual mode
NVIDIA RTX PRO 2000 Blackwell	ministral-3:14b	30.97	36,000	3.68	115.42	Visual mode
NVIDIA RTX A4000 (gen4)	gpt-oss:20b	84.06	120,000	14.89	30.85	“Mixture of Experts” approach
NVIDIA RTX PRO 2000 Blackwell	gpt-oss:20b	62.54	120,000	4.23	43.26	“Mixture of Experts” approach

In summary, the RTX PRO 2000 Blackwell performs more slowly during inference tasks compared to the A4000. The only aspect in which the RTX PRO 2000 Blackwell has an advantage is in startup time, namely the time required for initial memory loading. The speed differences between the two models are as follows (in percentage terms, representing how much slower the A4000 is compared to the RTX PRO 2000 Blackwell in terms of inference performance per second):

Deepseek-r1:14b: ~28%
Ministral-3:8b: ~27%
GPT-oss:20b: ~26%

As can be seen, the average speed difference is roughly the same for both traditional inference models and those that utilize the Mixture of Experts (MoE) architecture. Despite the unique characteristics of the MoE architecture, which activates only a small portion of the model’s parameters, allowing for better utilization of the A4000’s strengths—such as its high memory bandwidth (384 GB/s GDDR6 on 16 GB of memory) and large L2 cache—the RTX PRO 2000 Blackwell still manages to perform fairly well despite having less power. This is thanks to its new Tensor Cores.

However, under heavy load, the A4000 temperature rises to 83°C and consumes over 130 watts of power (out of a possible 140 watts), while the RTX PRO 2000 Blackwell remains at around 59°C with a power consumption of 65 watts (out of a possible 70 watts).

When comparing the RTX PRO 2000 Blackwell to NVIDIA’s older RTX PRO 6000 Blackwell in inference tasks, we get the following overall picture:

Image and Video Generation

Here, we will utilize ComfyUI to test image generation via Z-image Turbo, as well as video generation in the “text-to-video” and “image-to-video” modes using the Kandinsky 5 Lite model. To deploy ComfyUI on a server, you can follow these steps (for testing purposes, we set everything up from the root level without using a virtual environment):

#!/usr/bin/env bash
set -euo pipefail


COMFY_DIR="/root/comfy/ComfyUI"
LISTEN_IP="${LISTEN_IP:-0.0.0.0}"


export DEBIAN_FRONTEND=noninteractive
apt-get update -y
apt-get install -y git python3 python3-pip ca-certificates


# install/update ComfyUI
if [[ -d "${COMFY_DIR}/.git" ]]; then
  git -C "$COMFY_DIR" pull --ff-only
else
  mkdir -p "$(dirname "$COMFY_DIR")"
  git clone https://github.com/comfyanonymous/ComfyUI.git "$COMFY_DIR"
fi


# python deps (system-wide, since venv is not wanted)
cd "$COMFY_DIR"
python3 -m pip install --upgrade pip wheel
python3 -m pip install -r requirements.txt


# run
exec python3 main.py --listen "${LISTEN_IP}"

For quick installation of the Z-image Turbo model on the server (after installing ComfyUI), you can use the following script:

#!/usr/bin/env bash
set -euo pipefail


COMFY_DIR="/root/comfy/ComfyUI"
MODELS_DIR="${COMFY_DIR}/models"


# URL -> relative path inside ComfyUI/models/
declare -A FILES=(
  ["https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/text_encoders/qwen_3_4b.safetensors"]="text_encoders/qwen_3_4b.safetensors"
  ["https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/vae/ae.safetensors"]="vae/ae.safetensors"
  ["https://huggingface.co/Comfy-Org/z_image_turbo/resolve/main/split_files/diffusion_models/z_image_turbo_bf16.safetensors"]="diffusion_models/z_image_turbo_bf16.safetensors"
  ["https://huggingface.co/tarn59/pixel_art_style_lora_z_image_turbo/resolve/main/pixel_art_style_z_image_turbo.safetensors"]="loras/pixel_art_style_z_image_turbo.safetensors"
)


# sanity checks
if [[ ! -d "$COMFY_DIR" ]]; then
  echo "ERROR: ComfyUI dir not found: $COMFY_DIR"
  exit 1
fi


mkdir -p \
  "${MODELS_DIR}/text_encoders" \
  "${MODELS_DIR}/vae" \
  "${MODELS_DIR}/diffusion_models" \
  "${MODELS_DIR}/loras"


# downloader (aria2c preferred for resume + parallel chunks)
if command -v aria2c >/dev/null 2>&1; then
  DL="aria2c -c -x 8 -s 8 -k 1M --allow-overwrite=true --file-allocation=none"
else
  apt-get update -y
  apt-get install -y curl ca-certificates
  DL="curl -L --fail --retry 5 --retry-delay 2 -C - -o"
fi


for url in "${!FILES[@]}"; do
  rel="${FILES[$url]}"
  out="${MODELS_DIR}/${rel}"
  tmp="${out}.part"


  echo "==> ${rel}"
  mkdir -p "$(dirname "$out")"


  if command -v aria2c >/dev/null 2>&1; then
    # aria2c writes directly to target
    $DL -d "$(dirname "$out")" -o "$(basename "$out")" "$url"
  else
    $DL "$tmp" "$url"
    mv -f "$tmp" "$out"
  fi
done


echo
echo "Done. Files are in:"
echo "  ${MODELS_DIR}/text_encoders/"
echo "  ${MODELS_DIR}/vae/"
echo "  ${MODELS_DIR}/diffusion_models/"
echo "  ${MODELS_DIR}/loras/"

The prompt for generating an image is as follows:

Photorealistic documentary photo inside a beaver lodge on a quiet lake at night: three beavers acting like IT engineers are assembling a 19-inch server into a small rack. Each beaver wears a bright yellow construction hard hat with a clean, sharp, perfectly readable HOSTKEY logo printed on the front (exact spelling: “HOSTKEY”, all caps), centered, high-contrast, crisp lettering, not distorted. One beaver holds the rack rails, another plugs RJ‑45 patch cables into a network switch, the third checks the front-panel status LEDs. Warm tungsten lamp light, cozy wooden interior, wet realistic fur texture with tiny water droplets, realistic wood grain, subtle steam from damp fur, lake reflections visible through a small window. A neat pile of cable ties, a small screwdriver, and a laptop showing a terminal on a wooden table. Cinematic but realistic lighting, shallow depth of field, 35mm documentary photography, f/2.0, ISO 800, crisp sharp focus on the beavers, helmets, and the server rack, high detail, natural colors, realistic reflections, no cartoon look, no CGI look.

We obtain the following results. This is a cold start (also known as the first run) in Z-image Turbo; all parameters are visible on the image.

It takes 26 seconds to generate an image of 1024x1024 pixels with 9 iterations. Subsequent generations or changes in image size will be slightly faster. For example, let’s change the aspect ratio to 1280x720.

Now, let’s generate a video using Kandinsky 5 Lite. They promised us that the power of the RTX PRO 2000 Blackwell would be sufficient even for such a task.

First, let’s try our prompt in the “text-to-video” mode. We deploy the model using the following script:

#!/usr/bin/env bash
set -euo pipefail


COMFY_DIR="/root/comfy/ComfyUI"
MODELS_DIR="${COMFY_DIR}/models"


declare -A FILES=(
  ["https://huggingface.co/Comfy-Org/HunyuanVideo_1.5_repackaged/resolve/main/split_files/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors"]="text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors"
  ["https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors?download=true"]="text_encoders/clip_l.safetensors"
  ["https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_vae_bf16.safetensors"]="vae/hunyuan_video_vae_bf16.safetensors"
  ["https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-sft-5s/resolve/main/model/kandinsky5lite_t2v_sft_5s.safetensors"]="diffusion_models/kandinsky5lite_t2v_sft_5s.safetensors"
)


if [[ ! -d "$COMFY_DIR" ]]; then
  echo "ERROR: ComfyUI dir not found: $COMFY_DIR"
  exit 1
fi


mkdir -p \
  "${MODELS_DIR}/text_encoders" \
  "${MODELS_DIR}/vae" \
  "${MODELS_DIR}/diffusion_models"


# Prefer aria2c for resume/large files
if command -v aria2c >/dev/null 2>&1; then
  DL_ARIA2=1
else
  apt-get update -y
  apt-get install -y aria2 ca-certificates
  DL_ARIA2=1
fi


for url in "${!FILES[@]}"; do
  rel="${FILES[$url]}"
  out="${MODELS_DIR}/${rel}"
  dir="$(dirname "$out")"
  name="$(basename "$out")"


  echo "==> ${rel}"
  mkdir -p "$dir"


  # -c resume, -x/-s connections, -k chunk size
  aria2c -c -x 8 -s 8 -k 1M --allow-overwrite=true --file-allocation=none \
    -d "$dir" -o "$name" "$url"
done


echo
echo "Done. Verify files exist:"
echo "  ${MODELS_DIR}/text_encoders/qwen_2.5_vl_7b_fp8_scaled.safetensors"
echo "  ${MODELS_DIR}/text_encoders/clip_l.safetensors"
echo "  ${MODELS_DIR}/vae/hunyuan_video_vae_bf16.safetensors"
echo "  ${MODELS_DIR}/diffusion_models/kandinsky5lite_t2v_sft_5s.safetensors"

We were able to heat up the graphics card to nearly 70 degrees Celsius and utilize all available processing power.

Final result and time taken.

Over 24 minutes and 5 seconds in length, the video is in a resolution of 768x512.

Next, we’ll take the image that we generated earlier and try to bring it to life. We’ll add the missing parts necessary for it to function properly…

#!/usr/bin/env bash
set -euo pipefail


COMFY_DIR="/root/comfy/ComfyUI"
MODELS_DIR="${COMFY_DIR}/models"
URL="https://huggingface.co/kandinskylab/Kandinsky-5.0-I2V-Lite-5s/resolve/main/model/kandinsky5lite_i2v_5s.safetensors"
OUT="${MODELS_DIR}/diffusion_models/kandinsky5lite_i2v_5s.safetensors"


mkdir -p "$(dirname "$OUT")"


# aria2c preferred (resume + faster)
if command -v aria2c >/dev/null 2>&1; then
  aria2c -c -x 8 -s 8 -k 1M --allow-overwrite=true --file-allocation=none \
    -d "$(dirname "$OUT")" -o "$(basename "$OUT")" "$URL"
else
  apt-get update -y
  apt-get install -y curl ca-certificates
  curl -L --fail --retry 5 --retry-delay 2 -C - -o "$OUT" "$URL"
fi


echo "Installed: $OUT"

Then, we bring our image to life by using the following prompt:

Photorealistic image-to-video animation of the same scene: three wet,
ealistic beavers inside a wooden beaver lodge on a calm lake at
night, wearing bright yellow construction hard hats with a clean,
readable "HOSTKEY" logo. Keep the composition and identities
consistent with the input image: same beavers, same server rack and
cables, same laptop on the table, same
same warm tungsten lamp lighting and
lake view through the windows. Subtle natural motion: the beavers
blink and breathe, whiskers and wet fur gently move, slight head turns
and small paw movements as they plug and adjust Ethernet cables,
server LEDs softly flicker, tiny cable sway, faint steam drifting
upward. Gentle handheld camera micro-movement with a very slow push-in
(dolly-in), shallow depth of field, cinematic realism, high detail,
realistic reflections on the helmets, no scene changes.

It’s still those same 24 minutes. On one hand, the results aren’t particularly impressive; on the other hand, the graphics card performs in this mode, albeit not at super-high speeds. For models that fit within 16 GB of memory, the card is quite suitable. Considering that on my A4000, I was unable to run the same video rendering due to certain architectural limitations, this card performs quite well in that regard.

Rendering in Blender

For testing, we will be using scripts from https://opendata.blender.org/. We install the latest version via Snap (we have already installed the new version 5), download the Linux benchmark from the website, extract it, and then run it.

In the benchmark, the card also reached its full performance in terms of power consumption, but the temperature of the card remained low, as did its memory usage.

In summary, we obtained three results across three different scenarios:

Monster: 1782.994259 samples per minute
Junkshop: 1010.288134 samples per minute
Classroom: 1008.595210 samples per minute

By adding these numbers together, we get an average score of 3801.877603 samples per minute, and this figure can now be compared with other results.

These numbers are just short of the performance level of the NVIDIA RTX A4500, but they are lower than those of the RTX 4060 Ti.

Again, we conducted our tests using Blender 5; however, these numbers refer to Blender 4.5.0. If we look at the tables, the results of other users are even lower.

That is to say, our card performed much better in tests than others’. It’s possible that the new Blender software, as well as optimizations to the drivers and CUDA, played a role in this improved performance.

To sum things up

We have a compact, low-power card based on NVIDIA’s latest architecture. While it performs slightly less than its predecessor, the A4000, in terms of inference tasks, it costs half as much and consumes half as much energy. Therefore, it can be recommended for use in neural network tasks with light loads and for models with a size of up to 14B/20B (depending on the architecture). This is especially useful in scenarios where models need to be frequently updated; for example, the RTX PRO 2000 Blackwell can be switched in within 3-4 seconds.

Personally, I liked this card; it installed fairly smoothly, and the performance losses are more than made up for by its low power consumption, lower temperatures, and compact size.