AMD Ryzen Server 7950X — €129/month or €0.179/hour ⭐ 16 cores, 4.5 GHz / 128 GB RAM / 4 TB NVMe
EN
Currency:
EUR – €
Choose a currency
  • Euro EUR – €
  • United States dollar USD – $
VAT:
OT 0%
Choose your country (VAT)
  • OT All others 0%

11.02.2026

Benchmarking the Radeon AI Pro R9700: AMD’s “Red Team” Buys into On-Device AI

server one
HOSTKEY

Author: Alexander Kazantsev, Head of Documentation and Content at HOSTKEY

Testing NVIDIA graphics cards one after another can get tedious, especially since the differences between recent generations mainly lie in the power of the Blackwell series processors, memory capacity, and bus width. However, it’s more interesting to see what competitors have to offer, especially when they boldly label these features as “AI.”

Today, we’ll put the AMD Radeon AI Pro R9700 to the test with 32 gigabytes of video memory. This is AMD’s first professional graphics card ever designed specifically for local acceleration of artificial intelligence (AI) on workstations. It’s not a traditional card for 3D rendering or CAD tasks (like the Radeon PRO W series); rather, it represents a new category of product: an “AI accelerator for desktop systems.” Although, in terms of hardware specifications, it’s essentially the same as the Radeon RX 9070 XT, as both cards use the same Navi 48 chip with 64 compute units (CU) and 4,096 stream processors. The Radeon AI Pro R9700 does come with twice as much memory (and, of course, a higher price).

GPU servers with hourly billing
Ideal for AI workloads, rendering, and high-performance computing, with pay-as-you-go pricing based on actual usage.

Officially, this GPU is aimed at AI developers for prototyping and testing models without relying on cloud services, researchers for conducting experiments with open-source models, and enterprises for deploying private AI assistants within their corporate networks.

Taking a Look Under the Hood

The specifications for this GPU are listed in the table below. We’ll also compare its features directly with the previous generation of NVIDIA GPUs, namely the RTX A5000 with 24GB of memory, as well as the Blackwell series, represented by the RTX PRO 4000 with 24GB of memory.

Parameter

AMD Radeon™ AI PRO R9700 32GB

NVIDIA RTX PRO 4000 Blackwell

NVIDIA RTX A5000 24GB

Release Date

July 2025 (OEM), October 2025 (retail)

Q1 2025 (March-May 2025)

April 2021

Architecture

RDNA™ 4 (Navi 48)

Blackwell (GB203/GB204)

Ampere (GA102)

Manufacturing Process

4 nm (TSMC)

5 nm (TSMC)

8 nm (Samsung)

Transistors

53.9 billion

26 billion

28.3 billion

Computing Units

4096 stream processors, 64 Compute Units

8960 CUDA Cores

8192 CUDA Cores

Specialized Accelerators

128 AI Accelerators (2nd generation), 64 Ray Accelerators (3rd generation)

280 Tensor Cores (5th generation), 70 RT Cores (4th generation)

256 Tensor Cores (3rd generation), 64 RT Cores

Video Memory

32 GB GDDR6 with 256-bit bus and 20 Gbps speed

24 GB GDDR7 with ECC, 192-bit bus and 28 Gbps speed

24 GB GDDR6 with ECC, 384-bit bus and 19.5 Gbps speed

Memory Bandwidth

640 GB/s

672–896 GB/s (depending on configuration)

768 GB/s

Additional Cache

64 MB Infinity Cache

None

None

Clock Rates

Base: 1660 MHz; Boost: up to 2920 MHz

Base: 975 MHz; Boost: 1950 MHz

Base: 1170 MHz; Boost: 1695 MHz

Performance

FP16: 95.7 TFLOPS; INT4: ~1531 TOPS

FP16/BF16: 140–160 TFLOPS; AI TOPS: 750–900 (FP8/INT8)

FP32: 27.8 TFLOPS; FP16: 111.2 TFLOPS

Power Consumption (TDP)

300 W

140 W

230 W

AI Efficiency

5.1 TOPS/Watt (INT4)

5.4–6.4 TOPS/Watt

0.8 TOPS/Watt (for older Tensor Cores)

Interface

PCIe 5.0 ×16

PCIe 5.0 ×16 (standard) / ×8 (SFF)

PCIe 4.0 ×16

Video Outputs

4× DisplayPort 2.1a + 1× HDMI 2.1b

4× DisplayPort 2.1

4× DisplayPort 1.4a

Form Factor

Full-size (267×111 mm)

Standard and SFF (Small Form Factor)

Two-slot, full-height

Cooling

Active (turbomotor fan)

Active (turbomotor fan)

Active (turbomotor fan)

AI Support

ROCm 6.0+, DirectML, Windows ML

TensorRT, Blackwell Transformer Engine

TensorRT

Target Applications

Local AI inference, training of 10B–30B models, generative AI with large datasets

Professional visualization, AI inference in compact systems

Classic 3D visualization, CAD/CAM, AI inference

Estimated Price

$1299

$1559

$1800–2200 (on the secondary market); new model: ~$2500

As can be seen, although AMD’s new GPU is built using a more advanced manufacturing process, it uses GDDR6 memory and has the highest power consumption among the three models. However, it also comes with 16 GB more video memory and a lower price point.

Testing the Card in Action

For testing, two servers were used: one with a more powerful configuration featuring an AMD Ryzen 9 7950X processor (4.5GHz, 16 cores), 128GB of DDR5 memory, and 1TB of NVMe SSD; the other was less powerful, equipped with a Core i9-9900K processor (5.0GHz, 8 cores), 64GB of DDR5 memory, and also 1TB of NVMe SSD. To be upfront, the first platform delivered better results, especially in tasks that relied heavily on the CPU or required quick loading of large files from the SSD. In GPU testing, the difference between the two platforms was less than 1%.

Unfortunately, we didn’t manage to take photos of the first build (which used EPYC processors); however, we will show you the second build. Both servers were assembled in a custom chassis developed by us, which can be stacked three high on a rack, occupying 3U of space. We can provide more details about these chassis in upcoming articles.

Here are photos of the actual graphics card itself, produced by Sapphire.

As well as a photo of it installed in a custom chassis.

As you can see, this is a dual-slot graphics card with cooling designed around a single fan/turbine, similar to the cooling systems used by Nvidia for their professional GPUs.

Now onto the tests

First, we need to get the card working properly. We used Ubuntu 24.04 as the operating system, as this card can only be properly configured on kernels version 6.13 or later; therefore, we required the mainline Linux distribution branch, which is not available for Ubuntu 22.04. In the end, we succeeded in getting the card running with a kernel version of 6.18.2 (we tested it towards the end of last year). As usual, we provided detailed instructions on how to do this, as well as a special installation script (also included in the documentation); you can simply copy that script, run it from the root account in the command line, and it will handle all the necessary settings for you. Our main goal was to get ROCm running on the card, but we also tested its performance in 3D rendering using the HIP framework.

If everything goes smoothly for you (just as it did for us), then the amd-smi command (does it remind you of anything?…) will provide you with similar results. The first screenshot shows the performance on an EPYC processor (which is indicated by the presence of that second AMD-related component in the screenshot), and the second screenshot shows the performance on an i9 processor. In both cases, we used ROCm version 7.1.1.

Subsequently, we had to further develop our AI test based on Ollama to ensure it could work with and output information from AMD GPUs. We ran the test using various models (in our case, DeepSeek R1 and gpt-oss:20b) and obtained the following results:

GPU

VRAM

Model

Tokens/sec (average)

Max Contexts

Load Time (sec) (average)

Generation Time (sec) (average)

Notes

AMD RADEON AI PRO R9700

32 GB

deepseek-r1:14b

53.52

80,000

6.74

50.50

 

AMD RADEON AI PRO R9700

32 GB

deepseek-r1:32b

26.29

36,000

8.11

92.89

 

AMD RADEON AI PRO R9700

32 GB

gpt-oss:20b

102.40

128,000

5.71

28.22

“Mixture of Experts” model

NVIDIA RTX A5000 (gen3)

24 GB

deepseek-r1:14b

53.15

48,000

9.15

49.11

 

NVIDIA RTX A5000 (gen3)

24 GB

deepseek-r1:32b

25.77

12,000

11.49

94.10

 

NVIDIA RTX A5000 (gen3)

24 GB

gpt-oss:20b

119.46

128,000

6.12

22.72

“Mixture of Experts” model

NVIDIA GeForce RTX 5090 (gen5)

32 GB

deepseek-r1:32b

65.38

32,000

3.02

39.35

 

As can be seen from the table, the performance of these GPUs is roughly comparable to that of the NVIDIA RTX A5000 (note that the RTX A5000 was not even running on a fast PCI bus). Moreover, more powerful GPUs from AMD’s “Green” series, using the same amount of video memory (even consumer-grade models), outperform the NVIDIA “Red” series by nearly three times.

A full comparison table of GPUs can be found here.

On the other hand, support for ROCm is starting to be quite promising. In some cases, it even outperforms CUDA; however, with CUDA, we still sometimes have to rely on nightly builds of PyTorch and similar tools.

And what about in other tasks?

Following our testing tradition, we’ll try rendering graphics and video within ComfyUI. Using the Z-image Turbo model, with a resolution of 1024x1024… and a quick test with the “bears” theme.

Photorealistic documentary photo inside a beaver lodge on a quiet lake at night: three beavers acting like IT engineers are assembling a 19-inch server into a small rack. Each beaver wears a bright yellow construction hard hat with a clean, sharp, perfectly readable HOSTKEY logo printed on the front (exact spelling: “HOSTKEY”, all caps), centered, high-contrast, crisp lettering, not distorted. One beaver holds the rack rails, another plugs RJ‑45 patch cables into a network switch, the third checks the front-panel status LEDs. Warm tungsten lamp light, cozy wooden interior, wet realistic fur texture with tiny water droplets, realistic wood grain, subtle steam from damp fur, lake reflections visible through a small window. A neat pile of cable ties, a small screwdriver, and a laptop showing a terminal on a wooden table. Cinematic but realistic lighting, shallow depth of field, 35mm documentary photography, f/2.0, ISO 800, crisp sharp focus on the beavers, helmets, and the server rack, high detail, natural colors, realistic reflections, no cartoon look, no CGI look.

The generation process takes 6–7 seconds, with a total time of 14–15 seconds even when the resolution is increased to HD. For comparison, the RTX PRO 2000 takes 26 seconds for the total process and 14 seconds just for the generation phase alone. Unfortunately, we haven’t conducted graphics generation tests on our other cards before.

For Kandinsky 5 Lite, using the “text in video” mode with the same theme, this mode puts a full load on the card, consuming between 280 and 300 watts of power.

In summary, it took 24 minutes. The NVIDIA RTX PRO 2000 also took 24 minutes. This means that when generating videos, the “red team” (those with the less efficient optimization methods) suffers a significant loss in performance. Previously, in the latest versions of ROCm branch 6, there was already a 30% regression in video generation performance, and it’s possible that this issue has carried over to branch 7 as well. In this case, however, NVIDIA’s optimization efforts are significantly better.

We’ll additionally test the “video image generation” process to further verify the results.

The same 24 minutes of testing time as compared to the NVIDIA RTX PRO 2000 as well.

And what about AI tasks?

As we recall, AMD positions the RADEON AI PRO R9700 for AI-related tasks. Moreover, AMD’s own technology stack is divided into different components, so it’s interesting to test the card in the same 3D rendering tasks within Blender. For rendering, AMD uses HIP (Heterogeneous Interface for Portability), which is compatible with the CUDA API. HIP provides basic functionality in Blender, but it doesn’t offer competitive performance compared to OptiX + RTX Core.

We used this test link https://opendata.blender.org/ for the evaluation. It was only possible to run this test on the LTS version of Blender 4.2.

In total, the card scored 2957 points. Looking at the results, it performs slightly better than the RX 6950 XT and is comparable to the NVIDIA RTX A4000. However, it lags significantly behind the consumer-grade Radeon RX 7900 XT from 2022. It’s worth noting that the RX 7900 XT has more RT cores (84 versus 64) and stream processors (5376 versus 4096), as well as a wider bus. Although it’s based on the previous RDNA3 generation, it’s better optimized for 3D graphics rendering.

Summary

The conclusion of this comparison is rather inconclusive. While NVIDIA offers a range of professional graphics cards that are clearly categorized based on memory capacity and core performance (such as the RTX PRO 2000/4000/6000 Blackwell series, with each model having its own distinct specifications), AMD’s offerings seem to fall short of meeting these clear distinctions.

On one hand, the libraries and applications designed for working with neural network models perform quite stably on this card; when using tools like Ollama or VLMM, as well as llama.cpp, we see performance that is comparable to NVIDIA’s A5000/RTX PRO 4000 Blackwell series cards. Additionally, this AMD card comes with a larger memory capacity (32GB versus NVIDIA’s 24GB) and is also more affordable. However, it laggs behind in terms of video generation and rendering capabilities.

Another potential issue arises from support for this card within operating systems: the necessary kernel and driver components will only be available in the LTS (Long-Term Support) versions of Ubuntu starting with version 26.04, this spring.

In summary: if you need a professional graphics card for tasks involving text model inference or image generation with ample memory capacity, but are not willing to pay a premium for NVIDIA’s mid-to-high-end models, then the RADEON AI PRO R9700 is a viable option. However, if you plan to render videos, 3D graphics, or perform further training on models, then you should take a closer look at the "green" camp.

GPU servers with hourly billing
Ideal for AI workloads, rendering, and high-performance computing, with pay-as-you-go pricing based on actual usage.

Other articles

11.02.2026

What Irritates Colleagues but No One Talks About: 8 Bad Habits in IT

You can be a highly skilled professional and still create difficulties for your colleagues. Here's a breakdown of the eight most common scenarios from real-life IT team dynamics.

20.01.2026

From On-Premises to Cloud and Back: Migration Case Studies

Migrations are a two-way street, and while everyone is usually "right," their reasons differ. This article delves into case studies with concrete data, explores hybrid architectures, and highlights common pitfalls that turn cost savings into losses.

16.01.2026

Set Up Your Own Mumble Server for Stable Voice Calls Without Blockouts or Subscriptions

Mumble is simple, fast, and reliable for small groups. Here’s how to deploy it on Ubuntu with Docker and keep it backed up.

12.01.2026

NVIDIA RTX PRO 2000 Blackwell: What’s this “junior” GPU in the new family of professional graphics cards capable of?

Testing the RTX PRO 2000 (70W power consumption, 16GB GDDR7 memory) with Ollama, ComfyUI, and Blender. We’ve checked what this card is actually capable of and whether it’s worth the price.

29.12.2025

When Hybrid Architecture Outperforms Cloud and Dedicated Servers

Is your service experiencing performance drops during peak periods? Hybrid architecture helps stabilize loads and avoid unnecessary expenses. Find out when this approach works best.

Upload