11.02.2026

Benchmarking the Radeon AI Pro R9700: AMD’s “Red Team” Buys into On-Device AI

Author: Alexander Kazantsev, Head of Documentation and Content at HOSTKEY

Testing NVIDIA graphics cards one after another can get tedious, especially since the differences between recent generations mainly lie in the power of the Blackwell series processors, memory capacity, and bus width. However, it’s more interesting to see what competitors have to offer, especially when they boldly label these features as “AI.”

Today, we’ll put the AMD Radeon AI Pro R9700 to the test with 32 gigabytes of video memory. This is AMD’s first professional graphics card ever designed specifically for local acceleration of artificial intelligence (AI) on workstations. It’s not a traditional card for 3D rendering or CAD tasks (like the Radeon PRO W series); rather, it represents a new category of product: an “AI accelerator for desktop systems.” Although, in terms of hardware specifications, it’s essentially the same as the Radeon RX 9070 XT, as both cards use the same Navi 48 chip with 64 compute units (CU) and 4,096 stream processors. The Radeon AI Pro R9700 does come with twice as much memory (and, of course, a higher price).

GPU servers with hourly billing

Ideal for AI workloads, rendering, and high-performance computing, with pay-as-you-go pricing based on actual usage.

View

Officially, this GPU is aimed at AI developers for prototyping and testing models without relying on cloud services, researchers for conducting experiments with open-source models, and enterprises for deploying private AI assistants within their corporate networks.

Taking a Look Under the Hood

The specifications for this GPU are listed in the table below. We’ll also compare its features directly with the previous generation of NVIDIA GPUs, namely the RTX A5000 with 24GB of memory, as well as the Blackwell series, represented by the RTX PRO 4000 with 24GB of memory.

Parameter	AMD Radeon™ AI PRO R9700 32GB	NVIDIA RTX PRO 4000 Blackwell	NVIDIA RTX A5000 24GB
Release Date	July 2025 (OEM), October 2025 (retail)	Q1 2025 (March-May 2025)	April 2021
Architecture	RDNA™ 4 (Navi 48)	Blackwell (GB203/GB204)	Ampere (GA102)
Manufacturing Process	4 nm (TSMC)	5 nm (TSMC)	8 nm (Samsung)
Transistors	53.9 billion	26 billion	28.3 billion
Computing Units	4096 stream processors, 64 Compute Units	8960 CUDA Cores	8192 CUDA Cores
Specialized Accelerators	128 AI Accelerators (2nd generation), 64 Ray Accelerators (3rd generation)	280 Tensor Cores (5th generation), 70 RT Cores (4th generation)	256 Tensor Cores (3rd generation), 64 RT Cores
Video Memory	32 GB GDDR6 with 256-bit bus and 20 Gbps speed	24 GB GDDR7 with ECC, 192-bit bus and 28 Gbps speed	24 GB GDDR6 with ECC, 384-bit bus and 19.5 Gbps speed
Memory Bandwidth	640 GB/s	672–896 GB/s (depending on configuration)	768 GB/s
Additional Cache	64 MB Infinity Cache	None	None
Clock Rates	Base: 1660 MHz; Boost: up to 2920 MHz	Base: 975 MHz; Boost: 1950 MHz	Base: 1170 MHz; Boost: 1695 MHz
Performance	FP16: 95.7 TFLOPS; INT4: ~1531 TOPS	FP16/BF16: 140–160 TFLOPS; AI TOPS: 750–900 (FP8/INT8)	FP32: 27.8 TFLOPS; FP16: 111.2 TFLOPS
Power Consumption (TDP)	300 W	140 W	230 W
AI Efficiency	5.1 TOPS/Watt (INT4)	5.4–6.4 TOPS/Watt	0.8 TOPS/Watt (for older Tensor Cores)
Interface	PCIe 5.0 ×16	PCIe 5.0 ×16 (standard) / ×8 (SFF)	PCIe 4.0 ×16
Video Outputs	4× DisplayPort 2.1a + 1× HDMI 2.1b	4× DisplayPort 2.1	4× DisplayPort 1.4a
Form Factor	Full-size (267×111 mm)	Standard and SFF (Small Form Factor)	Two-slot, full-height
Cooling	Active (turbomotor fan)	Active (turbomotor fan)	Active (turbomotor fan)
AI Support	ROCm 6.0+, DirectML, Windows ML	TensorRT, Blackwell Transformer Engine	TensorRT
Target Applications	Local AI inference, training of 10B–30B models, generative AI with large datasets	Professional visualization, AI inference in compact systems	Classic 3D visualization, CAD/CAM, AI inference
Estimated Price	$1299	$1559	$1800–2200 (on the secondary market); new model: ~$2500

As can be seen, although AMD’s new GPU is built using a more advanced manufacturing process, it uses GDDR6 memory and has the highest power consumption among the three models. However, it also comes with 16 GB more video memory and a lower price point.

Testing the Card in Action

For testing, two servers were used: one with a more powerful configuration featuring an AMD Ryzen 9 7950X processor (4.5GHz, 16 cores), 128GB of DDR5 memory, and 1TB of NVMe SSD; the other was less powerful, equipped with a Core i9-9900K processor (5.0GHz, 8 cores), 64GB of DDR5 memory, and also 1TB of NVMe SSD. To be upfront, the first platform delivered better results, especially in tasks that relied heavily on the CPU or required quick loading of large files from the SSD. In GPU testing, the difference between the two platforms was less than 1%.

Unfortunately, we didn’t manage to take photos of the first build (which used EPYC processors); however, we will show you the second build. Both servers were assembled in a custom chassis developed by us, which can be stacked three high on a rack, occupying 3U of space. We can provide more details about these chassis in upcoming articles.

Here are photos of the actual graphics card itself, produced by Sapphire.

As well as a photo of it installed in a custom chassis.

As you can see, this is a dual-slot graphics card with cooling designed around a single fan/turbine, similar to the cooling systems used by Nvidia for their professional GPUs.

Now onto the tests

First, we need to get the card working properly. We used Ubuntu 24.04 as the operating system, as this card can only be properly configured on kernels version 6.13 or later; therefore, we required the mainline Linux distribution branch, which is not available for Ubuntu 22.04. In the end, we succeeded in getting the card running with a kernel version of 6.18.2 (we tested it towards the end of last year). As usual, we provided detailed instructions on how to do this, as well as a special installation script (also included in the documentation); you can simply copy that script, run it from the root account in the command line, and it will handle all the necessary settings for you. Our main goal was to get ROCm running on the card, but we also tested its performance in 3D rendering using the HIP framework.

If everything goes smoothly for you (just as it did for us), then the amd-smi command (does it remind you of anything?…) will provide you with similar results. The first screenshot shows the performance on an EPYC processor (which is indicated by the presence of that second AMD-related component in the screenshot), and the second screenshot shows the performance on an i9 processor. In both cases, we used ROCm version 7.1.1.

Subsequently, we had to further develop our AI test based on Ollama to ensure it could work with and output information from AMD GPUs. We ran the test using various models (in our case, DeepSeek R1 and gpt-oss:20b) and obtained the following results:

GPU	VRAM	Model	Tokens/sec (average)	Max Contexts	Load Time (sec) (average)	Generation Time (sec) (average)	Notes
AMD RADEON AI PRO R9700	32 GB	deepseek-r1:14b	53.52	80,000	6.74	50.50
AMD RADEON AI PRO R9700	32 GB	deepseek-r1:32b	26.29	36,000	8.11	92.89
AMD RADEON AI PRO R9700	32 GB	gpt-oss:20b	102.40	128,000	5.71	28.22	“Mixture of Experts” model
NVIDIA RTX A5000 (gen3)	24 GB	deepseek-r1:14b	53.15	48,000	9.15	49.11
NVIDIA RTX A5000 (gen3)	24 GB	deepseek-r1:32b	25.77	12,000	11.49	94.10
NVIDIA RTX A5000 (gen3)	24 GB	gpt-oss:20b	119.46	128,000	6.12	22.72	“Mixture of Experts” model
NVIDIA GeForce RTX 5090 (gen5)	32 GB	deepseek-r1:32b	65.38	32,000	3.02	39.35

As can be seen from the table, the performance of these GPUs is roughly comparable to that of the NVIDIA RTX A5000 (note that the RTX A5000 was not even running on a fast PCI bus). Moreover, more powerful GPUs from AMD’s “Green” series, using the same amount of video memory (even consumer-grade models), outperform the NVIDIA “Red” series by nearly three times.

A full comparison table of GPUs can be found here.

On the other hand, support for ROCm is starting to be quite promising. In some cases, it even outperforms CUDA; however, with CUDA, we still sometimes have to rely on nightly builds of PyTorch and similar tools.

And what about in other tasks?

Following our testing tradition, we’ll try rendering graphics and video within ComfyUI. Using the Z-image Turbo model, with a resolution of 1024x1024… and a quick test with the “bears” theme.

Photorealistic documentary photo inside a beaver lodge on a quiet lake at night: three beavers acting like IT engineers are assembling a 19-inch server into a small rack. Each beaver wears a bright yellow construction hard hat with a clean, sharp, perfectly readable HOSTKEY logo printed on the front (exact spelling: “HOSTKEY”, all caps), centered, high-contrast, crisp lettering, not distorted. One beaver holds the rack rails, another plugs RJ‑45 patch cables into a network switch, the third checks the front-panel status LEDs. Warm tungsten lamp light, cozy wooden interior, wet realistic fur texture with tiny water droplets, realistic wood grain, subtle steam from damp fur, lake reflections visible through a small window. A neat pile of cable ties, a small screwdriver, and a laptop showing a terminal on a wooden table. Cinematic but realistic lighting, shallow depth of field, 35mm documentary photography, f/2.0, ISO 800, crisp sharp focus on the beavers, helmets, and the server rack, high detail, natural colors, realistic reflections, no cartoon look, no CGI look.

The generation process takes 6–7 seconds, with a total time of 14–15 seconds even when the resolution is increased to HD. For comparison, the RTX PRO 2000 takes 26 seconds for the total process and 14 seconds just for the generation phase alone. Unfortunately, we haven’t conducted graphics generation tests on our other cards before.

For Kandinsky 5 Lite, using the “text in video” mode with the same theme, this mode puts a full load on the card, consuming between 280 and 300 watts of power.

In summary, it took 24 minutes. The NVIDIA RTX PRO 2000 also took 24 minutes. This means that when generating videos, the “red team” (those with the less efficient optimization methods) suffers a significant loss in performance. Previously, in the latest versions of ROCm branch 6, there was already a 30% regression in video generation performance, and it’s possible that this issue has carried over to branch 7 as well. In this case, however, NVIDIA’s optimization efforts are significantly better.

We’ll additionally test the “video image generation” process to further verify the results.

The same 24 minutes of testing time as compared to the NVIDIA RTX PRO 2000 as well.

And what about AI tasks?

As we recall, AMD positions the RADEON AI PRO R9700 for AI-related tasks. Moreover, AMD’s own technology stack is divided into different components, so it’s interesting to test the card in the same 3D rendering tasks within Blender. For rendering, AMD uses HIP (Heterogeneous Interface for Portability), which is compatible with the CUDA API. HIP provides basic functionality in Blender, but it doesn’t offer competitive performance compared to OptiX + RTX Core.

We used this test link https://opendata.blender.org/ for the evaluation. It was only possible to run this test on the LTS version of Blender 4.2.

In total, the card scored 2957 points. Looking at the results, it performs slightly better than the RX 6950 XT and is comparable to the NVIDIA RTX A4000. However, it lags significantly behind the consumer-grade Radeon RX 7900 XT from 2022. It’s worth noting that the RX 7900 XT has more RT cores (84 versus 64) and stream processors (5376 versus 4096), as well as a wider bus. Although it’s based on the previous RDNA3 generation, it’s better optimized for 3D graphics rendering.

Summary

The conclusion of this comparison is rather inconclusive. While NVIDIA offers a range of professional graphics cards that are clearly categorized based on memory capacity and core performance (such as the RTX PRO 2000/4000/6000 Blackwell series, with each model having its own distinct specifications), AMD’s offerings seem to fall short of meeting these clear distinctions.

On one hand, the libraries and applications designed for working with neural network models perform quite stably on this card; when using tools like Ollama or VLMM, as well as llama.cpp, we see performance that is comparable to NVIDIA’s A5000/RTX PRO 4000 Blackwell series cards. Additionally, this AMD card comes with a larger memory capacity (32GB versus NVIDIA’s 24GB) and is also more affordable. However, it laggs behind in terms of video generation and rendering capabilities.

Another potential issue arises from support for this card within operating systems: the necessary kernel and driver components will only be available in the LTS (Long-Term Support) versions of Ubuntu starting with version 26.04, this spring.

In summary: if you need a professional graphics card for tasks involving text model inference or image generation with ample memory capacity, but are not willing to pay a premium for NVIDIA’s mid-to-high-end models, then the RADEON AI PRO R9700 is a viable option. However, if you plan to render videos, 3D graphics, or perform further training on models, then you should take a closer look at the "green" camp.