A server equipped with four AMD Radeon RX 7900 XTX GPUs, each built on the RDNA 3 architecture with 24 GB of video memory (96 GB in total), delivers exceptional performance for artificial intelligence, 3D rendering, and big data processing. This powerful yet cost-effective configuration is well suited to demanding business workloads.
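As a rough sizing aid, the sketch below adds up the memory of four 24 GB cards and compares it with the space needed just for a model's weights (billions of parameters × bytes per parameter ≈ GB). It is a back-of-envelope estimate under stated assumptions: it ignores KV cache, activations and runtime overhead, and it assumes the model can be split evenly across the cards. The helper names and the 80% headroom margin are hypothetical.

```python
# Back-of-envelope VRAM sizing for a 4 x RX 7900 XTX server.
# Assumption: weights only. KV cache, activations and framework overhead
# are NOT included, so real memory usage will be higher.

GPUS = 4
VRAM_PER_GPU_GB = 24


def total_vram_gb(gpus: int = GPUS, per_gpu_gb: float = VRAM_PER_GPU_GB) -> float:
    """Combined video memory across all cards in the server."""
    return gpus * per_gpu_gb


def weights_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight size in GB: billions of params x bytes per param."""
    return params_billions * bits_per_param / 8


def fits(params_billions: float, bits_per_param: float, headroom: float = 0.8) -> bool:
    """True if the weights fit within `headroom` (hypothetical safety margin) of total VRAM."""
    return weights_gb(params_billions, bits_per_param) <= headroom * total_vram_gb()


if __name__ == "__main__":
    print(total_vram_gb())     # 96
    print(weights_gb(32, 16))  # 64.0 (a 32B model in FP16)
    print(fits(32, 16))        # True  (64 <= 76.8)
    print(fits(70, 16))        # False (140 > 76.8)
```

A 70B model stored at 16 bits would not fit under this estimate, which is why such models are usually run quantized (for example at Q4) on this class of hardware.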
The AMD Radeon RX 7900 XTX combines cutting-edge technology with high performance. It offers excellent power efficiency, ample video memory, and support for advanced features such as second-generation ray tracing and AMD Infinity Cache. These qualities make it a strong choice for professional workloads, including 3D rendering, artificial intelligence (AI), and scientific computing.
Powered by RDNA 3 architecture and equipped with 24 GB GDDR6 memory, the card effortlessly handles demanding tasks such as 4K gaming and complex 3D graphics.
The graphics card features 6,144 stream processors, delivering exceptional performance for both gaming and professional applications.
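The stream-processor count can be turned into a theoretical peak FP32 figure. The sketch assumes, from AMD's published specifications rather than from this page, a 2.5 GHz boost clock, two FLOPs per fused multiply-add, and a 2x factor for RDNA 3 dual-issue; sustained throughput in real workloads is lower.

```python
# Theoretical peak FP32 throughput of one RX 7900 XTX.
# Assumptions (not stated on this page): 2.5 GHz boost clock,
# 2 FLOPs per FMA, and a 2x factor for RDNA 3 dual-issue.

STREAM_PROCESSORS = 6144
BOOST_CLOCK_GHZ = 2.5  # assumed boost clock
FLOPS_PER_FMA = 2      # one multiply + one add
DUAL_ISSUE = 2         # assumed RDNA 3 dual-issue factor


def peak_fp32_tflops(sp: int = STREAM_PROCESSORS, clock_ghz: float = BOOST_CLOCK_GHZ) -> float:
    """Peak FP32 TFLOPS = SPs x FLOPs/FMA x dual-issue x clock (GHz) / 1000."""
    return sp * FLOPS_PER_FMA * DUAL_ISSUE * clock_ghz / 1000


print(round(peak_fp32_tflops(), 2))      # 61.44 per card
print(round(4 * peak_fp32_tflops(), 2))  # 245.76 theoretical aggregate for four cards
```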
Ample capacity for handling complex calculations.
Enhanced memory bandwidth for improved performance.
Delivers realistic graphics and enhanced visualization.
Optimized for various professional applications.
Lower power consumption reduces operating costs.
Perfect for multi-GPU server configurations.
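In a multi-GPU server, an inference engine such as vLLM can split one model across all cards with tensor parallelism. The sketch below shows how a 32B FP16 model (about 64 GB of weights, more than a single 24 GB card holds) might be spread over four GPUs. The settings used for the benchmarks on this page are not stated, so `tensor_parallel_size=4`, the prompt and the sampling values are assumptions, and `vllm_engine_kwargs` is a hypothetical helper. A ROCm build of vLLM is assumed.

```python
def vllm_engine_kwargs(num_gpus: int = 4) -> dict:
    """Hypothetical helper: engine settings for splitting a model across GPUs."""
    return {
        "model": "Qwen/Qwen2.5-32B-Instruct",  # Hugging Face model ID
        "tensor_parallel_size": num_gpus,       # shard weights across this many cards (assumed 4)
        "dtype": "float16",                     # FP16, as in the vLLM benchmark rows
    }


def main() -> None:
    try:
        from vllm import LLM, SamplingParams
    except ImportError:
        print("vLLM is not installed; skipping the demo.")
        return

    llm = LLM(**vllm_engine_kwargs())
    params = SamplingParams(temperature=0.7, max_tokens=128)  # illustrative values
    outputs = llm.generate(["Explain what a stream processor is."], params)
    print(outputs[0].outputs[0].text)


if __name__ == "__main__":
    main()
```

The tensor-parallel size must divide the model's attention-head count; four does for this model, which makes it a natural fit for a four-card server.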
High performance at a lower price than comparable NVIDIA solutions: using multiple AMD GPUs in a single server offers a cost-effective alternative to high-end NVIDIA cards.
| Model and setup | Metric | AMD Radeon RX 7900 XTX | NVIDIA RTX 4090 |
|---|---|---|---|
| Llama 3.3 70B, Q4, Ollama, 2K context (54 GB VRAM) | Response speed | 12 tokens/s | 17 tokens/s |
| Gemma 2 27B, Q4, Ollama, 2K context (28 GB VRAM) | Response speed | 32 tokens/s | 40 tokens/s |
| Gemma 2 27B, Q4, Ollama, 8K context (41 GB VRAM) | Response speed | 33 tokens/s | 42 tokens/s |
| Phi-4 14B, Q4, Ollama, 2K context (12 GB VRAM) | Response speed | 48 tokens/s | 76 tokens/s |
| Qwen2.5-32B-Instruct, FP16, vLLM (30 workers) | End-to-end request latency | 10 s | 10 s |
| Qwen2.5-32B-Instruct, FP16, vLLM (30 workers) | Combined token throughput | 710 tokens/s | 750 tokens/s |
| Qwen2.5-32B-Instruct, FP16, vLLM (30 workers) | Time to first token | 1.5 s | 2.3 s |
| Qwen2.5-32B-Instruct, FP16, vLLM (30 workers) | Inter-token latency | 0.037 s | 0.037 s |
| Qwen2.5-32B-Instruct, FP16, vLLM (30 workers) | Requests per second | 2.1 requests/s | 2.3 requests/s |
| Qwen2.5-32B-Instruct, FP16, vLLM (30 workers) | Tokens per second | 27 tokens/s | 27.5 tokens/s |
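To read the benchmark results at a glance, the sketch below expresses each RX 7900 XTX result as a percentage of the RTX 4090 result, using only the figures from the table. The ratio is computed the same way for every row, so for throughput a value below 100% means the AMD card was slower, while for time to first token a value below 100% means it was faster. It also checks that the per-request tokens/s in the vLLM test is roughly the reciprocal of the inter-token latency (1 / 0.037 s ≈ 27 tokens/s); that relationship is our reading of the metrics, not something the benchmark states.

```python
# Figures copied from the benchmark table: (RX 7900 XTX, RTX 4090).
RESULTS = {
    "Llama 3.3 70B Q4, 2K (tokens/s)": (12, 17),
    "Gemma 2 27B Q4, 2K (tokens/s)": (32, 40),
    "Gemma 2 27B Q4, 8K (tokens/s)": (33, 42),
    "Phi-4 14B Q4, 2K (tokens/s)": (48, 76),
    "Qwen2.5-32B FP16 combined throughput (tokens/s)": (710, 750),
    "Qwen2.5-32B FP16 time to first token (s)": (1.5, 2.3),
}


def amd_vs_nvidia_pct(amd: float, nvidia: float) -> float:
    """AMD result as a percentage of the NVIDIA result."""
    return 100 * amd / nvidia


def tokens_per_s_from_itl(itl_s: float) -> float:
    """Steady-state decode speed of one request, assuming one token per ITL interval."""
    return 1 / itl_s


for name, (amd, nvidia) in RESULTS.items():
    print(f"{name}: {amd_vs_nvidia_pct(amd, nvidia):.0f}%")

print(round(tokens_per_s_from_itl(0.037), 1))  # 27.0
```

On these numbers the AMD card reaches roughly 63% to 80% of the RTX 4090's single-stream Ollama speed and about 95% of its combined vLLM throughput.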
The RX 7900 XTX has no CUDA cores, since CUDA is NVIDIA's proprietary technology. Instead, it has 6,144 stream processors, which serve a similar function in AMD's GPU architecture.
The card supports OpenCL and AMD's ROCm software platform and can run inference engines such as vLLM, so it can be used for training, inference, chatbots and video recognition. Popular machine learning frameworks like PyTorch and TensorFlow are available with ROCm support. Its performance with FP16 models is comparable to the NVIDIA RTX 4090, though it does not yet support FP8 models. For certain workloads, especially in multi-GPU configurations, the RX 7900 XTX is an excellent option, offering strong performance at a lower cost than NVIDIA alternatives.
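On ROCm builds of PyTorch, AMD GPUs are exposed through the familiar `torch.cuda` API (the device string is still `"cuda"`), and `torch.version.hip` is set instead of `torch.version.cuda`. The sketch below uses that to report which backend is active and which GPUs are visible. It is a minimal check that assumes a ROCm-enabled PyTorch on the server and falls back gracefully when PyTorch is missing; the function name `describe_gpu_backend` is hypothetical.

```python
def describe_gpu_backend() -> dict:
    """Report the PyTorch GPU backend and the visible devices (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed: nothing to report.
        return {"torch": None, "backend": None, "devices": []}

    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda.
    if getattr(torch.version, "hip", None):
        backend = "rocm"
    elif torch.version.cuda:
        backend = "cuda"
    else:
        backend = "cpu"

    # On ROCm, AMD GPUs are still enumerated through the torch.cuda namespace.
    devices = []
    if torch.cuda.is_available():
        devices = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]

    return {"torch": torch.__version__, "backend": backend, "devices": devices}


if __name__ == "__main__":
    info = describe_gpu_backend()
    print(info)
    # On a 4 x RX 7900 XTX server with ROCm PyTorch, expect backend "rocm"
    # and four entries in info["devices"].
    if info["devices"]:
        import torch

        # The same "cuda" device string works on both NVIDIA and AMD builds.
        x = torch.ones(2, 2, device="cuda")
        print((x @ x).sum().item())  # 8.0
```

Because existing PyTorch code that targets `"cuda"` usually runs unchanged on ROCm builds, many training and inference scripts need no edits to use these cards.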
The primary limitation is the lack of CUDA support, which can make some AI frameworks and software less compatible out of the box. Running CUDA-based applications on AMD hardware may require software adaptation, such as porting the code to AMD's HIP, or a compatibility layer such as ZLUDA.