03.06.2022

NVIDIA A5500: real power or just a facelift


One of the new products featured at the GTC 2022 conference was the RTX A5500 graphics card, which expands NVIDIA's range of professional graphics accelerators. It is based on the Ampere architecture with second-generation RT cores and third-generation tensor cores, and it carries 24 GB of GDDR6 memory with ECC error correction and 768 GB/s of peak bandwidth.

The 8 nm RTX A5500 graphics chip includes 10,240 CUDA cores, 80 RT cores and 320 tensor cores. NVIDIA quotes 34.1 Tflops in single-precision (FP32) operations and 272.8 Tflops in half-precision (FP16) operations.
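The FP32 figure can be sanity-checked with simple arithmetic. The sketch below assumes the usual convention of counting one fused multiply-add per CUDA core per clock as two floating-point operations; the function names are illustrative and the clock is derived from the quoted numbers, not taken from the article.

```python
# Published figures quoted in the article.
CUDA_CORES = 10_240
FP32_TFLOPS = 34.1

# Assumption: one fused multiply-add per core per clock, counted as 2 FLOPs.
FLOPS_PER_CORE_PER_CLOCK = 2


def implied_clock_ghz(tflops: float, cores: int) -> float:
    """Clock (GHz) at which `cores` would reach `tflops` of FP32 throughput."""
    return tflops * 1e12 / (cores * FLOPS_PER_CORE_PER_CLOCK) / 1e9


def peak_tflops(cores: int, clock_ghz: float) -> float:
    """Theoretical FP32 peak for a given core count and clock."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_ghz * 1e9 / 1e12


clock = implied_clock_ghz(FP32_TFLOPS, CUDA_CORES)
print(f"Implied clock: {clock:.3f} GHz")  # Implied clock: 1.665 GHz
```

Under that assumption, the quoted 34.1 Tflops corresponds to a clock of roughly 1.67 GHz across all 10,240 cores.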

All this, as they say, is on paper. Let's see what the video card is really capable of: Hostkey has recently built a machine equipped with it.

HOSTKEY
Rent GPU servers with instant deployment or a server with a custom configuration with professional-grade NVIDIA RTX A5500 / A5000 / A4000 cards. VPS with dedicated GPU cards are also available. The GPU card is dedicated to the VM and cannot be used by other clients. GPU performance in virtual machines matches GPU performance in dedicated servers.

Encoding

When comparing the RTX A5000 and RTX A4000, we found that neither a higher CPU frequency nor more video memory had much effect on the performance of the cards' encoding block. Readers also rightly pointed out that we used an automatic quantization setting (which determines the quality of the resulting video) instead of a ready-made h264 codec preset, and that we skipped the encoding needed for 60 fps streaming.

Let's repeat the same tests on the RTX A5500, starting with 1080p stream encoding at 30 fps. For comparison, the A5000 (just like the A4000) managed only 14 streams.

The A5500 performed better, and at 14 streams it clearly had a safety margin (NVIDIA promises up to 16 streams). At the same time, the video card consumed 5 W less power and had a lower GPU temperature (+35 °C vs. +47 °C for the A5000), though it used 500 MB more video memory.

Output from nvidia-smi dmon -s pucm

gpu pwr gtemp mtemp sm mem enc dec mclk pclk fb bar1
Idx W C C % % % % MHz MHz MB MB
0 92 35 - 13 3 100 0 7600 1890 4141 32
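For readers who want to log these readings rather than eyeball them, here is a minimal sketch that turns the text printed by nvidia-smi dmon into dictionaries. The parser and its names are hypothetical; it only assumes the three-row layout shown above (column names, units, then samples).

```python
def parse_dmon(text: str) -> list[dict[str, str]]:
    """Turn `nvidia-smi dmon` text into one dict per sample row."""
    header: list[str] = []
    samples = []
    for raw in text.splitlines():
        # Real output prefixes the header rows with "#"; strip it if present.
        fields = raw.lstrip("# ").split()
        if not fields:
            continue
        if fields[0] == "gpu":        # row with column names
            header = fields
        elif fields[0] == "Idx":      # row with units, not needed here
            continue
        elif header and len(fields) == len(header):
            samples.append(dict(zip(header, fields)))
    return samples


SAMPLE = """\
gpu pwr gtemp mtemp sm mem enc dec mclk pclk fb bar1
Idx W C C % % % % MHz MHz MB MB
0 92 35 - 13 3 100 0 7600 1890 4141 32
"""

row = parse_dmon(SAMPLE)[0]
encoder_saturated = int(row["enc"]) >= 100
print(row["pwr"], row["gtemp"], row["fb"], encoder_saturated)  # 92 35 4141 True
```

Values are kept as strings because some columns, such as mtemp here, report "-" instead of a number.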

Ffmpeg output gives us the following:

frame = 1051 fps = 32 q = 33.0 size = 9472 kB time = 00:00:34.93 bitrate = 2221.2 kbits/s speed = 1.07x
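The two numbers to watch in such a line are fps and speed: a stream keeps up with real time only while fps stays at or above the target and speed stays at or above 1.0x. A small sketch of that check follows; the helper names are hypothetical and the regular expression simply tolerates optional spaces around "=".

```python
import re

# Matches "key = value" pairs, with or without spaces around "=".
_PAIR = re.compile(r"(\w+)\s*=\s*(\S+)")


def parse_progress(line: str) -> dict[str, str]:
    """Extract the key/value pairs from one ffmpeg progress line."""
    return dict(_PAIR.findall(line))


def keeps_realtime(line: str, target_fps: float) -> bool:
    """True if the stream reaches the target fps at a speed of 1.0x or more."""
    stats = parse_progress(line)
    speed = float(stats["speed"].rstrip("x"))
    return float(stats["fps"]) >= target_fps and speed >= 1.0


line = ("frame = 1051 fps = 32 q = 33.0 size = 9472 kB "
        "time = 00:00:34.93 bitrate = 2221.2 kbits/s speed = 1.07x")
print(keeps_realtime(line, target_fps=30))  # True
```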

The adapter obviously cannot handle 16 video streams:

gpu pwr gtemp mtemp sm mem enc dec mclk pclk fb bar1
Idx W C C % % % % MHz MHz MB MB
0 96 44 - 13 4 100 0 7600 1905 4732 32

frame = 901 fps =28 q= 26.0 size = 7680 kB time = 00:00:29.93 bitrate = 2101.8 kbits/s speed = 0.917x

The card started dropping frames, and the picture filled with artifacts (defects), as the codec could not keep up and automatically degraded the quality (the q parameter jumped from 26 to 50).
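One way to read the two ffmpeg lines is as a total encoder budget. The sketch below assumes that the frame rate reported for one stream is representative of all parallel streams, which the article does not state; the function names are illustrative.

```python
import math


def aggregate_fps(streams: int, fps_per_stream: float) -> float:
    """Total frames per second the encoder delivers across all streams."""
    return streams * fps_per_stream


def max_realtime_streams(total_fps: float, target_fps: float) -> int:
    """How many streams fit into the total budget at the target frame rate."""
    return math.floor(total_fps / target_fps)


# Assumption: every stream runs at the fps shown in the quoted log line.
at_14 = aggregate_fps(14, 32)   # 14 streams at 32 fps
at_16 = aggregate_fps(16, 28)   # 16 streams at 28 fps
print(at_14, at_16, max_realtime_streams(at_14, 30))  # 448 448 14
```

Under that assumption both runs deliver the same total of about 448 frames per second, while 16 streams at 30 fps would need 480, which is consistent with the encoder block being the bottleneck.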

Let's try encoding video in high quality. We set the parameters corresponding to the High profile of the h264 codec, which is the primary profile for digital broadcasting and video on optical media, especially high-definition television (it is used for Blu-ray video discs and DVB HDTV broadcasting).
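The article does not list the exact command line, so the following is only a sketch of how such a run could be set up. It builds, without executing, one ffmpeg command per stream using the h264_nvenc encoder and the High profile; the input and output file names, the frame size and the helper names are all assumptions.

```python
import shlex


def nvenc_command(src: str, dst: str, fps: int = 30,
                  size: str = "1920x1080") -> list[str]:
    """ffmpeg arguments for one NVENC h264 stream in the High profile."""
    return [
        "ffmpeg", "-y",
        "-i", src,                # hypothetical input file
        "-c:v", "h264_nvenc",     # hardware h264 encoder
        "-profile:v", "high",
        "-r", str(fps),
        "-s", size,
        "-an",                    # video only, no audio track
        dst,
    ]


def stream_commands(src: str, streams: int, **kwargs) -> list[list[str]]:
    """One command per parallel stream, each writing to its own file."""
    return [nvenc_command(src, f"out_{i:02d}.mp4", **kwargs)
            for i in range(streams)]


for cmd in stream_commands("input.mp4", 14)[:2]:
    print(shlex.join(cmd))
```

To load the card, each command would be started in parallel, for example with subprocess.Popen, while nvidia-smi dmon runs in another terminal.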

With 14 streams running again, the load on the video card increases, but the card handles it:

gpu pwr gtemp mtemp sm mem enc dec mclk pclk fb bar1
Idx W C C % % % % MHz MHz MB MB
0 95 43 - 13 4 100 0 7600 1890 4141 32

Ffmpeg output:

frame = 968 fps = 32 q = 23.0 size = 7680 kB time = 00:00:32.16 bitrate = 1955.9 kbits/s speed = 1.07x

Let’s try 4K at 30 fps. The card can handle three streams in high profile without any problems:

frame = 257 fps = 37 q = 33.0 size = 2304 kB time = 00:00:08.46 bitrate = 2229.3 kbits/s speed = 1.2x

With four streams, the frame rate drops slightly (as you may remember, the A5000 with four streams and the automatic quality setting delivered only 25-26 fps with artifacts):

frame = 985 fps = 30 q = 37.0 size = 7424 kB time = 00:00:32.73 bitrate = 1858.0 kbits/s speed = 0.995x

The state of the hardware was as follows:

gpu pwr gtemp mtemp sm mem enc dec mclk pclk fb bar1
Idx W C C % % % % MHz MHz MB MB
0 89 32 - 9 4 100 0 7600 1920 1659 11

In fact, the video card runs at a higher frequency than when encoding FullHD video, but its main cores are not stressed by the load (the chip stays cool, as does the video memory).

As expected, streaming 4K at 60 frames per second reduced the count to two streams. This time we used a gameplay recording from Doom Eternal rather than a cartoon, which caused some problems for the hardware decoder. The A5500 handled it, but it was pushed to the limit. Worse, hardware AV1 encoding is not available, and when broadcasting via VLC on Ubuntu 20.04 we failed to get 60 fps because the stream dropped to 30 fps. We built a workaround from ffmpeg and the broadcast server:

frame = 240 fps = 61 q = 32.0 size = 2304 kB time = 00:00:09.48 bitrate = 3991.0 kbits/s speed = 1.03x

Conclusion: the encoders in the RTX A5500 have been improved, and under equal conditions it outperforms the A5000, rendering a subjectively better picture and working at lower frequencies.

CUDA/RT/Tensor Cores

And what about the rest of the units? We compared the RTX A5500 with the A5000 in several other tests (you can read more about the exact methods in a previous article):

  1. A test of mining capabilities (with PhoenixMiner).
  2. A test of machine learning capabilities. Here, we trained a neural network on each of the cards to determine whether a cat or a dog is depicted in the photograph, using 100 epochs for this purpose.
  3. V-Ray 5 Benchmark test for rendering both on CPU + GPU (CUDA test) and purely on a GPU (RTX test).
  4. LuxMark test in three different scenes, checking the speed in OpenCL on the GPU.
  5. A Blender test in different scenes in OptiX mode, using the full capabilities of the RTX hardware.

Summary Table

Test                          RTX A5000     RTX A5500
Mining speed, MH/s            86.66         87.319
ML test, 100 epochs           9 min 9 s     8 min 59 s
V-Ray 5 GPU CUDA, vpaths      1381          1594
V-Ray 5 GPU RTX, vrays        2288          2613
LuxMark, Lux ball             74 795        78 554
LuxMark, Hotel                15 794        16 219
LuxMark, Mic                  45 640        48 832
Blender, Monster              2312          2468
Blender, Junkshop             1331          1388
Blender, Classroom            1148          1223

The RTX A5500 performs better in rendering, but a lot depends on optimization: going by the table, the gap is about 14-15% in V-Ray 5, 3-7% in LuxMark and 4-7% in Blender. Given a run-to-run margin of error of 1-2%, the final performance gain is not very impressive.
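The rendering gaps can be recomputed directly from the summary table. The sketch below takes the scores as published and reports the A5500's gain relative to the A5000; higher scores are better in all three benchmarks, and the function name is illustrative.

```python
# (RTX A5000 score, RTX A5500 score), as listed in the summary table.
SCORES = {
    "V-Ray GPU CUDA":     (1381, 1594),
    "V-Ray GPU RTX":      (2288, 2613),
    "LuxMark Lux ball":   (74795, 78554),
    "LuxMark Hotel":      (15794, 16219),
    "LuxMark Mic":        (45640, 48832),
    "Blender Monster":    (2312, 2468),
    "Blender Junkshop":   (1331, 1388),
    "Blender Classroom":  (1148, 1223),
}


def gain_percent(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


for name, (a5000, a5500) in SCORES.items():
    print(f"{name:<18} {gain_percent(a5000, a5500):5.1f}%")
```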

In the machine learning test the A5500 finished only about 2% faster (8 min 59 s versus 9 min 9 s), and miners will be unpleasantly surprised to find almost the same hash rate on both cards. Note, however, that the manufacturer positions this solution for professionals in graphics and neural networks.
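The training times and hash rates from the summary table can be compared in the same way. In this sketch lower is better for the training time and higher is better for the hash rate; the helper that parses strings such as "9 min. 9s" is hypothetical.

```python
import re


def to_seconds(text: str) -> int:
    """Convert a duration such as '9 min. 9s' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*min\.?\s*(\d+)\s*s\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


a5000_time = to_seconds("9 min. 9s")    # 549 seconds
a5500_time = to_seconds("8 min. 59s")   # 539 seconds

# Share of the A5000 training time saved by the A5500.
time_saved = (a5000_time - a5500_time) / a5000_time * 100

# Hash-rate gain, using the mining speeds from the table.
hash_gain = (87.319 - 86.66) / 86.66 * 100

print(f"Training time saved: {time_saved:.1f}%")
print(f"Hash rate gain:      {hash_gain:.1f}%")
```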

Conclusions

Alas, no miracle happened: the actual performance increase is 5-10% depending on the task, and in mining and encoding there was hardly any increase at all. On the plus side, we saw lower power consumption, lower temperatures thanks to the reduced heat output of the video chip, and a larger amount of video memory, which should have a positive effect on intensive tasks. Whether it is worth the money is up to the buyer, and you can order a dedicated server with an NVIDIA RTX A5500 from us if you wish to check it out yourself.

