Encoding
When we compared the RTX A5000 and RTX A4000, we found that neither a higher CPU frequency nor a larger amount of video memory had much effect on the performance of the cards' encoding block. Readers rightly pointed out that we used an automatic quantization setting (which determines the quality of the resulting video) instead of a ready-made h264 codec preset, and that we skipped the encoding needed for 60 fps streaming.
Let's repeat the same tests on the RTX A5500, starting with 1080p stream encoding at 30 fps. For comparison, the A5000 (just like the A4000) managed only 14 streams.
The A5500 performed better: at 14 streams it clearly had a safety margin (NVIDIA promises up to 16 streams). At the same time, the card consumed 5 W less power and had a lower video core temperature (35 °C vs. 47 °C for the A5000), though it used 500 MB more video memory.
Output from nvidia-smi dmon -s pucm
gpu | pwr | gtemp | mtemp | sm | mem | enc | dec | mclk | pclk | fb | bar1 |
---|---|---|---|---|---|---|---|---|---|---|---|
Idx | W | C | C | % | % | % | % | MHz | MHz | MB | MB |
0 | 92 | 35 | - | 13 | 3 | 100 | 0 | 7600 | 1890 | 4141 | 32 |
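For readers who want to log such readings automatically, here is a minimal Python sketch that turns a dmon-style table into dictionaries. The function name `parse_dmon` is our own, and the parser only assumes the column layout shown above (a header row starting with `gpu`, then numeric sample rows).

```python
def parse_dmon(text):
    """Parse nvidia-smi dmon style output into a list of dicts, one per sample row."""
    rows = []
    header = None
    for line in text.splitlines():
        # Accept both the raw "# gpu pwr ..." layout and the pipe-separated table form.
        cells = line.replace("|", " ").replace("#", " ").split()
        if not cells:
            continue
        if cells[0] == "gpu":
            header = cells  # column names: gpu, pwr, gtemp, ...
        elif header and cells[0].isdigit():
            # "-" means the sensor is not reported (mtemp on this card).
            rows.append({k: (None if v == "-" else int(v)) for k, v in zip(header, cells)})
    return rows


sample = """gpu | pwr | gtemp | mtemp | sm | mem | enc | dec | mclk | pclk | fb | bar1 |
---|---|---|---|---|---|---|---|---|---|---|---|
Idx | W | C | C | % | % | % | % | MHz | MHz | MB | MB |
0 | 92 | 35 | - | 13 | 3 | 100 | 0 | 7600 | 1890 | 4141 | 32 |"""

row = parse_dmon(sample)[0]
encoder_saturated = row["enc"] >= 100  # the NVENC block is fully loaded
print(row["pwr"], row["gtemp"], row["fb"], encoder_saturated)
```

The `enc` column is the one to watch here: it sits at 100% while `sm` stays low, which shows that the load falls on the encoding block rather than on the CUDA cores.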
Ffmpeg output gives us the following:
frame = 1051 fps = 32 q = 33.0 size = 9472 kB time = 00:00:34.93 bitrate = 2221.2 kbits/s speed = 1.07x
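A quick way to judge such a line is to check whether the encoder keeps up with real time. The sketch below is our own illustration, not part of the test setup: the helper names and the 30 fps threshold are assumptions, and it only relies on the `key = value` layout of the progress line.

```python
import re

# Matches "key = value" pairs; tolerates missing spaces around "=".
PROGRESS_RE = re.compile(r"(\w+)\s*=\s*(\S+)")


def parse_progress(line):
    """Turn an ffmpeg progress line into a dict of raw string fields."""
    return dict(PROGRESS_RE.findall(line))


def keeps_realtime(line, target_fps=30.0):
    """True if the stream is encoded at least as fast as it is played back."""
    fields = parse_progress(line)
    fps = float(fields["fps"])
    speed = float(fields["speed"].rstrip("x"))
    return fps >= target_fps and speed >= 1.0


line_14_streams = ("frame = 1051 fps = 32 q = 33.0 size = 9472 kB "
                   "time = 00:00:34.93 bitrate = 2221.2 kbits/s speed = 1.07x")
print(parse_progress(line_14_streams)["q"], keeps_realtime(line_14_streams))
```

A speed value below 1.0x means the encoder is falling behind the source, which for a live stream shows up as dropped frames.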
The adapter obviously cannot handle 16 video streams:
gpu | pwr | gtemp | mtemp | sm | mem | enc | dec | mclk | pclk | fb | bar1 |
---|---|---|---|---|---|---|---|---|---|---|---|
Idx | W | C | C | % | % | % | % | MHz | MHz | MB | MB |
0 | 96 | 44 | - | 13 | 4 | 100 | 0 | 7600 | 1905 | 4732 | 32 |
frame = 901 fps = 28 q = 26.0 size = 7680 kB time = 00:00:29.93 bitrate = 2101.8 kbits/s speed = 0.917x
The card started dropping frames, and the picture filled with artifacts, because the codec could not keep up and automatically degraded the quality (the q parameter kept jumping from 26 to 50).
Let's try to record video in high quality. We set the parameters corresponding to the High profile of the h264 codec: it is the main profile for digital broadcasting and video on optical media, especially for high-definition television (it is used for Blu-ray video discs and DVB HDTV broadcasting).
With 14 streams running again, the load on the video card increases, but the card can handle it:
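We do not reproduce the exact command line here, so treat the following Python sketch as an assumption about what a High-profile NVENC run could look like. The input file name is hypothetical, and the helper only builds the argument lists without launching anything. The flags themselves (`-c:v h264_nvenc`, `-profile:v high`, `-r`, `-an`, `-f null`) are standard ffmpeg options.

```python
def nvenc_command(src, profile="high", fps=30):
    """Build an ffmpeg argument list for one hardware-encoded H.264 stream."""
    return [
        "ffmpeg",
        "-i", src,               # hypothetical input file
        "-c:v", "h264_nvenc",    # NVIDIA hardware H.264 encoder
        "-profile:v", profile,   # High profile instead of the encoder default
        "-r", str(fps),          # output frame rate
        "-an",                   # drop audio, only the video encoder is tested
        "-f", "null", "-",       # discard the output, we only need the statistics
    ]


# One command per parallel stream; 14 matches the 1080p test described here.
commands = [nvenc_command("input_1080p.mp4") for _ in range(14)]
print(" ".join(commands[0]))
```

Each list can be passed to `subprocess.Popen` to start the streams in parallel; that step is left out so the sketch runs on a machine without ffmpeg or a GPU.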
gpu | pwr | gtemp | mtemp | sm | mem | enc | dec | mclk | pclk | fb | bar1 |
---|---|---|---|---|---|---|---|---|---|---|---|
Idx | W | C | C | % | % | % | % | MHz | MHz | MB | MB |
0 | 95 | 43 | - | 13 | 4 | 100 | 0 | 7600 | 1890 | 4141 | 32 |
Ffmpeg output:
frame = 968 fps = 32 q = 23.0 size = 7680 kB time = 00:00:32.16 bitrate = 1955.9 kbits/s speed = 1.07x
Let’s try 4K at 30 fps. The card can handle three streams in high profile without any problems:
frame = 257 fps = 37 q = 33.0 size = 2304 kB time = 00:00:08.46 bitrate = 2229.3 kbits/s speed = 1.2x
With four streams, the speed drops slightly (as you may remember, the A5000 with four streams and the automatic quality setting managed only 25-26 fps, with artifacts):
frame = 985 fps = 30 q = 37.0 size = 7424 kB time = 00:00:32.73 bitrate = 1858.0 kbits/s speed = 0.995x
The hardware readings were as follows:
gpu | pwr | gtemp | mtemp | sm | mem | enc | dec | mclk | pclk | fb | bar1 |
---|---|---|---|---|---|---|---|---|---|---|---|
Idx | W | C | C | % | % | % | % | MHz | MHz | MB | MB |
0 | 89 | 32 | - | 9 | 4 | 100 | 0 | 7600 | 1920 | 1659 | 11 |
In fact, the video card runs at a higher frequency than when encoding Full HD video, but its main cores are not stressed by the load (the chip stays cool, as does the video memory).
Streaming 4K at 60 frames per second dropped to two streams, as expected. This time we used a gameplay recording from Doom Eternal rather than a cartoon, which caused some problems for the hardware decoder. The A5500 handled it, but it was pushed to the limit. Worse, AV1 encoding is not available in hardware, and when broadcasting via VLC on Ubuntu 20.04 we failed to get 60 fps: the stream dropped to 30 fps. We built a workaround from ffmpeg and the broadcast server:
frame = 240 fps = 61 q = 32.0 size = 2304 kB time = 00:00:09.48 bitrate = 3991.0 kbits/s speed = 1.03x
Conclusion: the encoders in the RTX A5500 have been improved. Under equal conditions it outperforms the A5000, renders a subjectively better picture, and works at lower frequencies.
CUDA/RT/Tensor Cores
And what about the rest of the units? We compared the RTX A5500 with the A5000 in several other tests (you can read more about the exact methods in a previous article):
- A test of mining capabilities (with PhoenixMiner).
- A test of machine learning capabilities. Here, we trained a neural network on each of the cards to determine whether a cat or a dog is depicted in the photograph, using 100 epochs for this purpose.
- V-Ray 5 Benchmark test for rendering both on CPU + GPU (CUDA test) and purely on a GPU (RTX test).
- LuxMark test in three different scenes, checking the speed in OpenCL on the GPU.
- Blender test in different scenes in OptiX mode, using the full capabilities of RTX.
Summary Table
Test | RTX A5000 | RTX A5500 |
---|---|---|
Mining speed, MH/s | 86.66 | 87.319 |
ML test, 100 epochs | 9 min. 9s | 8 min. 59s |
V-Ray 5 Benchmark, V-Ray GPU CUDA, vpaths | 1381 | 1594 |
V-Ray 5 Benchmark, V-Ray GPU RTX, vrays | 2288 | 2613 |
LuxMark, Lux ball | 74 795 | 78 554 |
LuxMark, Hotel | 15 794 | 16 219 |
LuxMark, Mic | 45 640 | 48 832 |
Blender, Monster | 2312 | 2468 |
Blender, Junkshop | 1331 | 1388 |
Blender, Classroom | 1148 | 1223 |
The RTX A5500 shows better performance in rendering, but here everything depends on optimization: in V-Ray 5 the gap is 14-15%, in LuxMark it is 3-7%, and in Blender 4-7%. Taking into account a margin of error of 1-2% depending on the run, the final performance gain is not very impressive.
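The relative gains can be recomputed directly from the summary table. The short Python sketch below does just that; the dictionary layout and the helper name `gain_percent` are ours, while the scores are copied from the table.

```python
# (A5000 score, A5500 score) for every rendering test; higher is better.
scores = {
    "V-Ray GPU CUDA": (1381, 1594),
    "V-Ray GPU RTX": (2288, 2613),
    "LuxMark Lux ball": (74795, 78554),
    "LuxMark Hotel": (15794, 16219),
    "LuxMark Mic": (45640, 48832),
    "Blender Monster": (2312, 2468),
    "Blender Junkshop": (1331, 1388),
    "Blender Classroom": (1148, 1223),
}


def gain_percent(a5000, a5500):
    """Relative advantage of the A5500 over the A5000, in percent."""
    return (a5500 / a5000 - 1.0) * 100.0


gains = {name: round(gain_percent(*pair), 1) for name, pair in scores.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain}%")
```

Keep the 1-2% run-to-run spread in mind when reading the output: differences of that size between two scenes are within the noise.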
In machine learning the A5500 is only about 2% faster (8 min. 59s vs. 9 min. 9s for 100 epochs), and miners will be unpleasantly surprised to find almost the same hash rate on both cards. Note, however, that the manufacturer positions this solution for professionals in graphics and neural networks.
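To put numbers on the machine learning and mining results, the training times from the summary table can be converted to seconds. This is a small sketch of our own: the helper `to_seconds` is a hypothetical name and only understands the "N min. Ns" format used in the table.

```python
import re


def to_seconds(text):
    """Convert a duration such as '9 min. 9s' into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*min\.?\s*(\d+)\s*s\s*", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


a5000_s = to_seconds("9 min. 9s")
a5500_s = to_seconds("8 min. 59s")

# For a time-based test the faster card has the smaller number,
# so the speedup is old time divided by new time.
ml_speedup_percent = (a5000_s / a5500_s - 1.0) * 100.0

# Hash rate is a "higher is better" figure, so the ratio goes the other way round.
hash_gain_percent = (87.319 / 86.66 - 1.0) * 100.0

print(a5000_s, a5500_s, round(ml_speedup_percent, 1), round(hash_gain_percent, 1))
```

Both differences are small enough to be close to the run-to-run spread mentioned above.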
Conclusions
Alas, no miracle happened: the actual performance increase is 5-10% depending on the task, and in mining and encoding there was hardly any increase at all. On the plus side, we saw lower power consumption, better cooling thanks to the lower heat dissipation of the video chip, and a larger amount of video memory, which should help in memory-intensive tasks. Whether or not it is worth the money is up to the buyer, and you can order a dedicated server with an NVIDIA RTX A5500 from us if you wish to check it out yourself.