In our second article about encoding (with the A4000 test) we missed the fact that a video stream can have a higher resolution, so it's worth testing 4K encoding as well. To complete the picture, we will also compare encoding on NVIDIA hardware with Intel's integrated GPU. Some professionals believe that FFmpeg with Quick Sync enabled is enough and that a discrete video card is no longer needed. We will check this claim too.
A4000 vs A5000
We will use the same test rig from the existing HOSTKEY servers, but install an NVIDIA A5000 graphics card with more encoding blocks, 24 GB of video memory and higher power consumption.
First, let's check it at the number of threads that proved to be the limit for the A4000 in the previous test:
14 threads
gpu | pwr | gtemp | mtemp | sm | mem | enc | dec | mclk | pclk | fb | bar1 |
---|---|---|---|---|---|---|---|---|---|---|---|
Idx | W | C | C | % | % | % | % | MHz | MHz | MB | MB |
0 | 97 | 47 | - | 92 | 3 | 100 | 0 | 7600 | 1920 | 3502 | 33 |
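The table above has the shape of `nvidia-smi dmon` output. Assuming that is how it was captured (the article does not say), a small filter like the one below can pull a single column, such as encoder utilization, out of the raw text. The function name `dmon_column` and the `-s pucm` selection are our assumptions for illustration.

```shell
# Print one named column from raw `nvidia-smi dmon` text read on stdin.
# Assumption: the header line starts with "#" followed by "gpu pwr gtemp ...".
dmon_column() {
    awk -v col="$1" '
        /^#/ {
            line = $0
            sub(/^#[ \t]*/, "", line)          # drop the leading "#"
            n = split(line, h, /[ \t]+/)
            if (h[1] == "gpu")                 # names row, not the units row
                for (i = 1; i <= n; i++) idx[h[i]] = i
            next
        }
        NF && (col in idx) { print $(idx[col]) }
    '
}

# Hypothetical usage on a machine with an NVIDIA driver installed:
#   nvidia-smi dmon -s pucm -c 5 | dmon_column enc
```

Looking columns up by name rather than by position keeps the filter working if a different set of metric groups is selected.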
frame=1015 fps=31 q=28.0 Lsize= 9056kB time=00:00:33.80 bitrate=2194.8kbits/s speed=1.02x
Amazing! We got figures comparable to those of the A4000. Despite higher chip frequency, more video memory and higher power consumption, the A5000 managed to encode only 14 threads and gave up on the fifteenth. This fiasco proves once again that professional video adapters are designed for other purposes.
Switching to 4K
Now let's try broadcasting a stream at 3840x2160 resolution (aka 4K). Thankfully, we have a suitable video file about a rabbit. With the amount of data multiplied, CPU-only encoding could not keep up even with a single thread:
frame= 2902 fps=27 q=29.0 size=104448kB time=00:01:33.56 bitrate=9144.7kbits/s dup=436 drop=0 speed=0.878x
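Throughout these tests a run seems to count as successful when FFmpeg's reported speed stays at or above 1x, meaning frames are encoded at least as fast as they arrive. Here is a minimal sketch of that check, assuming the status lines were saved to a log. The function names `ffmpeg_speed` and `keeps_up` are ours, not part of FFmpeg.

```shell
# Read FFmpeg status text on stdin and print the last reported speed value.
# FFmpeg separates progress updates with carriage returns, so convert them first.
ffmpeg_speed() {
    tr '\r' '\n' | sed -n 's/.*speed=[ ]*\([0-9.]*\)x.*/\1/p' | tail -n 1
}

# Succeed (exit status 0) only if the last reported speed is at least 1x.
keeps_up() {
    speed=$(ffmpeg_speed)
    [ -n "$speed" ] && awk -v s="$speed" 'BEGIN { exit !(s >= 1.0) }'
}

# Hypothetical usage with a saved log file:
#   keeps_up < stream-0.log && echo "real time" || echo "falling behind"
```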
What can the GPU do (remember, the A4000 and A5000 results are comparable)? Three threads.
gpu | pwr | gtemp | mtemp | sm | mem | enc | dec | mclk | pclk | fb | bar1 |
---|---|---|---|---|---|---|---|---|---|---|---|
Idx | W | C | C | % | % | % | % | MHz | MHz | MB | MB |
0 | 96 | 46 | - | 100 | 3 | 96 | 0 | 7600 | 1920 | 1112 | 9 |
As you can see from the power consumption and the encoder load, the video chip is clearly working close to its limit, although only about 1 GB of video memory is in use.
FFmpeg output confirms that the video card is doing fine:
frame= 1465 fps=33 q=35.0 Lsize=12584kB time=00:00:48.80 bitrate=2112.4kbits/s dup=159 drop=0 speed=1.09x
However, the adapter can't handle 4 streams. Although the hardware load stays at about the same values, the frame rate falls below real time:
frame= 614 fps= 26 q=35.0 Lsize=4978kB time=00:00:20.43 bitrate=1995.6kbits/s speed=0.858x
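A rough way to relate the 4K limit to the earlier result is to compare pixel counts. The sketch below assumes the 14-thread test used a FullHD (1920x1080) source, which this article does not state explicitly. It also treats pixel count as a proxy for encoder load, which is only an approximation.

```shell
# Pixel count of a frame: width * height.
pixels() { echo $(( $1 * $2 )); }

fhd=$(pixels 1920 1080)
uhd=$(pixels 3840 2160)
ratio=$(( uhd / fhd ))

echo "4K to FullHD pixel ratio: $ratio"
echo "3 x 4K streams ~ $(( 3 * ratio )) FullHD-equivalent streams"
```

By this estimate, three 4K streams correspond to about 12 FullHD streams, which is in the same range as the 14 threads measured earlier.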
Using FFmpeg with QuickSync support
For the tests we needed a suitable Intel processor (we found a machine with a Core i9-9900K CPU @ 3.60GHz) and FFmpeg with Quick Sync support. There were no problems with the former (all we needed was a 6th-generation or newer chip with an integrated GPU, which is easy to check), but setting up FFmpeg on Ubuntu 20.04 felt like practicing the Kama Sutra. To save you precious time, we will describe how we solved the problem.
Since the packages in the repositories are broken, the first thing to do is to build and install gmmlib and libva libraries, as well as the latest Intel media driver and Media SDK versions in the system. To do that, create a GIT folder in your home directory, go to it and run the following commands in sequence (if any dependencies are missing, install them from the repository; we recommend doing sudo apt install autoconf automake build-essential cmake pkg-config):
# 1. gmmlib
cd ~/GIT
git clone https://github.com/intel/gmmlib.git && cd gmmlib
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
make -j8
sudo make install
# 2. libva (return to the GIT folder first so the repositories are not nested)
cd ~/GIT
git clone https://github.com/intel/libva.git && cd libva
./autogen.sh --prefix=/usr --libdir=/usr/lib/x86_64-linux-gnu
make -j8
sudo make install
# 3. Intel media driver
cd ~/GIT
git clone https://github.com/intel/media-driver.git && cd media-driver
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
make -j8
sudo make install
# 4. Intel Media SDK
cd ~/GIT
git clone https://github.com/Intel-Media-SDK/MediaSDK.git && cd MediaSDK
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
make -j8
sudo make install
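If one of the build steps above stops because a tool is missing, a quick check like the following can show which executables are absent from the PATH. It is only a sketch: the function name `missing_tools` is ours, and the tool list in the usage line is an assumption about what these builds call, not an exhaustive dependency list.

```shell
# Print every tool from the argument list that cannot be found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Hypothetical usage before starting the builds:
#   missing_tools git cmake make gcc g++ autoconf automake pkg-config
```

An empty result means every listed executable was found. Missing development libraries still have to be resolved from the error messages of cmake or configure.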
Then you need to build FFmpeg with a few magic commands:
cd ~/GIT
git clone https://github.com/ffmpeg/ffmpeg
cd ffmpeg
./configure --enable-libmfx --enable-vaapi --enable-opencl --enable-libvorbis --enable-libvpx --enable-libdrm --enable-gpl --cpu=native --enable-libfdk-aac --enable-libx264 --enable-libx265 --extra-libs=-lpthread --enable-nonfree
make -j8
sudo make install
It's worth making sure you have Quick Sync support:
ffmpeg -decoders|grep qsv
The output of the command should look something like this:
V....D av1_qsv AV1 video (Intel Quick Sync Video acceleration) (codec av1)
V....D h264_qsv H264 video (Intel Quick Sync Video acceleration) (codec h264)
V....D hevc_qsv HEVC video (Intel Quick Sync Video acceleration) (codec hevc)
V....D mjpeg_qsv MJPEG video (Intel Quick Sync Video acceleration) (codec mjpeg)
V....D mpeg2_qsv MPEG2VIDEO video (Intel Quick Sync Video acceleration) (codec mpeg2video)
V....D vc1_qsv VC1 video (Intel Quick Sync Video acceleration) (codec vc1)
V....D vp8_qsv VP8 video (Intel Quick Sync Video acceleration) (codec vp8)
V....D vp9_qsv VP9 video (Intel Quick Sync Video acceleration) (codec vp9)
Good! Everything is ready for testing.
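The listing above shows decoders, while the test script relies on the `h264_qsv` encoder, so it may be worth checking the encoder list too. Here is a minimal sketch that looks for an exact codec name in the second column of FFmpeg's list output. The helper name `has_codec` is ours.

```shell
# Succeed if the codec name given as $1 appears as the second field
# of any line read on stdin (the layout used by `ffmpeg -encoders`).
has_codec() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Hypothetical usage:
#   ffmpeg -hide_banner -encoders | has_codec h264_qsv && echo "QSV H.264 encoder available"
```

Matching the whole field avoids false positives that a plain `grep qsv` could give for similarly named entries.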
Testing encoding with Quick Sync
First, let's see how the processor handles FullHD video encoding without Quick Sync: it manages 4 threads at most, with all cores at 100% load.
frame= 1461 fps= 33 q=29.0 size=24064kB time=00:00:46.33 bitrate=4254.7kbits/s speed=1.05x
The processor can no longer handle a fifth thread, so we can safely proceed with the Quick Sync test. In the script from the previous article, you will need to change the encoder to h264_qsv, and it will look like this (you can read more about using Quick Sync with FFmpeg here):
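#!/bin/bash
# The first argument is the number of parallel encoding jobs to start.
for (( i=0; i<$1; i++ )); do
    # Each job reads the same source stream, drops audio (-an) and encodes with Quick Sync.
    ffmpeg -i http://78.0.75.110:5454/ -an -vcodec h264_qsv -y Output-File-$i.mp4 &
done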
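A variant of the script that keeps one log per job and waits for all jobs to finish makes the results easier to compare afterwards. This is a sketch, not the script used for the tests. The `FFMPEG_BIN` and `SOURCE_URL` variables, the log file names and the `-nostdin` option (which stops background jobs from reading the terminal) are our additions.

```shell
# Binary and source can be overridden from the environment (assumed names).
FFMPEG_BIN=${FFMPEG_BIN:-ffmpeg}
SOURCE_URL=${SOURCE_URL:-http://78.0.75.110:5454/}

# run_streams COUNT [ENCODER]
# Starts COUNT parallel encoding jobs and waits until all of them exit.
run_streams() {
    count=$1
    encoder=${2:-h264_qsv}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Status lines go to stream-N.log instead of the terminal.
        "$FFMPEG_BIN" -nostdin -i "$SOURCE_URL" -an -vcodec "$encoder" \
            -y "Output-File-$i.mp4" > "stream-$i.log" 2>&1 &
        i=$(( i + 1 ))
    done
    wait
}

# Hypothetical usage: six Quick Sync jobs, then three HEVC jobs.
#   run_streams 6
#   run_streams 3 hevc_qsv
```

Passing the encoder as a parameter also makes it simple to repeat a run with a different codec.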
First we run a test with 6 threads (2 more than the CPU alone could handle):
frame=291 fps=55 q=29.0 size=1280kB time=00:00:10.13 bitrate=1034.8kbits/s dup=2 drop=0 speed=1.93x
The difference is obvious: the CPU load is below 50%, and the remaining reserve of computing resources suggests a total of 11-12 threads.
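The reported speed gives a similar estimate. Assuming throughput scales roughly linearly with the number of jobs (a simplification, and not how the article derives its figure), 6 jobs running at 1.93x leave room for about 6 x 1.93 jobs at 1x. The function name `estimate_streams` is ours.

```shell
# estimate_streams JOBS SPEED
# Naive linear estimate of how many jobs would fit at exactly 1x speed.
estimate_streams() {
    awk -v n="$1" -v speed="$2" 'BEGIN { printf "%.1f\n", n * speed }'
}

estimate_streams 6 1.93   # prints 11.6
```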
Let's try 11 threads:
frame=157 fps=30 q=38.0 Lsize=628kB time=00:00:05.69 bitrate=903.0kbits/s dup=2 drop=0 speed=1.09x
The processor load increases only slightly, but the integrated GPU is already reaching its limit. A twelfth thread lowers the bitrate and pulls the processing speed down to 24-28 fps.
Now let's check 4K. In contrast to the AMD system, our Intel processor easily handles one thread at this resolution even without hardware acceleration:
frame=655 fps=31 q=-1.0 Lsize=30637kB time=00:00:21.73 bitrate=11547.9kbits/s speed=1.03x
Unfortunately, it couldn't do more than that. With Quick Sync enabled, the test computer handled three 4K threads:
frame= 509 fps=31 q=33.0 Lsize=8010kB time=00:00:17.42 bitrate=3764.7kbits/s dup=2 drop=0 speed=1.07x
It failed only on the fourth, but our NVIDIA A5000 video card managed no more than that either.
Bottom line
You can draw your own conclusions. We would only mention that for video encoding, the difference in video card capabilities is not always determined by the price, and for some tasks it is worth paying attention to specialized technologies in CPUs. We also used H264 for the tests, but the HEVC (H265) or AV1 codecs should in theory give better results, especially at 4K resolution. If you run similar tests with the former yourself (AV1 is still widely available in hardware only for decoding), share your results in the comments.