
    20.06.2022

    Multithreaded encoding: Pay twice as much or go for built-in?


    Our test of the NVIDIA A4000 confirmed in practice that it can encode up to 16 independent FullHD video streams in H264 format. Can we multiply that performance with a professional video card that costs twice as much? Let's check it out.

    HOSTKEY
    Rent GPU servers with instant deployment or a server with a custom configuration with professional-grade NVIDIA RTX 5500 / 5000 / A4000 cards. VPS with dedicated GPU cards are also available. The GPU card is dedicated to the VM and cannot be used by other clients. GPU performance in virtual machines matches GPU performance in dedicated servers.

    In our second article about encoding (with the A4000 test) we missed the fact that a video stream can be of higher resolution, so it's worth testing 4K file encoding. To complete the picture, we will also compare encoding on solutions from NVIDIA with Intel's built-in GPU. Some professionals believe that it is enough to use the same FFmpeg with QuickSync enabled and an external video card will no longer be needed. We will check this assertion as well.

    We won't describe in detail the testing process for NVIDIA video cards and why we need FFmpeg, as this is covered in the previous articles (parts one and two). We'd rather focus on the new results and useful tips and tricks.

    A4000 vs A5000

    We will use the same test rig from the existing HOSTKEY servers, but install an NVIDIA A5000 graphics card with more encoding blocks, 24 GB of video memory and higher power consumption.

    NVIDIA A5000

    First, let's check its performance at the number of threads that turned out to be the limit for the A4000 in the previous test:

    14 threads

    gpu pwr gtemp mtemp sm mem enc dec mclk pclk fb bar1
    Idx W C C % % % % MHz MHz MB MB
    0 97 47 - 92 3 100 0 7600 1920 3502 33

    frame=1015 fps=31 q=28.0 Lsize= 9056kB time=00:00:33.80 bitrate=2194.8kbits/s speed=1.02x

    Amazing! We got figures comparable to those of the A4000. Despite higher chip frequency, more video memory and higher power consumption, the A5000 managed to encode only 14 threads and gave up on the fifteenth. This fiasco proves once again that professional video adapters are designed for other purposes.
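    The pass/fail criterion used throughout these tests is the speed value at the end of FFmpeg's progress line: a stream is keeping up as long as it stays at or above 1.0x. A minimal sketch of that check follows; the helper names ffmpeg_speed and keeps_realtime are hypothetical, and the 1.0 threshold is our reading of how the results are judged here.

```shell
# Extract the "speed=" value from an FFmpeg progress line (hypothetical helper).
ffmpeg_speed() {
    printf '%s\n' "$1" | sed -n 's/.*speed= *\([0-9.]*\)x.*/\1/p'
}

# Succeed if the stream is being encoded at least in real time (speed >= 1.0).
keeps_realtime() {
    speed=$(ffmpeg_speed "$1")
    [ -n "$speed" ] && awk -v s="$speed" 'BEGIN { exit !(s >= 1.0) }'
}

# Progress line taken from the 14-thread run above.
line='frame=1015 fps=31 q=28.0 Lsize= 9056kB time=00:00:33.80 bitrate=2194.8kbits/s speed=1.02x'
if keeps_realtime "$line"; then
    echo "real time: speed=$(ffmpeg_speed "$line")x"
else
    echo "falling behind: speed=$(ffmpeg_speed "$line")x"
fi
```

    Running the same check on every stream's last progress line gives a quick answer to whether one more thread still fits.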

    Switching to 4K

    Now let's try encoding a stream at 3840x2160 resolution (aka 4K). Luckily, we have a suitable video file about a rabbit. With four times as many pixels per frame, CPU-only encoding could not keep up even with a single thread:

    frame= 2902 fps=27 q=29.0 size=104448kB time=00:01:33.56 bitrate=9144.7kbits/s dup=436 drop=0 speed=0.878x

    What can the GPU do (remember, the A4000 and A5000 give comparable results)? Three threads.

    gpu pwr gtemp mtemp sm mem enc dec mclk pclk fb bar1
    Idx W C C % % % % MHz MHz MB MB
    0 96 46 - 100 3 96 0 7600 1920 1112 9

    As you can see, judging by the power consumption and the load on the encoding blocks, the video chip is working close to its limit, although only about 1 GB of video memory is in use.
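    The tables above look like the output of nvidia-smi dmon. As a sketch, the encoder load can be pulled out of such output automatically; peak_enc is a hypothetical helper, and the live command in the comment (nvidia-smi dmon -s pucm) is our assumption about how the figures were collected.

```shell
# Print the highest encoder utilisation (enc %) seen in dmon-style output on stdin.
peak_enc() {
    awk '
        # Header row: find the "enc" column. Real dmon output prefixes header
        # rows with "#", which shifts the field index by one.
        $1 == "#" && $2 == "gpu" { for (i = 3; i <= NF; i++) if ($i == "enc") col = i - 1; next }
        $1 == "gpu"              { for (i = 2; i <= NF; i++) if ($i == "enc") col = i; next }
        $1 == "#" || $1 == "Idx" { next }   # units row
        col && $col ~ /^[0-9]+$/ { if ($col + 0 > max) max = $col + 0 }
        END { print max + 0 }
    '
}

# Sample taken from the three-stream 4K run above.
# On a live system (assumed invocation): nvidia-smi dmon -s pucm -c 10 | peak_enc
peak_enc <<'EOF'
gpu pwr gtemp mtemp sm mem enc dec mclk pclk fb bar1
Idx W C C % % % % MHz MHz MB MB
0 96 46 - 100 3 96 0 7600 1920 1112 9
EOF
```

    For the sample shown this prints 96, the enc value from the table.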

    FFmpeg output confirms that the video card is doing fine:

    frame= 1465 fps=33 q=35.0 Lsize=12584kB time=00:00:48.80 bitrate=2112.4kbits/s dup=159 drop=0 speed=1.09x

    However, the adapter can't handle 4 streams. Although the hardware load remains at about the same values, the frame rate drops below real time:

    frame= 614 fps= 26 q=35.0 Lsize=4978kB time=00:00:20.43 bitrate=1995.6kbits/s speed=0.858x
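    The three-stream limit is roughly what the pixel count alone would suggest. The arithmetic below is only a back-of-the-envelope sanity check: it assumes encoder throughput scales with the number of pixels, which real hardware follows only approximately.

```shell
# Pixels per frame at each resolution.
fullhd_pixels=$((1920 * 1080))
uhd_pixels=$((3840 * 2160))

# A 4K frame carries this many times more pixels than a FullHD frame.
ratio=$((uhd_pixels / fullhd_pixels))

# FullHD streams the A5000 handled in the test above.
fullhd_streams=14

# Integer division: whole 4K streams that fit into the same pixel budget.
estimate=$((fullhd_streams / ratio))

echo "4K frame = ${ratio}x FullHD pixels; ${fullhd_streams} FullHD streams ~ ${estimate} 4K streams"
```

    The estimate of 3 streams matches the measured result, with the fourth stream falling below real time.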

    Using FFmpeg with QuickSync support

    According to the developer, QuickSync is supposed to "use the special multimedia processing capabilities of Intel® graphics technology to accelerate decoding and encoding, allowing the processor to perform other tasks in parallel and improving system performance."

    For the tests we needed a suitable Intel processor (we found a machine with a Core i9-9900K CPU @ 3.60GHz) and FFmpeg with Quick Sync support. There were no problems with the former (we only needed a 6th-generation or newer chip with an integrated GPU, which is easy to check), but setting up FFmpeg on Ubuntu 20.04 felt like practicing the Kama Sutra. To save you precious time, we will describe how we solved the problem.

    Since the packages in the repositories are broken, the first thing to do is to build and install gmmlib and libva libraries, as well as the latest Intel media driver and Media SDK versions in the system. To do that, create a GIT folder in your home directory, go to it and run the following commands in sequence (if any dependencies are missing, install them from the repository; we recommend doing sudo apt install autoconf automake build-essential cmake pkg-config):

    # gmmlib: graphics memory management library needed by the media driver
    cd ~/GIT
    git clone https://github.com/intel/gmmlib.git && cd gmmlib
    mkdir build && cd build
    cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
    make -j8
    sudo make install

    # libva: the VA-API library
    cd ~/GIT
    git clone https://github.com/intel/libva.git && cd libva
    ./autogen.sh --prefix=/usr --libdir=/usr/lib/x86_64-linux-gnu
    make -j8
    sudo make install

    # Intel media driver for VA-API
    cd ~/GIT
    git clone https://github.com/intel/media-driver.git && cd media-driver
    mkdir build && cd build
    cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
    make -j8
    sudo make install

    # Intel Media SDK (provides libmfx for FFmpeg)
    cd ~/GIT
    git clone https://github.com/Intel-Media-SDK/MediaSDK.git && cd MediaSDK
    mkdir build && cd build
    cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=/usr/lib/x86_64-linux-gnu ..
    make -j8
    sudo make install
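    Before moving on, it may be worth confirming that the four builds actually landed in the library directory. The sketch below is a hypothetical helper (check_media_stack); the file-name patterns are our assumptions about what each project installs and may differ between versions.

```shell
# Report which of the freshly built libraries are present in a library directory.
# The file-name patterns are assumptions, not taken from the projects' documentation.
check_media_stack() {
    libdir=${1:-/usr/lib/x86_64-linux-gnu}
    missing=0
    for pattern in 'libigdgmm.so*' 'libva.so*' 'dri/iHD_drv_video.so' 'libmfx.so*'; do
        # The unquoted $pattern is expanded by the shell; if nothing matches,
        # the literal pattern stays in place and the -e test fails.
        set -- "$libdir"/$pattern
        if [ -e "$1" ]; then
            echo "ok       $pattern"
        else
            echo "missing  $pattern"
            missing=$((missing + 1))
        fi
    done
    # The return code is the number of missing entries (0 means all found).
    return "$missing"
}

check_media_stack || echo "some libraries were not found; re-check the corresponding build"
```

    The directory argument defaults to the CMAKE_INSTALL_LIBDIR used in the commands above; pass another path if you installed elsewhere.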

    Then build and install FFmpeg itself with Quick Sync (libmfx) support:

    cd ~/GIT
    git clone https://github.com/ffmpeg/ffmpeg
    cd ffmpeg
    ./configure --enable-libmfx --enable-vaapi --enable-opencl --enable-libvorbis --enable-libvpx --enable-libdrm --enable-gpl --cpu=native --enable-libfdk-aac --enable-libx264 --enable-libx265 --extra-libs=-lpthread --enable-nonfree
    make -j8
    sudo make install

    It's worth making sure you have Quick Sync support:

    ffmpeg -decoders | grep qsv

    The output of the command should look something like this:

    V....D av1_qsv              AV1 video (Intel Quick Sync Video acceleration) (codec av1) 
    V....D h264_qsv             H264 video (Intel Quick Sync Video acceleration) (codec h264) 
    V....D hevc_qsv             HEVC video (Intel Quick Sync Video acceleration) (codec hevc) 
    V....D mjpeg_qsv            MJPEG video (Intel Quick Sync Video acceleration) (codec mjpeg) 
    V....D mpeg2_qsv            MPEG2VIDEO video (Intel Quick Sync Video acceleration) (codec mpeg2video) 
    V....D vc1_qsv              VC1 video (Intel Quick Sync Video acceleration) (codec vc1) 
    V....D vp8_qsv              VP8 video (Intel Quick Sync Video acceleration) (codec vp8) 
    V....D vp9_qsv              VP9 video (Intel Quick Sync Video acceleration) (codec vp9) 
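    The listing above covers decoders, while the tests below rely on the h264_qsv encoder, so it may be worth checking the encoder list as well. A minimal sketch: has_codec is a hypothetical helper that looks for an exact codec name in the second column of FFmpeg's codec listing.

```shell
# Succeed if the codec list on stdin contains the given name in its second column.
has_codec() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# With the freshly built FFmpeg one would run, for example:
#   ffmpeg -hide_banner -encoders | has_codec h264_qsv && echo "h264_qsv encoder available"

# Demonstration on two lines from the listing above.
sample=' V....D h264_qsv             H264 video (Intel Quick Sync Video acceleration) (codec h264)
 V....D hevc_qsv             HEVC video (Intel Quick Sync Video acceleration) (codec hevc)'
printf '%s\n' "$sample" | has_codec h264_qsv && echo "h264_qsv listed"
```

    Matching the whole column rather than using a plain grep avoids false positives from names that merely contain the search string.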
    

    Good! Everything is ready for testing.

    Testing encoding with Quick Sync

    First, let's see how the processor can handle FullHD video encoding without Quick Sync: it can manage 4 threads maximum, with all cores under 100% load.

    frame= 1461 fps= 33 q=29.0 size=24064kB time=00:00:46.33 bitrate=4254.7kbits/s speed=1.05x

    The processor cannot cope with a fifth thread, so we can safely proceed with the Quick Sync test. In the script from the previous article, you will need to change the encoder to h264_qsv, and it will look like this (you can read more about using QuickSync with FFmpeg here):

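    #!/bin/bash
    # Pass the number of parallel encoding jobs as the first argument.
    for (( i=0; i<$1; i++ )); do
    	# -an drops the audio track; h264_qsv selects the Quick Sync H.264 encoder
    	ffmpeg -i http://78.0.75.110:5454/ -an -vcodec h264_qsv -y "Output-File-$i.mp4" &
    done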
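    Since the only thing that changes between the CPU, NVIDIA and Quick Sync runs is the encoder name, the launcher can take it as a parameter. The sketch below is a hypothetical variant, not the script from the previous article: it prints the commands by default (DRY_RUN=1) and only launches them when DRY_RUN=0. The encoder names libx264 and h264_nvenc are our assumptions about what the other runs would use.

```shell
# Stream address from the article; override via the environment if needed.
SOURCE_URL=${SOURCE_URL:-http://78.0.75.110:5454/}

# Print the FFmpeg command for stream number $1 using encoder $2.
stream_cmd() {
    echo "ffmpeg -i $SOURCE_URL -an -vcodec $2 -y Output-File-$1.mp4"
}

# Print (DRY_RUN=1, the default) or launch in the background $1 streams with encoder $2.
run_streams() {
    i=0
    while [ "$i" -lt "$1" ]; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            stream_cmd "$i" "$2"
        else
            sh -c "$(stream_cmd "$i" "$2")" &
        fi
        i=$((i + 1))
    done
    # When really encoding, block until every background job has finished.
    [ "${DRY_RUN:-1}" = 1 ] || wait
}

run_streams 2 h264_qsv
```

    For example, DRY_RUN=0 run_streams 6 h264_qsv would start six Quick Sync jobs, and swapping the second argument for libx264 would repeat the test on the CPU alone.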

    First we run a test with 6 threads (two more than the CPU alone could handle):

    frame=291 fps=55 q=29.0 size=1280kB time=00:00:10.13 bitrate=1034.8kbits/s dup=2 drop=0 speed=1.93x

    The difference is obvious: the CPU load is below 50%, and the remaining headroom suggests a total of 11-12 threads.
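    The speed figure points to a similar estimate. As a rough sketch, assuming throughput scales linearly with the number of streams (which real hardware follows only approximately), 6 streams running at 1.93x leave room for about 6 x 1.93 streams at 1.0x. The helper name estimate_streams is hypothetical.

```shell
# Estimate how many real-time streams fit, given N streams running at speed S.
# Assumes linear scaling, which is only an approximation.
estimate_streams() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n * s }'
}

echo "estimated real-time streams: $(estimate_streams 6 1.93)"
```

    The result, 11.6, falls inside the 11-12 range predicted above.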

    Let's try 11 threads:

    frame=157 fps=30 q=38.0 Lsize=628kB time=00:00:05.69 bitrate=903.0kbits/s dup=2 drop=0 speed=1.09x

    The processor load increases only slightly, but the GPU is already reaching its limits. With a twelfth thread, the bitrate falls and the processing speed drops to 24-28 fps.

    Now let's check the threads in 4K. Unlike the AMD processor, our Intel processor easily handles one thread at this resolution even without hardware acceleration:

    frame=655 fps=31 q=-1.0 Lsize=30637kB time=00:00:21.73 bitrate=11547.9kbits/s speed=1.03x

    Unfortunately, it couldn't do more than that. With Quick Sync on, the test computer was able to handle three 4K threads:

    frame= 509 fps=31 q=33.0 Lsize=8010kB time=00:00:17.42 bitrate=3764.7kbits/s dup=2 drop=0 speed=1.07x

    It failed only on the fourth, but then so did our NVIDIA A5000 video card.

    Unfortunately, the solution has disadvantages as well. When using the BMC module (for example, when controlling a machine via IPMI), you will not have access to all the hardware acceleration capabilities, even if a GPU is detected in the system. You'll have to choose between the convenience of remote management or getting all the benefits of using Quick Sync.

    Bottom line

    You can draw your own conclusions. We would only note that for video encoding, the difference in video card capabilities is not always determined by price, and for some tasks it is worth paying attention to specialized technologies in CPUs. We used H264 for the tests, but the HEVC (H265) or AV1 codecs should in theory give better results, especially at 4K resolutions. If you run similar tests with HEVC yourself (hardware AV1 support is so far widely available only for decoding), share your results in the comments.
