Dedicated Servers
  • Instant
  • Custom
  • Single CPU servers
  • Dual CPU servers
  • Servers with 4th Gen CPUs
  • Servers with AMD Ryzen and Intel Core i9
  • Storage Servers
  • Servers with 10Gbps ports
  • Hosting virtualization nodes
  • GPU
  • Sale
  • VPS
    GPU
  • Dedicated GPU server
  • VM with GPU
  • Tesla A100 80GB & H100 Servers
  • Sale
    Apps
    Cloud
  • VMware and RedHat's oVirt Сlusters
  • Proxmox VE
  • Colocation
  • Colocation in the Netherlands
  • Remote smart hands
  • Services
  • Intelligent DDoS protection
  • Network equipment
  • IPv4 and IPv6 address
  • Managed servers
  • SLA packages for technical support
  • Monitoring
  • Software
  • VLAN
  • Announcing your IP or AS (BYOIP)
  • USB flash/key/flash drive
  • Traffic
  • Hardware delivery for EU data centers
  • About
  • Careers at HOSTKEY
  • Server Control Panel & API
  • Data Centers
  • Network
  • Speed test
  • Hot deals
  • Sales contact
  • Reseller program
  • Affiliate Program
  • Grants for winners
  • Grants for scientific projects and startups
  • News
  • Our blog
  • Payment terms and methods
  • Legal
  • Abuse
  • Looking Glass
  • The KYC Verification
  • Hot Deals

    03.03.2022

    Testing multi-threaded video distribution on gaming GPUs

    server one
    HOSTKEY B.V.



    When working with streaming video, the quality and speed of playback are key. Is it possible to set up multi-stream broadcasting without buying expensive hardware? Let's see what we can do.

    Problem.

    High-quality video broadcasting usually incurs serious costs: you need to allocate premises and create an engineering infrastructure for it, purchase equipment and hire employees to maintain it, rent data transmission channels and generally do all sorts of support work. Depending on the scale of the project, the capital investment alone may require significant investment.

    Custom and instant GPU servers equipped with professional-grade NVIDIA RTX 4000 / 5000 / A6000 cards

    What alternative?

    It is possible to significantly reduce capital costs and make operating costs by renting cloud servers with a GPU, and it is worth betting on hardware transcoding with Nvidia NVENC. In addition to reducing costs, it will make live streaming much easier.

    So, if you are setting up a stream, you should try FFmpeg.

    We use this free and open source set of libraries for automated video card testing. Implementing and maintaining solutions employing on this library is quite simple, and furthermore, they are distinguished by their high speed in encoding and decoding streams. This speed boost is achieved by not copying the encoded files into the system memory, but rather the encoding process is carried out using the memory of the graphics chip.

    Scheme of the transcoding process using FFmpeg:

    scheme-1
    Rent off-the-shelf GPU servers with instant deployment or a server with a custom configuration with professional-grade NVIDIA RTX 4000 / 5000 / A6000 cards. These solutions are ideal for remote access to high-load applications from any place on Earth.

    Driver patch and FFmpeg build

    We will be testing in Ubuntu Linux, and we will start with gaming graphics accelerators: the GeForce GTX 1080 Ti and GeForce RTX 3090. They are not being used in real projects, but they are quite capable of demonstrating the difference between transcoding using a CPU alone versus GPUs. The manufacturer does not consider these adapters "qualified" and limits the maximum number of simultaneous NVENC video transcoding sessions. To solve this problem, you will have to use a trick and disable the restriction using a patch for the video driver posted by enthusiasts on GitHub.

    The patch will not be required for professional graphics cards such as the RTX A4000 or A5000, since there is no hard limit on the number of threads embedded in their driver. A list of Nvidia graphics cards with NVENC support is available on the manufacturer’s website. The technology can be used as an NVENC SDK.

    You also need to build FFmpeg with Nvidia GPU support. We haven't released it to the repository yet, so here are detailed instructions for Ubuntu (in other Linux distributions, the procedure is similar):

    # Compiling for Linux
    # FFmpeg with NVIDIA GPU acceleration is supported on all Linux platforms.
    # To compile FFmpeg on Linux, do the following:
    # Clone ffnvcodec
    git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
    # Install ffnvcodec
    cd nv-codec-headers && sudo make install && cd –
    # Clone FFmpeg's public GIT repository.
    git clone https://git.ffmpeg.org/ffmpeg.git ffmpeg/
    # Install necessary packages.
    sudo apt-get install build-essential yasm cmake libtool libc6 libc6-dev unzip wget libnuma1 libnuma-dev
    # Configure
    ./configure --enable-nonfree --enable-cuda-nvcc --enable-libnpp --extra-cflags=-I/usr/local/cuda/include --extra-ldflags=-L/usr/local/cuda/lib64 --disable-static --enable-shared
    # Compile
    make -j 8
    # Install the libraries.
    sudo make install

    Stream Settings

    To stream video using FFmpeg, we need ffserver. Let’s edit the ffserver.conf file (the standard path to it is: /etc/ffserver.conf).

    Example of ffserver configurations for streaming:

    # Port that the server will use.
    HTTPPort 8090
    # Address, at which the server will work (0.0.0.0 — all available addresses).
    HTTPBindAddress 0.0.0.0
    # Maximum throughput per client in kb/s (up to 100000).
    MaxClients 1000
    RTSPPort 5454
    RTSPBindAddress 0.0.0.0
    <Stream name>
    Format rtp
    File /root/file.name.mp4
    ACL allow 0.0.0.0
    #VideoCodec libx264
    #VideoSize 1920X1080
    </Stream>

    Example of the command to start streaming:

    ffserver ffmpeg bbb_sunflower_1080p_30fps_normal.mp4 http://ip/feed.ffm

    Video streaming decoding example using GPU and NVENC decoder (connecting to a streaming video and saving it to a device):


    ffmpeg -i rtsp://ip:5454/nier -c:v h264_nvenc Output-File.mp4

    Sample output from nvidia-smi confirms that FFmpeg is using a GPU: 0 N/A N/A 27564 C ffmpeg 152MiB.

    Testing

    We conducted comparative testing of transcoding of Full HD (1080p) live streams in high profile H.264 on consumer video cards that had not undergone special training. The operation of the GeForce RTX 3090 was tested without removing the restrictions on the number of threads, as well as with a patched driver (for the GTX 1080 Ti, testing without a patch seemed redundant to us). One of the Blender demo files was chosen as the source video — bbb_sunflower_1080p_30fps_normal.mp4.

    To test the signal, an input stream with the following parameters was used:

    Video compression ?H.264
    Resolution 1920 x 1080 (in pixels)
    Frame rate 30 fps
    Video bitrate 2,996 Mbit/s
    Audio compression AAC
    Audio frequency 48 kHz
    No. of audio channels Stereo
    Audio bitrate 479 kbit/s
    Video compression H.264
    Resolution 1920 x 1080 (in pixels)
    Frame rate 30 fps
    Video bitrate 2,996 Mbit/s
    Audio compression AAC
    Audio frequency 48 kHz
    No. of audio channels Stereo
    Audio bitrate 479 kbit/s



    Full HD (1080p) is one of the most common live video streaming resolutions and allows for intensive computational loads during testing.

    Description of test conditions:

    CPU Test GeForce GTX 1080 Ti GeForce RTX 3090
    CPU 4 x VPS Core 4 x VPS Core 1 x Xeon E3-1230v6 3.5GHz (4 cores)
    RAM 1 x VPS RAM 16Gb 1 x VPS RAM 16Gb 2 x 16 Gb DDR4
    HDD 1 x VPS HDD 240 Gb 1 x VPS HDD 240 Gb 1 x 512Gb SSD
    1 x 120Gb SSD
    Other hardware 1 x VGPU 1080Ti 1 x VGPU 1080Ti 1 x RTX 3090
    CPU Test
    CPU 4 x VPS Core
    RAM 1 x VPS RAM 16Gb
    HDD 1 x VPS HDD 240 Gb
    Other hardware 1 x VGPU 1080Ti
    GeForce GTX 1080Ti
    CPU 4 x VPS Core
    RAM 1 x VPS RAM 16Gb
    HDD 1 x VPS HDD 240 Gb
    Other hardware 1 x VGPU 1080Ti
    GeForce RTX 3090
    CPU 1 x Xeon E3-1230v6 3.5GHz (4 cores)
    RAM 2 x 16 Gb DDR4
    HDD 1 x 512Gb SSD
    1 x 120Gb SSD
    Other hardware 1 x RTX 3090



    When testing, we registered the following loads:


    Fan Temp Perf Pwr:Usage/Cap Memory-Usage
    GeForce GTX 1080 Ti 59% 82C P2 86W / 250W 5493MiB/11178MiB
    GeForce RTX 3090 43% 51C P2 149W / 350W 22806MiB /24267MiB
    GeForce GTX 1080Ti
    Fan 59%
    Temp 82C
    Perf P2
    Pwr:Usage/Cap 86W / 250W
    Memory-Usage 5493MiB/11178MiB
    GeForce RTX 3090
    Fan 43%
    Temp 51C
    Perf P2
    Pwr:Usage/Cap 149W / 350W
    Memory-Usage 22806MiB /24267MiB

    Custom and instant GPU servers equipped with professional-grade NVIDIA RTX 4000 / 5000 / A6000 cards

    The CPU Test without using a GPU was successful, but it loaded the server to the maximum: all the computational cores and all available memory were involved, and at the output we got only a few threads. The high load on the processor precludes the effective use of this method for organizing a real broadcast due to the risk of critical errors and failures. The CPU alone is not suitable for a large number of parallel operations.

    When testing the GPU, one stream was fed to the input of the decoder, and the transcoded streams were distributed at the output via an rstp protocol. Note that the GeForce RTX 3090 without a driver patch only mastered three streams. When we tried to process more, we got errors:

    [h264_nvenc @ 0x55ddbdd3ef80] OpenEncodeSessionEx failed: out of memory (10): (no details)
    [h264_nvenc @ 0x55ddbdd3ef80] No capable devices found
    Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height

    The number of threads after applying the patch to the video driver and the amount of memory used are shown in the diagram:

    video card testing

    The number of threads processed by each card is limited by both the amount of GPU memory and RAM. The GeForce RTX 3090 modifications differ in the amount of video memory, but they process the same number of threads, which is determined by the test assembly - 32 GB of RAM. Below is an example of the output of data for the RAM from a test bench using a GeForce RTX 3090 video card:

      Total Used Free Shared Buff Cache available
    Mem 31 G 11 G 234 M 1,3 G 19 G 18 G
    Swap 4,0 G 1,0 M 4,0 G
    Mem
    Total 31 G
    Used 11 G
    Free 234 M
    Shared 1,3 G
    Buff 19 G
    Cache available 18 G
    Swap
    Total 31 G
    Used 11 G
    Free 234 M

    Conclusions

    Testing on consumer video adapters requires rough intervention in the system software, but even this shows that servers with GPUs allow you to transcode live streams using heavy loads.

    That is, it is quite possible to choose FFmpeg for high-quality broadcasting without buying commercial software and expensive workstations. For example, as a budget option for video surveillance tasks and saving streams from several dozen cameras to files: you can take a machine with one GeForce GTX 1080 Ti and write the streams from it to the NAS yourself.

    The solution also allows for broadcast scaling, as it does not require significant time and computing power to change the number of streams.

    Of course, outside your office, site or on the territory of the data center, due to the Nvidia licensing rules, you won’t be able to use gaming cards, and there’s no need: there are professional product lines for this. We will talk about experiments with them in the next part of the article.

    Rent off-the-shelf GPU servers with instant deployment or a server with a custom configuration with professional-grade NVIDIA RTX 4000 / 5000 / A6000 cards. These solutions are ideal for remote access to high-load applications from any place on Earth.

    Other articles

    17.04.2024

    How to choose the right server with suitable CPU/GPU for your AI?

    Let's talk about the most important components that influence the choice of server for artificial intelligence.

    04.04.2024

    VPS, Website Hosting or Website Builder? Where to host a website for business?

    We have compared website hosting options, including VPS, shared hosting, and website builders.

    15.03.2024

    How AI is fighting the monopoly in sports advertising with GPUs and servers

    AI and AR technologies allow sports advertising to be customized to different audiences in real time using cloud-based GPU solutions.

    06.03.2024

    From xWiki to static-HTML. How we “transferred” documentation

    Choosing a platform for creating a portal with internal and external documentation. Migration of documents from cWiki to Material for MkDocs

    05.02.2024

    Test Build: Supermicro X13SAE-F Intel Core i9-14900KF 6.0 GHz

    Test results of a computer assembly based on the Supermicro X13SAE-F motherboard and the new Intel Core i9-14900KF processor overclockable up to 6.0 GHz.

    HOSTKEY Dedicated servers and cloud solutions Pre-configured and custom dedicated servers. AMD, Intel, GPU cards, Free DDoS protection amd 1Gbps unmetered port 30
    4.3 67 67
    Upload