EN
Currency:
EUR – €
Choose a currency
  • Euro EUR – €
  • United States dollar USD – $
VAT:
OT 0%
Choose your country (VAT)
  • OT All others 0%
Choose a language
  • Choose a currency
    Choose you country (VAT)
    Dedicated Servers
  • Instant
  • Custom
  • Single CPU servers
  • Dual CPU servers
  • Servers with 4th Gen CPUs
  • Servers with AMD Ryzen and Intel Core i9
  • Storage Servers
  • Servers with 10Gbps ports
  • Hosting virtualization nodes
  • GPU
  • Sale
  • VPS
  • General VPS
  • Performance VPS
  • Edge VPS
  • Storage VPS
  • VDS
  • Ryzen VPS
  • GPU
  • Dedicated GPU server
  • VM with GPU
  • Tesla A100 80GB & H100 Servers
  • Sale
    Apps
    Cloud
  • VMware and RedHat's oVirt Сlusters
  • Proxmox VE
  • Colocation
  • Colocation in the Netherlands
  • Remote smart hands
  • Services
  • L3-L4 DDoS Protection
  • Network equipment
  • IPv4 and IPv6 address
  • Managed servers
  • SLA packages for technical support
  • Monitoring
  • Software
  • VLAN
  • Announcing your IP or AS (BYOIP)
  • USB flash/key/flash drive
  • Traffic
  • Hardware delivery for EU data centers
  • AI Chatbot Lite
  • About
  • Careers at HOSTKEY
  • Server Control Panel & API
  • Data Centers
  • Network
  • Speed test
  • Hot deals
  • Sales contact
  • Reseller program
  • Affiliate Program
  • Grants for winners
  • Grants for scientific projects and startups
  • News
  • Our blog
  • Payment terms and methods
  • Legal
  • Abuse
  • Looking Glass
  • The KYC Verification
  • Hot Deals

    09.10.2022

    ELK-RabbitMQ Architecture - log management for a large IT infrastructure.

    server one

    While managing a large infrastructure of a hosting company, loss of logs can incur reputational and financial damage. In general, for most equipment owners, there will be no desire to create a system where the logs will be duplicated and transformed into a large and difficult-to-control array of data.

    HOSTKEY
    Rent dedicated and virtual servers with instant deployment in reliable TIER III class data centers in the Netherlands and the USA. Free protection against DDoS attacks included, and your server will be ready for work in as little as 15 minutes. 24/7 Customer Support.

    Hostkey has decided not to reinvent the wheel but rather to build our system based on Open Distro. In this article, we will discuss an architecture-based solution made possible thanks to the advantageous decision of the AMQP RabbitMQ broker to emphasize fault tolerance and minimal consumption of log data during failures.

    In short, the system for processing and storing logs is performed broadly according to the following principle:

    The log collection application sends them directly via the SYSLOG protocol to the Logstash data processing pipeline. From there, the data enters the Elasticsearch database and is then used by the Kibana data visualization panel.

    All ELK elements support fault tolerance, and the direct interaction scheme is quite stable. However, with the growth of the system, it is necessary to build a linear increase in the power of the log processing system, since Logstash is the point for receiving logs and at the same time a system for parsing them. As the number of messages grows, a linear increase in system performance is required; that is, the receiver should not become “choked” with messages - on the contrary, it should reliably provide information about failures without losing messages. To this end, it is necessary to allocate resources for the expected surges in advance.

    In one of our previous articles, we talked about how hardware management works at Hostkey and what role the message broker cluster (RabbitMQ) plays in our infrastructure. A feature of RabbitMQ is high reliability and the ability to process large volumes of messages with minimal resource consumption. This is possible because Rabbit does not process messages, but only stores and delivers them.

    We decided to refine the processing system and add RabbitMQ as a queue buffer.

    This solution allowed us to increase the system fault tolerance: if something goes critically wrong during the processing and storage of logs, they are returned to Rabbit AMQP. Another advantage of our solution is the additional flexibility in working with logs - for example, we can include additional analyzers and perform analogical actions to manage actions and process logs.

    * We use Nginx to proxy Kibana

    As a result, Logstash pulls information from output queues as it allows you to process messages. Rabbit plays the role of a cache here, it is able to accept many times more information than Logstash.

    In case of emergency, you can also view messages through Rabbit. Furthermore, Rabbit also monitors the accumulation of messages in queues, enabling the direct identification of significant (surge in the number of logs) or release (failure of the ELK system) problems in the infrastructure.

    Logstash client agent configuration:

    destination d_ampqp {
    amqp(
    vhost("/")
    host("HOST_IP")
    port(5672)
    exchange("****")
    exchange-type("direct")
    routing-key("****")
    body("$(format-json —-scope nv_pairs —-pair category=\"infra.ru\" —-pair
    source=$source —-pair customendpoint=\" \" —-pair tags=\"infra.ru\")")
    persistent(yes)
    username("")
    password("")
    }
    }

    Logstash indexer agent configuration:

    input {
    rabbitmq {
    host => "HOST_IP"
    queue => "syslog"
    heartbeat => 30
    durable => true
    password => "****"
    user => "****"
    }
    }
    
    
    filter {
    # your filters here
    }
    
    output {
    elasticsearch { hosts => ["https://localhost:9200"]
    hosts => "https://localhost:9200"
    manage_template => false
    user => "admin"
    password => "****"
    ssl => false
    ssl_certificate_verification => false
    index => "%{[@metadata][syslog]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
    }
    }

    System Clients

    Having gone over the general idea, let's now move on to its implementation. Firstly, we split the client log collection system into trusted and untrusted. Trusted applications / hosts have the ability to log in and write to the message queue (directly or through an intermediary, which we will talk about a little later). Untrusted applications are those that do not have direct access to RabbitMQ.

    Let's explain with an example:

    • A trusted client is a company's standard infrastructure host.
    • An untrusted client is a LiveCD that installs the OS on the client host (the client can potentially interfere with the process and gain access to the RabbitMQ infrastructure).

    Onwards, it was necessary to resolve the issue of a transparent (for applications) transition from the syslog protocol to AMQP. syslog-ng is great for converting syslog traffic to AMQP. Our main operating systems used in the infrastructure are CentOS7 and 8. For the seventh version of the OS, we need to build a package with an updated version of the server since the version from the regular repository does not support AMQP. There is no such problem with 8. Next, on infrastructure (trusted) installations, one must replace the standard rsyslog with syslog-ng and configure its interaction with the Rabbit cluster:

    destination d_ampqp {
     amqp(
     vhost("/")
     host("AMQP_HOST")
     port(5672)
     exchange("eventtask")
     exchange-type("direct")
     routing-key("eventtask")
     body("$(format-json —-scope core)")
     persistent(yes)
     username("RABBIT_USERNAME")
     password("RABBIT_PASSWORD")
     );
    };

    For untrusted hosts or hosts on which syslog-ng cannot be installed, an external syslog-ng receiver is required.

    We decided to place such a receiver directly in the RabbitMQ hosts to minimize unnecessary service traffic, but this is not a rule - a group of syslog-ng hosts may well work separate from the broker. We made the number of receivers equal to the number of hosts in the Rabbit cluster, and the choice of the host is carried out through DNS-RoundRobin. This scheme has the disadvantage of a possible loss of messages in case of failure of one of the hosts, but the problem is solved quite easily by moving the receiver hosts to a clustered IP via a pacemaker / corosync bundle (this is an easy-to-use mechanism that we can talk about in a separate article).

    Benefits of the final system:

    • Fault tolerance - the system is able to process large message flows without loss and withstands the failure of any individual host providing service.
    • Flexibility - system performance is easy to scale, and the system is able to receive messages from any syslog client: from advanced (syslog-ng) to the most conservative (Microsoft, etc).
    • Profitability - the system allows you to avoid dedicating excess resources to processing logs. A "flurry" of messages does not lead to their loss.
    Rent dedicated and virtual servers with instant deployment in reliable TIER III class data centers in the Netherlands and the USA. Free protection against DDoS attacks included, and your server will be ready for work in as little as 15 minutes. 24/7 Customer Support.

    Other articles

    28.11.2024

    OpenWebUI Just Got an Upgrade: What's New in Version 0.4.5?

    OpenWebUI has been updated to version 0.4.5! New features for RAG, user groups, authentication, improved performance, and more. Learn how to upgrade and maximize its potential.

    25.11.2024

    How We Replaced the IPMI Console with HTML5 for Managing Our Servers

    Tired of outdated server management tools? See how we replaced the IPMI console with an HTML5-based system, making remote server access seamless and efficient for all users.

    25.10.2024

    TS3 Manager: What Happens When You Fill in the Documentation Gaps

    Having trouble connecting to TS3 Manager after installing it on your VPS? Managing your TeamSpeak server through TS3 Manager isn't as straightforward as it might seem. Let's troubleshoot these issues together!

    16.09.2024

    10 Tips for Open WebUI to Enhance Your Work with AI

    Unleash the true power of Open WebUI and transform your AI workflow with these 10 indispensable tips.

    27.08.2024

    Comparison of SaaS solutions for online store on Wix and WordPress.com versus an on-premise solution on a VPS with WordPress and WooCommerce

    This article compares the simplicity and cost of SaaS platforms like Wix and WordPress.com versus the flexibility and control of a VPS with WordPress and WooCommerce for e-commerce businesses.

    HOSTKEY Dedicated servers and cloud solutions Pre-configured and custom dedicated servers. AMD, Intel, GPU cards, Free DDoS protection amd 1Gbps unmetered port 30
    4.3 67 67
    Upload