Author: Nikita Zubarev, Senior DevOps, Lead Infrastructure Specialist at Hostkey
In previous articles, we talked about the ELK-RabbitMQ architecture and the Invapi service, which also uses a message broker to communicate with the backend. In any fault-tolerant architecture, proper monitoring with the right notifications is essential. Besides checking the health of the RabbitMQ cluster itself, you also need to collect metrics and track the number of unread messages. This data lets you promptly detect a failure of a consumer (the application that receives the messages) and send an alert. Starting from version 3.8.0, RabbitMQ comes with built-in support for Prometheus and Grafana.
Prometheus support is provided by the rabbitmq_prometheus plugin, which exposes all RabbitMQ metrics in the Prometheus text format on a dedicated TCP port. To enable it, run the following on each cluster node:
rabbitmq-plugins enable rabbitmq_prometheus
A new listener will appear on port 15692:
http/prometheus: 15692
Check that the metrics are being exported:
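To check the endpoint from a script rather than a browser, the minimal Python sketch below downloads the page from port 15692 (assuming the default /metrics path) and parses the simple `name{labels} value` lines of the Prometheus text format. The helper names `fetch_metrics` and `parse_metrics` are hypothetical, and the parser is deliberately simplified: it ignores timestamps and does not handle label values that contain `}`.

```python
"""Fetch and parse the rabbitmq_prometheus metrics page (minimal sketch)."""
import re
import urllib.request

# Matches simple exposition lines: name{labels} value [timestamp ignored]
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{([^}]*)\})?\s+(\S+)')
# Matches one label="value" pair inside the braces
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if not match:
            continue
        name, _, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(host, port=15692, timeout=5):
    """Download the plain-text metrics page exposed by the plugin."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, filtering `parse_metrics(fetch_metrics("rabbitnl-app01a.infra.hostkey.ru"))` for the name `rabbitmq_queue_messages` shows the values Prometheus will later scrape.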
Next, configure Prometheus and the alerting rules. First, add a scrape job for the cluster nodes to the Prometheus configuration:
scrape_configs:
  - job_name: 'RABBIT MQ Prod NL'
    static_configs:
      - targets: ['rabbitnl-app01a.infra.hostkey.ru:15692', 'rabbitnl-app01b.infra.hostkey.ru:15692', 'rabbitnl-app01c.infra.hostkey.ru:15692']
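With this job, Prometheus requests `http://<target>/metrics` from each node, since `http` and `/metrics` are its defaults for `scheme` and `metrics_path`, and records whether each scrape succeeded in the synthetic `up` metric. The hypothetical helpers below reproduce that URL list and probe each node directly, which can help rule out firewall or DNS problems before looking at Prometheus itself. This is a sketch, not part of the original setup.

```python
import urllib.request

# Targets copied from the static_configs above.
TARGETS = [
    "rabbitnl-app01a.infra.hostkey.ru:15692",
    "rabbitnl-app01b.infra.hostkey.ru:15692",
    "rabbitnl-app01c.infra.hostkey.ru:15692",
]


def scrape_urls(targets, scheme="http", metrics_path="/metrics"):
    """Build the URL Prometheus requests for each static target (default settings)."""
    return [f"{scheme}://{target}{metrics_path}" for target in targets]


def probe_targets(targets, timeout=5):
    """Map each scrape URL to True if it answers with HTTP 200, roughly like up == 1."""
    status = {}
    for url in scrape_urls(targets):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                status[url] = resp.status == 200
        except OSError:  # covers URLError, HTTPError, refused connections, timeouts
            status[url] = False
    return status
```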
The most important things to watch are cluster integrity and the number of unread messages in the queues.
If a queue holds more than one message for at least a minute, we send an alert; a second rule does the same for unacknowledged messages:
groups:
  - name: rabbitmq   # hypothetical group name; any name works
    rules:
      - alert: rabbitmq_queue_messages
        expr: rabbitmq_queue_messages{job="RABBIT MQ Prod NL"} > 1
        for: 1m
        labels:
          severity: page
        annotations:
          summary: Critical rabbitmq_queue_messages
      - alert: unacknowledged messages
        expr: rabbitmq_queue_messages_unacked{job="RABBIT MQ Prod NL"} > 1
        for: 1m
        labels:
          severity: page
        annotations:
          summary: Critical rabbitmq_queue_messages_unacked
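The `for: 1m` clause means Prometheus does not fire immediately: the alert first becomes pending when the expression returns a result, and fires only once it has stayed true for at least one minute; any evaluation where the condition is false resets it. The simplified single-series model below illustrates that state machine. The class name `AlertState` is hypothetical, and this is an illustration of the semantics, not Prometheus's own code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Simplified model of one alerting rule evaluated on a single series."""
    threshold: float = 1.0               # mirrors "> 1" in expr
    hold_for: float = 60.0               # mirrors "for: 1m", in seconds
    active_since: Optional[float] = None

    def evaluate(self, ts: float, value: float) -> str:
        """Return 'inactive', 'pending' or 'firing' for one evaluation at time ts."""
        if value <= self.threshold:
            # Condition false: the alert resets (a firing alert is resolved).
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = ts       # first evaluation where expr matched
        if ts - self.active_since >= self.hold_for:
            return "firing"
        return "pending"
```

Evaluating `AlertState()` at 0, 30 and 60 seconds with a queue depth of 5 yields pending, pending and then firing.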
We set up alerts for cluster integrity in the same way.
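The cluster-integrity rules themselves are not shown here. One possible signal, offered only as an assumption, is the `up` metric of the scrape job, for example a rule on `sum(up{job="RABBIT MQ Prod NL"}) < 3`. The sketch below runs the equivalent check against the Prometheus HTTP API (`/api/v1/query`); the Prometheus base URL, the expected node count of three, and the helper names are assumptions. Note that `up` only shows whether each node's exporter answers, not whether the Erlang cluster itself is partitioned.

```python
import json
import urllib.parse
import urllib.request

EXPECTED_NODES = 3  # assumption: the three targets from the scrape job
QUERY = 'up{job="RABBIT MQ Prod NL"}'


def query_prometheus(base_url, query=QUERY, timeout=5):
    """Run an instant query; base_url (e.g. your Prometheus address) is an assumption."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def nodes_down(api_response):
    """Return instances whose `up` sample is 0 in an instant-vector response."""
    if api_response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    return sorted(
        sample["metric"].get("instance", "?")
        for sample in api_response["data"]["result"]
        if float(sample["value"][1]) == 0  # value is [timestamp, "string value"]
    )


def cluster_intact(api_response, expected=EXPECTED_NODES):
    """True if every expected node was scraped and reported up == 1."""
    result = api_response["data"]["result"]
    return len(result) == expected and not nodes_down(api_response)
```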
As mentioned in our first article on monitoring, Grafana can import ready-made dashboards: simply import the dashboard with ID 10991 (RabbitMQ-Overview).
The dashboard displays:
- Node identification, including RabbitMQ and Erlang/OTP versions.
- Host memory and disk space available before publishers are blocked (alarm thresholds).
- Available file descriptors and TCP sockets on the host.
- Messages ready for delivery and messages pending consumer acknowledgement.
- Incoming message rates: published / routed to queues / confirmed to publishers / unconfirmed / returned / dropped.
- Outgoing message rates: delivered with automatic or manual acknowledgement / acknowledged / redelivered.
- Polling operations with automatic or manual acknowledgement, including polls that returned no message.
- Queues, including creation and deletion rates.
- Channels, including opening and closing rates.
- Connections, including opening and closing rates.
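The rate panels above are derived from ever-increasing counters, typically with the PromQL `rate()` function. The simplified sketch below shows the idea: sum the increases between consecutive samples, treating a drop in value as a counter reset (for example after a node restart), and divide by the elapsed time. The function name `simple_rate` is hypothetical, and unlike Prometheus's `rate()` it does not extrapolate to the edges of the range window, so results can differ slightly.

```python
def simple_rate(samples):
    """Per-second rate over a list of (timestamp_seconds, counter_value) samples.

    Handles counter resets the way Prometheus does, but skips rate()'s
    extrapolation to the range boundaries. Returns None if undefined.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count cur in full.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```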
If needed, further metrics can be added to the dashboard (we will discuss how to create templates in a future article).
Thus, RabbitMQ's monitoring tools let you check the overall health of each node as well as the number of ready and unacknowledged messages. A key advantage of this approach is comprehensive, timely monitoring of the state of your infrastructure.