Health check of queues using the custom field-developed hubmon probe

Products

DX Unified Infrastructure Management (Nimsoft / UIM)
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

Some DX UIM customers encounter situations where hub queues (for example alarm_enrichment, nas, baseline_engine, or data_engine) show as green even though the underlying processing has stopped.
Are there any recommended mechanisms, utilities, or approaches for checking the health of one or more queues to determine whether messages are still being processed or processing has stopped?
How can we monitor alarm activity and confirm that alarms are being processed (sent and received)?

Environment

Release: UIM 20.x or higher
hubmon 1.35 (Hub Queues and Statistics Monitor)

Cause

Guidance

Resolution

Important: The hubmon probe is a field-developed probe which is not officially supported.

The hubmon probe monitors all of your DX UIM hubs and their queues from a single hubmon probe (e.g., on the Primary hub). It creates QoS metrics and queue-size alarms.

The following metrics are collected for each hub probe:

QOS_PROBE_AVAILABILITY – hub responds to callback requests
QOS_PROBE_BUILD – hub probe build
QOS_PROBE_LOGLEVEL – hub probe loglevel
QOS_PROBE_UPTIME – hub probe uptime in seconds
QOS_PROBE_VERSION – hub probe version

Total messages are monitored to determine which hubs have the most throughput:

QOS_HUB_MESSAGES_RECEIVED – total messages received by each hub
QOS_HUB_MESSAGES_SENT – total messages sent by each hub

Each queue on every hub is monitored for:

QOS_HUB_QUEUE_SIZE – the number of messages queued
QOS_HUB_QUEUE_CONNECTED – whether the queue has a connection and is being read (Connected, Not Connected)
QOS_HUB_QUEUE_BULKSIZE – the number of messages read at a time
QOS_HUB_QUEUE_COUNT – number of messages processed while connected
QOS_HUB_QUEUE_RATE – how many messages are being processed per second
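
The queue metrics above can be combined to catch the case described in the introduction, where a queue appears healthy but has stopped draining. The following is a minimal Python sketch, assuming the QoS samples have already been retrieved by some other means. The QueueSample type, its field names, and the stall rule (connected, messages queued, zero rate over several consecutive samples) are illustrative assumptions and are not part of hubmon.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueSample:
    """One polling interval of hubmon QoS values for a single queue.

    Hypothetical container: hubmon publishes these as separate QoS series;
    how they are fetched and joined is outside the scope of this sketch.
    """
    size: int        # QOS_HUB_QUEUE_SIZE
    connected: bool  # QOS_HUB_QUEUE_CONNECTED
    rate: float      # QOS_HUB_QUEUE_RATE (messages per second)


def is_stalled(samples: List[QueueSample], min_samples: int = 3) -> bool:
    """Return True if the last `min_samples` samples all show a connected
    queue that holds messages but processes none of them."""
    if len(samples) < min_samples:
        return False  # not enough history to judge
    recent = samples[-min_samples:]
    return all(s.connected and s.size > 0 and s.rate == 0 for s in recent)


# Made-up sample data for illustration only.
healthy = [QueueSample(120, True, 35.0), QueueSample(80, True, 40.0), QueueSample(95, True, 38.5)]
stuck = [QueueSample(500, True, 0.0), QueueSample(900, True, 0.0), QueueSample(1400, True, 0.0)]

print(is_stalled(healthy))  # False
print(is_stalled(stuck))    # True
```

Requiring several consecutive zero-rate samples avoids flagging a queue that is only momentarily idle; the right window length depends on the polling frequency and is left as a tunable parameter here.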

Hub queue-size alarming is built in, with minor, major, and critical thresholds.

If a queue size exceeds one of these thresholds, hubmon generates an alarm, so you can use the nas Auto Operator to send an email when a given alarm occurs.
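
To illustrate how three thresholds map a queue size to a severity, here is a small Python sketch. The threshold values are taken from the sample configuration later in this article. The function name and the strict greater-than comparison are assumptions made for illustration; the probe's exact comparison logic is not documented here.

```python
from typing import Optional

# Threshold values copied from the sample configuration in this article,
# ordered from most to least severe so the first match wins.
THRESHOLDS = [
    ("critical", 1_000_000),
    ("major", 50_000),
    ("minor", 10_000),
]


def severity_for(queue_size: int) -> Optional[str]:
    """Return the highest severity whose threshold is exceeded, or None.

    Assumption: "exceeds" is treated as strictly greater than.
    """
    for name, limit in THRESHOLDS:
        if queue_size > limit:
            return name
    return None


print(severity_for(9_500))      # None
print(severity_for(25_000))     # minor
print(severity_for(75_000))     # major
print(severity_for(2_000_000))  # critical
```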

Additional Information

Via Raw Configure mode, hubmon lets you set the threshold values that generate alarms for three severities (minor, major, and critical) using the threshold_minor, threshold_major, and threshold_critical keys.

The hubmon probe is attached to this KB Article.

A screenshot of a hubmon report/dashboard in UMP (prior to the new Operator Console) is shown below.

This type of view can be recreated in an Operator Console dashboard, or with the List Designer/List Viewer as of CU6 or higher (CU7 recommended), by leveraging the QoS data described above.

hubmon.cfg sample:

<setup>
   threads = 5
   loglevel = 0
   logfile = hubmon.log
   logsize = 10000
   threshold_minor = 10000
   threshold_major = 50000
   threshold_critical = 1000000
   frequency = 60
   hubscan_frequency = 3600
   send_alarms = true
</setup>
<startup>
   <opt>
      java_mem_init = -Xms256m
      java_mem_max = -Xmx1024m
   </opt>
</startup>
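
Before deploying an edited configuration, it can help to confirm that the thresholds are ordered sensibly. The following Python sketch reads a shortened copy of the sample above into nested dictionaries and checks that minor < major < critical. The parse_cfg function is a hypothetical helper written for this example, not a UIM or hubmon API, and it handles only the simple section and key = value layout shown here.

```python
from typing import Any, Dict

# Shortened copy of the sample configuration from this article.
SAMPLE = """\
<setup>
threshold_minor = 10000
threshold_major = 50000
threshold_critical = 1000000
frequency = 60
</setup>
<startup>
<opt>
java_mem_max = -Xmx1024m
</opt>
</startup>
"""


def parse_cfg(text: str) -> Dict[str, Any]:
    """Parse <section> ... </section> blocks and key = value lines into dicts.

    Simplified: closing tags are not matched against the opening name.
    """
    root: Dict[str, Any] = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("</") and line.endswith(">"):
            if len(stack) == 1:
                raise ValueError(f"unexpected closing tag: {line}")
            stack.pop()
        elif line.startswith("<") and line.endswith(">"):
            section: Dict[str, Any] = {}
            stack[-1][line[1:-1]] = section
            stack.append(section)
        elif "=" in line:
            # Split on the first '=' only, so the value may contain '='.
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
    if len(stack) != 1:
        raise ValueError("unclosed section")
    return root


cfg = parse_cfg(SAMPLE)
setup = cfg["setup"]
minor = int(setup["threshold_minor"])
major = int(setup["threshold_major"])
critical = int(setup["threshold_critical"])

print(minor < major < critical)               # True
print(cfg["startup"]["opt"]["java_mem_max"])  # -Xmx1024m
```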

Attachments

hubmon_1674485227834.zip