Health check of queues using the custom field-developed hubmon probe



Article ID: 258300


Updated On:

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

  • Some DX UIM customers encounter situations where hub queues (e.g., alarm_enrichment, nas, baseline_engine, data_engine) show as green even though the underlying processing has stopped.
  • Are there any recommendations, existing mechanisms, utilities, or approaches for checking the health of one or more queues to determine whether message processing has really stopped or is still healthy?
  • How can we monitor alarm activity and ensure that alarms are being processed (sent/received)?

Environment

  • Release: UIM 20.x or higher
  • hubmon 1.35 (Hub Queues and Statistics Monitor)

Cause

  • Guidance

Resolution

Important: The hubmon probe is a field-developed probe which is not officially supported. 

A single hubmon probe instance (e.g., deployed on the Primary hub) monitors all of your DX UIM hubs and their queues. It creates QoS metrics and queue-size alarms.

Each hub probe is monitored for the following:

  • QOS_PROBE_AVAILABILITY – hub responds to callback requests
  • QOS_PROBE_BUILD – hub probe build
  • QOS_PROBE_LOGLEVEL – hub probe loglevel
  • QOS_PROBE_UPTIME – hub probe uptime in seconds
  • QOS_PROBE_VERSION – hub probe version


Total messages are monitored to determine which hubs have the most throughput:

  • QOS_HUB_MESSAGES_RECEIVED – total messages received by each hub
  • QOS_HUB_MESSAGES_SENT – total messages sent by each hub


Each queue on every hub is monitored for:

  • QOS_HUB_QUEUE_SIZE – the number of messages queued
  • QOS_HUB_QUEUE_CONNECTED – does the queue have a connection and is it being read (Connected, Not Connected)
  • QOS_HUB_QUEUE_BULKSIZE – the number of messages read at a time
  • QOS_HUB_QUEUE_COUNT – number of messages processed while connected
  • QOS_HUB_QUEUE_RATE – how many messages are being processed per second
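
The queue metrics above can be combined to spot the "green but stalled" case described in the introduction: a queue that reports as connected, holds a backlog, and shows no throughput. The following is a minimal sketch of that idea, not part of hubmon. The `QueueSample` structure, the `looks_stalled` function, and the three-sample window are illustrative assumptions, and the sketch assumes you have already retrieved the QoS values by your own means.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class QueueSample:
    """One polling interval's readings for a single hub queue (hypothetical structure)."""
    size: int        # value of QOS_HUB_QUEUE_SIZE
    connected: bool  # value of QOS_HUB_QUEUE_CONNECTED
    rate: float      # value of QOS_HUB_QUEUE_RATE (messages per second)


def looks_stalled(samples: Sequence[QueueSample], min_samples: int = 3) -> bool:
    """Return True when the latest samples show a connected queue with a backlog and no throughput."""
    if len(samples) < min_samples:
        return False  # not enough history to judge
    recent = samples[-min_samples:]
    all_connected = all(s.connected for s in recent)
    no_throughput = all(s.rate == 0 for s in recent)
    backlog_present = all(s.size > 0 for s in recent)
    # The backlog must not shrink between consecutive samples.
    not_draining = all(b.size >= a.size for a, b in zip(recent, recent[1:]))
    return all_connected and no_throughput and backlog_present and not_draining
```

A queue that is simply empty and idle is not flagged, because the backlog check requires a non-zero size in every recent sample.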


Hub queue size alarming is built in, with minor, major, and critical thresholds.

If a queue size exceeds one of these thresholds, hubmon creates and sends an alarm. You can then use the nas Auto Operator to send an email when a given alarm occurs.
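
To illustrate how the three thresholds relate, here is a minimal sketch that maps a queue size to a severity. It is not hubmon's actual implementation: the `queue_size_severity` function is hypothetical, the threshold values are taken from the sample configuration at the end of this article, and the strict "greater than" comparison is an assumption based on the word "exceeds".

```python
from typing import Mapping, Optional

# Threshold values copied from the sample configuration in this article.
THRESHOLDS = {"minor": 10000, "major": 50000, "critical": 1000000}


def queue_size_severity(
    queue_size: int, thresholds: Mapping[str, int] = THRESHOLDS
) -> Optional[str]:
    """Return the highest severity whose threshold the queue size exceeds, or None."""
    # Check from most to least severe so the highest matching level wins.
    for severity in ("critical", "major", "minor"):
        if queue_size > thresholds[severity]:  # assumption: strictly greater than
            return severity
    return None
```

With the sample values, a queue holding 60000 messages would map to "major", and a queue holding 5000 messages would raise no alarm.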

Additional Information

In Raw Configure mode, hubmon lets you set the queue-size threshold for each of the three alarm severities (minor, major, and critical), as in the sample configuration below.

The hubmon probe is attached to this KB Article.

A screenshot of a hubmon report/dashboard in UMP (prior to the new Operator Console) is shown below.

A similar view can be recreated in an Operator Console dashboard, or with the List Designer/List Viewer as of CU6 or higher (CU7 recommended), by using the QoS data described above.


hubmon.cfg sample:

<setup>
   threads = 5
   loglevel = 0
   logfile = hubmon.log
   logsize = 10000
   threshold_minor = 10000
   threshold_major = 50000
   threshold_critical = 1000000
   frequency = 60
   hubscan_frequency = 3600
   send_alarms = true
</setup>
<startup>
   <opt>
      java_mem_init = -Xms256m
      java_mem_max = -Xmx1024m
   </opt>
</startup>
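
If you want to inspect or audit these settings outside the probe, the sample can be read with a few lines of Python. This is a minimal sketch that handles only the structure visible in the sample above (nested `<section>` tags and `key = value` lines); the `parse_cfg` function is hypothetical and is not a complete parser for UIM configuration files.

```python
def parse_cfg(text: str) -> dict:
    """Parse nested <section> ... </section> blocks with 'key = value' lines into dicts."""
    root: dict = {}
    stack = [root]  # stack[-1] is the section currently being filled
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("</") and line.endswith(">"):
            if len(stack) > 1:
                stack.pop()  # close the current section
        elif line.startswith("<") and line.endswith(">"):
            section: dict = {}
            stack[-1][line[1:-1]] = section  # open a nested section
            stack.append(section)
        elif "=" in line:
            key, _, value = line.partition("=")  # split on the first '=' only
            stack[-1][key.strip()] = value.strip()
    return root


SAMPLE = """
<setup>
   threshold_minor = 10000
   frequency = 60
</setup>
<startup>
   <opt>
      java_mem_max = -Xmx1024m
   </opt>
</startup>
"""

config = parse_cfg(SAMPLE)
print(config["setup"]["threshold_minor"])
print(config["startup"]["opt"]["java_mem_max"])
```

All values are kept as strings; convert them (for example with `int()`) where a numeric comparison is needed.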

Attachments

hubmon_1674485227834.zip