Monitored disk not sending alarms

Article ID: 253454

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

A monitored disk dropped to 4% without triggering an alarm, even though the critical threshold was set to 10%.

Environment

Release: 20.4

Cause

A network connectivity issue between the primary hub and the database server occurred at the same time the alarm policy was being processed.

Resolution

Check the policy_management_ws log file under Nimsoft\probes\service\wasp on the OC (Operator Console) robot. policy_management_ws is the package that handles the alarm policies.

Examples of failing and working log output are shown below:

1 - Issue processing alarm policies:

2022-10-21 10:56:40.330 [   Timer-0]  ERROR service.HeartBeatService - Registration of this policy node to master failed, java.net.SocketTimeoutException: connect timed out
2022-10-21 10:56:40.330 [   Timer-0]  DEBUG service.HeartBeatService - Setting policy processing mode to false
2022-10-21 11:01:29.718 [   Timer-0]  ERROR config.PolicyManagementConfig - Failed to get DataEnginer address or Hubip
com.nimsoft.nimbus.NimStatusCodeException: Received status (4) on response (for sendRcv) for cmd = 'nametoip' name = '/CAUIM_domain/CAUIM_hub/CAUIM/controller'

2 - Working alarm policies:

2022-10-23 05:29:52.334 [-utility-1]  INFO  config.PolicyManagementServletContextListener - Initialized  (./conf/policy_log4j2.xml).
2022-10-23 05:29:52.365 [-utility-1]  DEBUG config.PolicyManagementServletContextListener - Context Initialized

2022-10-23 05:30:19.414 [-utility-1]  INFO  service.HeartBeatService - This is the node running on the non-primary hub robot
2022-10-23 05:30:19.414 [-utility-1]  INFO  service.HeartBeatService - Primary wasp is at /CAUIM1v_domain/CAUIM1v_hub/cauim1v/wasp
2022-10-23 05:30:19.421 [-utility-1]  INFO  service.HeartBeatService - HA Master is running at http://10.100.56.39:8080/adminconsoleapp
2022-10-23 05:30:19.537 [-utility-1]  INFO  config.PolicyManagementConfig - Config file successfully loaded.
..
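
To check a large policy_management_ws log for the failure pattern shown in example 1, the search can be scripted. The following is a minimal Python sketch, not a vendor tool. It assumes that the three message fragments copied from the failing excerpt are reliable indicators of the problem. The names `find_policy_failures` and `scan_log` are hypothetical helpers introduced here for illustration.

```python
import re

# Substrings copied from the failing excerpt in example 1. Treating them
# as failure signatures is an assumption of this sketch.
FAILURE_MARKERS = (
    "Registration of this policy node to master failed",
    "Setting policy processing mode to false",
    "Failed to get DataEnginer address or Hubip",
)

# Matches the leading timestamp, e.g. "2022-10-21 10:56:40.330".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def find_policy_failures(lines):
    """Return (timestamp, marker) pairs for lines containing a failure marker."""
    hits = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip continuation lines such as exception details
        for marker in FAILURE_MARKERS:
            if marker in line:
                hits.append((match.group(1), marker))
                break
    return hits


def scan_log(path):
    """Read a log file (path supplied by the caller) and return its failure hits."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_policy_failures(handle)
```

Calling `scan_log` with the path to the log file returns the timestamps at which policy processing was disabled or the hub lookup failed. These can then be compared with the time the disk crossed its threshold.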