Alarms remain OPEN even after actual issues are resolved.

Article ID: 418327

Products

VMware NSX

Issue/Introduction

  • Deleting the alarm fails with a NullPointerException.
    var/log/messaging-manager/messaging-manager.log
    <TIMESTAMP>  INFO pool-<int>-thread-<int> NodeAphRealizer - SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="messaging"] handleDelete called for '<CLIENT_UUID>'.
    <TIMESTAMP>  INFO pool-<int>-thread-<int> AphClientManager - SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="messaging"] delete Client: client = <CLIENT_UUID>
    <TIMESTAMP>  INFO pool-<int>-thread-<int> ClientManager - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="messaging"] Clearing master cluster node id for client <CLIENT_UUID>.
    <TIMESTAMP>  INFO messaging-mgr-executor- AlarmDeleter - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="messaging"] AlarmDeleter invoked uuid <CLIENT_UUID>
    <TIMESTAMP>  WARN messaging-mgr-executor- AlarmManager - - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="messaging"] Exception while handling alarms based on client update.
    java.lang.NullPointerException: null
            at com.vmware.nsx.messaging.manager.service.alarm.AlarmManager.updateTask(AlarmManager.java:465) ~[libmessaging-manager.jar:?]
            at com.vmware.nsx.messaging.manager.service.alarm.AlarmManager.lambda$update$0(AlarmManager.java:459) ~[libmessaging-manager.jar:?]
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_382]
            at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_382]
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_382]
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_382]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
            at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
  • An alarm was triggered and remains OPEN. You may see alarms such as the following:
    • Event Type: Management Channel To Manager Node Down Long (management_channel_to_manager_node_down_long)
      Description: Management channel to Manager Node '<MP_UUID>' ('<MP_IP_ADDRESS>') is down for 15 minutes.
      var/log/phonehome-coordinator/phonehome-coordinator.log
      <TIMESTAMP> FATAL pool-<int>-thread-<int> MonitoringServiceImpl 87327 MONITORING [nsx@6876 alarmId="<UUID>" alarmState="OPEN" comp="nsx-manager" entId="<TN_UUID>" errorCode="MP701099" eventFeatureName="communication" eventSev="CRITICAL" eventState="On" eventType="management_channel_to_manager_node_down_long" level="FATAL" nodeId="<TN_UUID>" subcomp="monitoring"] Management channel to Manager Node <MP_UUID>(<MP_IP_ADDRESS>) is down for 15 minutes.
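To check whether a log excerpt contains this alarm in the OPEN state, the key="value" pairs in the [nsx@6876 ...] block can be parsed directly. The following is a minimal sketch that assumes the line layout shown above; the function names parse_structured_data and open_alarms are hypothetical helpers, not part of any NSX tooling.

```python
import re

# The [nsx@6876 ...] block carries the alarm metadata as key="value" pairs.
# Assumption: values never contain a closing square bracket.
_BLOCK = re.compile(r'\[nsx@6876 ([^\]]*)\]')
_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_structured_data(line):
    """Return the key/value pairs of the [nsx@6876 ...] block, or {} if absent."""
    match = _BLOCK.search(line)
    if match is None:
        return {}
    return dict(_PAIR.findall(match.group(1)))


def open_alarms(lines, event_type="management_channel_to_manager_node_down_long"):
    """Yield (alarmId, entId) for lines that report event_type with alarmState OPEN."""
    for line in lines:
        fields = parse_structured_data(line)
        if fields.get("eventType") == event_type and fields.get("alarmState") == "OPEN":
            yield fields.get("alarmId"), fields.get("entId")
```

Passing the lines of phonehome-coordinator.log to open_alarms gives the alarm IDs and transport node IDs to compare against the alarms that are still open after the underlying issue has been fixed.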

Environment

NSX 4.1.x

Cause

When a client is deleted, some messages may fail to be handled because a parameter is empty (null). As a result, the associated alarm is left in the OPEN state.

Resolution

Upgrade to NSX 4.2.

Resolve the affected alarms manually.
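Manual resolution can also be scripted against the NSX Manager REST API. The sketch below only builds the request. It assumes the alarm status endpoint POST /api/v1/alarms/<alarm-id>?action=set_status&new_status=RESOLVED; confirm this against the NSX API guide for your version before using it. The helper name build_resolve_request and the host name in the usage note are hypothetical, and authentication and certificate handling are intentionally left out.

```python
import urllib.parse
import urllib.request


def build_resolve_request(manager_host, alarm_id):
    """Build (but do not send) a POST request that marks one alarm as RESOLVED.

    Assumption: the endpoint and query parameters follow the NSX alarms API;
    verify them in the API guide for the installed release.
    """
    query = urllib.parse.urlencode({"action": "set_status", "new_status": "RESOLVED"})
    url = "https://{}/api/v1/alarms/{}?{}".format(
        manager_host, urllib.parse.quote(alarm_id, safe=""), query
    )
    return urllib.request.Request(url, method="POST")
```

For example, build_resolve_request("nsx-mgr.example.com", "<ALARM_UUID>") returns a request object that can be sent with urllib.request.urlopen once credentials and TLS settings suitable for the environment have been added.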

Additional Information

NSX Transport Node Network latency Alarm is not being cleared in NSX 4.1