Alarms remain OPEN even after actual issues are resolved.

Article ID: 418327

Products

VMware NSX

Issue/Introduction

  • Automatic deletion of the alarm fails with a NullPointerException, which is logged in /var/log/messaging-manager/messaging-manager.log on the ESXi host:

    <TIMESTAMP>  INFO pool-<int>-thread-<int> NodeAphRealizer - SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="messaging"] handleDelete called for '<CLIENT_UUID>'.
    <TIMESTAMP>  INFO pool-<int>-thread-<int> AphClientManager - SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="messaging"] delete Client: client = <CLIENT_UUID>
    <TIMESTAMP>  INFO pool-<int>-thread-<int> ClientManager - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="messaging"] Clearing master cluster node id for client <CLIENT_UUID>.
    <TIMESTAMP>  INFO messaging-mgr-executor- AlarmDeleter - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="messaging"] AlarmDeleter invoked uuid <CLIENT_UUID>
    <TIMESTAMP>  WARN messaging-mgr-executor- AlarmManager - - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="messaging"] Exception while handling alarms based on client update.
    java.lang.NullPointerException: null
            at com.vmware.nsx.messaging.manager.service.alarm.AlarmManager.updateTask(AlarmManager.java:465) ~[libmessaging-manager.jar:?]
            at com.vmware.nsx.messaging.manager.service.alarm.AlarmManager.lambda$update$0(AlarmManager.java:459) ~[libmessaging-manager.jar:?]
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_382]
            at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_382]
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_382]
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_382]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
            at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]


  • An alarm was triggered. You may see the following alarm:
    • Event Type: Management Channel To Manager Node Down Long (management_channel_to_manager_node_down_long)
      Description: Management channel to Manager Node '<MP_UUID>' ('<MP_IP_ADDRESS>') is down for 15 minutes.

      The following message can be found in /var/log/phonehome-coordinator/phonehome-coordinator.log on the NSX Manager:
      <TIMESTAMP> FATAL pool-<int>-thread-<int> MonitoringServiceImpl 87327 MONITORING [nsx@6876 alarmId="<UUID>" alarmState="OPEN" comp="nsx-manager" entId="<TN_UUID>" errorCode="MP701099" eventFeatureName="communication" eventSev="CRITICAL" eventState="On" eventType="management_channel_to_manager_node_down_long" level="FATAL" nodeId="<TN_UUID>" subcomp="monitoring"] Management channel to Manager Node <MP_UUID>(<MP_IP_ADDRESS>) is down for 15 minutes.

       

  • Maintenance work was performed on the ESXi host and has since been completed.
  • The nsxproxy service has been verified to be running without issues.
    Log in to the ESXi host and run the following commands:
    • Confirm that the nsxproxy service is running:
      /etc/init.d/nsxproxy status
    • Confirm that a TCP connection is established with the NSX Manager over TCP port 1235:
      localcli network ip connection list | grep 1235
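
The connection check can also be scripted. The following is a minimal sketch, not an official tool. It assumes the output of `localcli network ip connection list` uses whitespace-separated columns in which the fifth field is the foreign address and the sixth is the connection state; verify this against the output on your host. The function name `has_established_connection` and the sample text are hypothetical.

```python
def has_established_connection(output: str, port: int = 1235) -> bool:
    """Return True if any row shows an ESTABLISHED TCP connection to `port`."""
    suffix = f":{port}"
    for line in output.splitlines():
        fields = line.split()
        # Assumed columns: Proto, Recv Q, Send Q, Local Address,
        # Foreign Address, State, ... (header and short rows are skipped).
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign_address, state = fields[4], fields[5]
        if foreign_address.endswith(suffix) and state == "ESTABLISHED":
            return True
    return False


# Illustrative sample only (documentation addresses, not real host output).
SAMPLE = """\
tcp   0   0  192.0.2.10:52344  192.0.2.20:1235   ESTABLISHED  2101234  newreno  nsx-proxy
tcp   0   0  192.0.2.10:443    192.0.2.30:50122  TIME_WAIT    0
"""

print(has_established_connection(SAMPLE))  # prints True
```

A result of True only confirms that the management channel is up at the TCP level, which matches the symptom in this article: the channel is healthy but the alarm remains OPEN.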

Environment

VMware NSX 4.1.x

Cause

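When a client is deleted, the messaging manager may fail to handle some messages because a parameter is empty (null). The resulting NullPointerException prevents the alarm from being cleared automatically.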
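
To check whether a saved copy of messaging-manager.log contains the signature shown in the Issue/Introduction section, a sketch along the following lines may help. It is a heuristic, not a supported diagnostic: it assumes the "AlarmDeleter invoked", warning, and NullPointerException lines appear in that order without interleaving from other threads. The function name `find_failed_alarm_deletions` and the sample UUID are hypothetical.

```python
import re
from typing import List

INVOKED = re.compile(r"AlarmDeleter invoked uuid (\S+)")
FAILED = "Exception while handling alarms based on client update"
NPE = "java.lang.NullPointerException"


def find_failed_alarm_deletions(log_text: str) -> List[str]:
    """Return client UUIDs whose AlarmDeleter run was followed by the NPE warning."""
    failed = []
    last_uuid = None
    saw_warning = False
    for line in log_text.splitlines():
        match = INVOKED.search(line)
        if match:
            # A new deletion attempt starts; remember which client it is for.
            last_uuid = match.group(1)
            saw_warning = False
        elif FAILED in line:
            saw_warning = True
        elif NPE in line and saw_warning and last_uuid is not None:
            # Warning plus NullPointerException after an invocation: record it.
            failed.append(last_uuid)
            last_uuid = None
            saw_warning = False
    return failed


# Sample built from the log excerpt in this article, with a made-up UUID.
SAMPLE_LOG = """\
<TIMESTAMP>  INFO messaging-mgr-executor- AlarmDeleter - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="messaging"] AlarmDeleter invoked uuid 11111111-2222-3333-4444-555555555555
<TIMESTAMP>  WARN messaging-mgr-executor- AlarmManager - - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="messaging"] Exception while handling alarms based on client update.
java.lang.NullPointerException: null
        at com.vmware.nsx.messaging.manager.service.alarm.AlarmManager.updateTask(AlarmManager.java:465) ~[libmessaging-manager.jar:?]
"""

print(find_failed_alarm_deletions(SAMPLE_LOG))
```

Any UUID returned identifies a client whose alarm cleanup likely failed and whose alarm may therefore still be OPEN.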

Resolution

This issue is resolved in VMware NSX 4.2, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

Workaround: Resolve the alarm manually.

  • Log in to the NSX Manager UI.
  • Go to Alarms and select the stale alarm.
  • Click Actions, then click Resolve.
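
If many stale alarms need to be resolved, the same action can in principle be scripted. This article documents only the GUI procedure; the sketch below assumes the NSX REST API call `POST /api/v1/alarms/<alarm-id>?action=set_status&new_status=RESOLVED`, which should be verified against the API guide for your NSX version before use. The alarm ID corresponds to the alarmId value in the phonehome-coordinator.log message. The function name, host name, and alarm ID are hypothetical, and the request is only built, not sent; authentication and certificate handling are left out.

```python
import urllib.request
from urllib.parse import quote, urlencode


def build_resolve_request(manager: str, alarm_id: str) -> urllib.request.Request:
    """Build (but do not send) a request that marks one alarm as RESOLVED."""
    # Assumed endpoint and query parameters; confirm in the NSX API guide.
    query = urlencode({"action": "set_status", "new_status": "RESOLVED"})
    url = f"https://{manager}/api/v1/alarms/{quote(alarm_id, safe='')}?{query}"
    return urllib.request.Request(url, method="POST")


# Hypothetical manager host name and alarm ID, for illustration only.
req = build_resolve_request(
    "nsx-manager.example.com", "11111111-2222-3333-4444-555555555555"
)
print(req.get_method(), req.full_url)
```

Sending the request would require adding credentials and passing it to `urllib.request.urlopen` with a suitable TLS context.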

Additional Information

NSX Transport Node Network latency Alarm is not being cleared in NSX 4.1