NSX Transport Node Network latency Alarm is not being cleared in NSX 4.1

Products

VMware NSX

Issue/Introduction

The following alarm appears in the NSX UI. It resolves on its own but is then raised again, repeatedly.

The average network latency between manager nodes and host is more than 150ms for 5 minutes.
Recommended Action
1. Wait for 5 minutes to see if the alarm automatically gets resolved.
2. Ping the NSX Transport node from Manager node. The ping test should not see drops and have consistent latency values. VMware recommends latency values of 150ms or less.
3. Inspect for any other physical network layer issues. If the problem persists, contact VMware Support.

The alarm is repeatedly reported by one Manager node.

Rebooting all three NSX Managers may clear the alarm temporarily.

After the alarm is triggered, pings between the NSX Manager and Transport Node show latency below 150ms.

The NSX Managers show as healthy in NSX and vSphere.

The Transport Nodes show as healthy in vSphere.

The Transport Nodes are communicating with no issues.
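
The ping check described above can be scripted when it has to be repeated for many Transport Nodes. The following is a minimal sketch, not a VMware tool: it assumes the summary format printed by Linux iputils `ping`, and the function name `summarize_ping` and the constant `THRESHOLD_MS` are hypothetical names chosen for illustration. It compares the average round-trip time against the 150ms value quoted in the alarm text and reports any packet loss.

```python
import re

# Threshold quoted in the alarm text (assumed here to apply to the average RTT).
THRESHOLD_MS = 150.0

# Summary lines as printed by Linux iputils ping, for example:
#   5 packets transmitted, 5 received, 0% packet loss, time 4005ms
#   rtt min/avg/max/mdev = 0.412/0.530/0.701/0.102 ms
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")
LOSS_RE = re.compile(r"([\d.]+)% packet loss")


def summarize_ping(output):
    """Extract loss and latency from ping output and flag whether it is within limits."""
    rtt = RTT_RE.search(output)
    loss = LOSS_RE.search(output)
    if rtt is None or loss is None:
        raise ValueError("no ping summary found in output")
    avg_ms = float(rtt.group(2))
    max_ms = float(rtt.group(3))
    loss_pct = float(loss.group(1))
    return {
        "avg_ms": avg_ms,
        "max_ms": max_ms,
        "loss_pct": loss_pct,
        # Healthy means no dropped packets and average latency at or below the threshold.
        "ok": loss_pct == 0.0 and avg_ms <= THRESHOLD_MS,
    }
```

If a result reports `ok` as true while the alarm is still being raised, that matches the symptom described in this article rather than a real latency problem.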

The following messages appear in the NSX Manager messaging log, /var/log/messaging-manager/messaging-manager.log:

2024-11-16T15:05:05.186Z  WARN messaging-mgr-executor- EventSource - MONITORING [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="messaging"] Ignore the updateDeletedEntities call for cluster_event=true alarm: featureName: communication, eventTypeId: 5, since current node is not the leader.
2024-11-16T15:05:05.186Z  WARN messaging-mgr-executor- AlarmManager - - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="messaging"] Exception while handling alarms based on client update.
java.lang.NullPointerException: null
        at com.vmware.nsx.messaging.manager.service.alarm.AlarmManager.updateTask(AlarmManager.java:465) ~[libmessaging-manager.jar:?]
        at com.vmware.nsx.messaging.manager.service.alarm.AlarmManager.lambda$update$0(AlarmManager.java:459) ~[libmessaging-manager.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_382]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_382]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_382]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_382]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
2024-11-16T15:05:05.189Z  WARN pool-3-thread-1 ClientManager - - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="messaging"] No clientRecord found for clientId <UUID>
2024-11-16T15:05:05.189Z  INFO pool-3-thread-1 ClientManager - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="messaging"] Client <UUID>, is already deleted.
2024-11-16T15:05:05.189Z  INFO pool-3-thread-1 NodeAphRealizer - SYSTEM [nsx@6876 comp="nsx-manager" level="INFO" subcomp="messaging"] Deleted client '<UUID>' from APH.
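
To confirm that a log contains this signature, it can be searched for the AlarmManager warning followed directly by the NullPointerException. The following is a minimal sketch under stated assumptions: it is meant to run wherever a copy of the log is readable, the function name `find_alarm_update_failures` is hypothetical, and it only matches the two message strings shown in the excerpt above.

```python
from pathlib import Path

# Log path taken from this article.
LOG_PATH = Path("/var/log/messaging-manager/messaging-manager.log")

# Strings copied from the log excerpt in this article.
MARKER = "Exception while handling alarms based on client update"
NPE = "java.lang.NullPointerException"


def find_alarm_update_failures(lines):
    """Return the timestamps of AlarmManager warnings that are
    immediately followed by a NullPointerException line."""
    hits = []
    prev = ""
    for line in lines:
        if NPE in line and MARKER in prev:
            # The timestamp is the first whitespace-separated field of the warning.
            hits.append(prev.split()[0])
        prev = line
    return hits


if __name__ == "__main__":
    # Only scan when the log is present on this machine.
    if LOG_PATH.exists():
        with LOG_PATH.open(errors="replace") as fh:
            for timestamp in find_alarm_update_failures(fh):
                print(timestamp)
```

Each printed timestamp marks one occurrence of the failed alarm update, which can then be compared with the times at which the alarm reappears in the NSX UI.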

Environment

NSX 4.1.x

Cause

The NSX Manager does not correctly clear the alarm state. As a result, the alarm is raised again repeatedly even though the measured latency is within limits.

Resolution

This issue is fixed in NSX 4.2.x.

Upgrade to NSX 4.2 or later (recommended).