Unable to resolve EAM Status Down alarm in NSX-T


Article ID: 312616

Products

VMware NSX

Issue/Introduction

  • ESX Agent Manager (EAM) Status Down alarms are triggered with no apparent EAM issues.
  • After the alarm is resolved, it returns shortly afterwards.
  • An alarm is raised and then resolved by the user, with entries similar to the following in /var/log/phonehome-coordinator/phonehome-coordinator.log on an NSX Manager node:
2024-03-05T02:14:26.712Z FATAL http-nio-127.0.0.1-7449-exec-3 MonitoringServiceImpl 18560 MONITORING [nsx@6876 alarmId="c#######-####-####-####-#########6c6" alarmState="RESOLVED" comp="nsx-manager" entId="9#######-####-####-####-#########768" errorCode="MP701099" eventFeatureName="endpoint_protection" eventSev="CRITICAL" eventState="Off" eventType="eam_status_down" level="FATAL" nodeId="9#######-####-####-####-#########6c6" subcomp="monitoring"] User resolved.
  • On the same reporting NSX Manager node, a sync is requested so that the feature side returns the latest status, with the following observed in /var/log/phonehome-coordinator/phonehome-coordinator.log:
2024-03-05T02:14:26.695Z INFO http-nio-127.0.0.1-7449-exec-3 MonitoringFacadeImpl 18560 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="monitoring"] bulkSetAndVerifyAlarmsStatus: setting requires sync for user resolved alarm c#######-####-####-####-#########1e7
  • On a different NSX Manager node, a sync is triggered but fails, with the following observed in /var/log/phonehome-coordinator/phonehome-coordinator.log:
2024-03-05T02:22:09.448Z INFO pool-45-thread-1 MonitoringSyncService 4471 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="monitoring"] Built Sync Request entityId: 9#######-####-####-####-#########768, eventTypeId: 1, featureId: 13, sourceId: proton_eam_service
.
.
.
2024-03-05T02:22:19.448Z WARN pool-118-thread-1 MonitoringSyncProcessor 4471 MONITORING [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="monitoring"] initiateSyncRequest: unexpected error invoking sync on feature 13 eventType 1 node 9#######-####-####-####-#########6c6 endpoint 9#######-####-####-####-#########6c6 source proton_eam_service entity 9#######-####-####-####-#########768: java.util.concurrent.TimeoutException
  • In subsequent full syncs the alarm is still present, because the original reporting NSX Manager node still has no record of the alarm being resolved. The following is observed in /var/log/phonehome-coordinator/phonehome-coordinator.log:

2024-03-05T03:53:01.274Z INFO pool-46-thread-13380 FullSyncRequester 4471 MONITORING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="monitoring"] FullSyncRequester: node 9#######-####-####-####-#########6c6, endpoint 9#######-####-####-####-#########768, result true
.
.
.
2024-03-05T03:53:09.211Z FATAL pool-118-thread-1 MonitoringServiceImpl 4471 MONITORING [nsx@6876 alarmId="c#######-####-####-####-#########6c6" alarmState="OPEN" comp="nsx-manager" entId="9#######-####-####-####-#########6c6" errorCode="MP701099" eventFeatureName="endpoint_protection" eventSev="CRITICAL" eventState="On" eventType="eam_status_down" level="FATAL" nodeId="9#######-####-####-####-#########768" subcomp="monitoring"] ESX Agent Manager (EAM) service on compute manager 9#######-####-####-####-#########6c6 is down.
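
The log signatures above can be checked mechanically. The following is a minimal Python sketch, not a VMware-provided tool: it assumes the log lines follow the key="value" layout shown in the excerpts, and that lines from all NSX Manager nodes have been collected and sorted by timestamp before being passed in. The names `AlarmEvent`, `parse_eam_alarm`, `is_sync_timeout`, and `reopened_alarms` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches key="value" pairs inside the structured-data part of a log line.
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


class AlarmEvent(NamedTuple):
    timestamp: str
    alarm_id: str
    state: str  # "OPEN" or "RESOLVED" in the excerpts above


def parse_eam_alarm(line: str) -> Optional[AlarmEvent]:
    """Return an AlarmEvent if the line records an eam_status_down alarm."""
    fields = dict(FIELD_RE.findall(line))
    if fields.get("eventType") != "eam_status_down":
        return None
    # The timestamp is the first space-separated token of the line.
    timestamp = line.split(" ", 1)[0]
    return AlarmEvent(timestamp, fields.get("alarmId", ""), fields.get("alarmState", ""))


def is_sync_timeout(line: str) -> bool:
    """True for the failed-sync signature (initiateSyncRequest + TimeoutException)."""
    return ("initiateSyncRequest" in line
            and "java.util.concurrent.TimeoutException" in line)


def reopened_alarms(lines: Iterable[str]) -> List[str]:
    """Return IDs of alarms logged as RESOLVED and later logged as OPEN again."""
    resolved = set()
    reopened = []
    for line in lines:
        event = parse_eam_alarm(line)
        if event is None:
            continue
        if event.state == "RESOLVED":
            resolved.add(event.alarm_id)
        elif event.state == "OPEN" and event.alarm_id in resolved:
            if event.alarm_id not in reopened:
                reopened.append(event.alarm_id)
    return reopened
```

An alarm ID returned by `reopened_alarms`, together with at least one line for which `is_sync_timeout` is true in the same time window, is consistent with the pattern described in this article. It is only an indicator; a genuine EAM outage would also reopen the alarm.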

 

Environment

VMware NSX 4.x
VMware NSX-T Data Center 3.x

Cause

An EAM issue occurs and an alarm is raised while one of the NSX Manager nodes is the clusterEventLeader. The clusterEventLeader then changes to a different node. After the change, the EAM issue is resolved, but the alarm does not clear because of a bug in the alarm framework that is triggered when the leader changes.

Resolution

This issue is resolved in VMware NSX 4.2.0, available at Broadcom downloads.

If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.
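
To decide whether a given deployment still needs the upgrade, the installed version can be compared against the fixed release. The sketch below is illustrative only: it assumes the version is available as a dotted numeric string (for example "4.1.2.3"), and it treats every release below 4.2.0 as lacking the fix, which follows from the Environment and Resolution sections but has not been verified per release. `parse_version` and `needs_upgrade` are hypothetical helper names.

```python
from typing import Tuple

# First release that contains the fix, per the Resolution section.
FIXED_IN = (4, 2, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '4.1.2.3' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_upgrade(version: str) -> bool:
    """True if the version is older than the release that contains the fix."""
    parsed = parse_version(version)
    # Pad short versions ('4.2' -> (4, 2, 0)) so tuple comparison is fair.
    padded = parsed + (0,) * max(0, len(FIXED_IN) - len(parsed))
    return padded < FIXED_IN
```

For example, `needs_upgrade("3.2.3.1")` and `needs_upgrade("4.1.2")` return `True`, while `needs_upgrade("4.2.0")` returns `False`.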



Workaround:

Restart the proton service on the NSX Manager node reporting the alarm by running the following command as root:
# service proton restart
