Following the vMotion of an NSX Edge node, a network flap may occur, causing the active Tier-0 and Tier-1 gateways to fail over.
While reviewing the system state, you observe that no alarms were raised by the NSX Manager for this failover event.
2025-12-16T11:26:22.276Z edge_hostname NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="svcrt-fsm" level="INFO" org="default" proj="####"] ########-####-####-####-########843f event NodeDown [Active,Unreachable] reason 'Tunnels Down'
2025-12-16T11:26:22.276Z edge_hostname NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="svcrt-fsm" level="INFO" org="default" proj="####"] ########-####-####-####-########004d event NodeDown [Active,Unreachable] reason 'Tunnels Down'
2025-12-16T11:26:22.276Z edge_hostname NSX 1 - [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-agent" s2comp="nsx-monitoring" entId="########-####-####-####-########aedc" tid="1" level="ERROR" eventState="On" eventFeatureName="high_availability" eventSev="error" eventType="tier1_gateway_failover"] Context report: {"previous_gateway_state":"Active","current_gateway_state":"Down","entity_id":"########-####-####-####-########aedc","service_router_id":"########-####-####-####-########843f","failover_reason":"Tunnels Down"}
2025-12-16T11:26:22.277Z edge_hostname NSX 1 - [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-agent" s2comp="nsx-monitoring" entId="########-####-####-####-########63ac" tid="1" level="ERROR" eventState="On" eventFeatureName="high_availability" eventSev="error" eventType="tier1_gateway_failover"] Context report: {"previous_gateway_state":"Active","current_gateway_state":"Down","entity_id":"########-####-####-####-########63ac","service_router_id":"########-####-####-####-########004d","failover_reason":"Tunnels Down"}
2025-12-16T11:26:22.734Z edge_hostname NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="svcrt-fsm" level="INFO" org="default" proj="####"] ########-####-####-####-########843f event NodeUp [Down,Unknown] reason 'Tunnels Up'
2025-12-16T11:26:22.734Z edge_hostname NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="svcrt-fsm" level="INFO" org="default" proj="####"] ########-####-####-####-########004d event NodeUp [Sync,Unknown] reason 'Tunnels Up'
VMware NSX
The root cause is the timing discrepancy between the duration of the transient failover event and the alarm collection sampling interval.
The NSX alarm framework uses a polling mechanism to check the state of specific events. For gateway failover events (tier0_gateway_failover and tier1_gateway_failover), the standard sampling_interval is 60 seconds.
In this scenario:
The network flap caused by vMotion is transient. The gateway enters a failed state (eventState="On") but recovers almost immediately (eventState="Off"):
2025-12-16T11:26:22.276Z edge_hostname NSX 1 - [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-agent" s2comp="nsx-monitoring" entId="########-####-####-####-########aedc" tid="1" level="ERROR" eventState="On" eventFeatureName="high_availability" eventSev="error" eventType="tier1_gateway_failover"] Context report: {"previous_gateway_state":"Active","current_gateway_state":"Down","entity_id":"########-####-####-####-########aedc","service_router_id":"########-####-####-####-########843f","failover_reason":"Tunnels Down"}
2025-12-16T11:26:22.277Z edge_hostname NSX 1 - [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-agent" s2comp="nsx-monitoring" entId="########-####-####-####-########63ac" tid="1" level="ERROR" eventState="On" eventFeatureName="high_availability" eventSev="error" eventType="tier1_gateway_failover"] Context report: {"previous_gateway_state":"Active","current_gateway_state":"Down","entity_id":"########-####-####-####-########63ac","service_router_id":"########-####-####-####-########004d","failover_reason":"Tunnels Down"}
[....]
[....]
2025-12-16T11:26:23.472Z edge_hostname NSX 1 - [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-agent" s2comp="nsx-monitoring" entId="########-####-####-####-########aedc" tid="1" level="ERROR" eventState="Off" eventFeatureName="high_availability" eventSev="error" eventType="tier1_gateway_failover"] Context report: {"previous_gateway_state":"Down","current_gateway_state":"Active","entity_id":"########-####-####-####-########aedc","service_router_id":"########-####-####-####-########843f","failover_reason":"Remote state changed to Active"}
2025-12-16T11:26:26.837Z edge_hostname NSX 1 - [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-agent" s2comp="nsx-monitoring" entId="########-####-####-####-########63ac" tid="1" level="ERROR" eventState="Off" eventFeatureName="high_availability" eventSev="error" eventType="tier1_gateway_failover"] Context report: {"previous_gateway_state":"Down","current_gateway_state":"Active","entity_id":"########-####-####-####-########63ac","service_router_id":"########-####-####-####-########004d","failover_reason":"Remote state changed to Active"}
In the example above, the event turned "On" at 11:26:22 and had reverted to "Off" by 11:26:26 (a duration of only about 4 seconds).
If the NSX Manager alarm collector executes its check outside of this specific 4-second window, it reads the status as Off.
Because the alarm framework is currently not designed to latch onto or aggregate high-frequency "flapping" events that resolve faster than the polling cycle, the alarm is not triggered.
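The mismatch between a short event and a long polling cycle can be put into numbers. The following is a minimal sketch under a simplified model, not the actual NSX collector logic: it assumes the collector samples the instantaneous state once per sampling interval and that the event starts at a uniformly random point within the polling cycle. The function names are hypothetical.

```python
import random


def detection_probability(event_duration_s, sampling_interval_s=60.0):
    """Chance that a periodic poll lands inside the event window.

    Assumed model: one instantaneous sample per interval, with the event
    start uniformly distributed relative to the polling cycle.
    """
    return min(1.0, event_duration_s / sampling_interval_s)


def simulate(event_duration_s, sampling_interval_s=60.0, trials=100_000, seed=0):
    """Monte Carlo check of the same model."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Time from the start of the event until the next poll fires.
        time_to_next_poll = rng.uniform(0.0, sampling_interval_s)
        if time_to_next_poll < event_duration_s:
            hits += 1  # the poll fired while the event was still "On"
    return hits / trials


if __name__ == "__main__":
    print(f"analytic:  {detection_probability(4.0):.3f}")
    print(f"simulated: {simulate(4.0):.3f}")
```

Under these assumptions, a 4-second event checked every 60 seconds is seen in roughly 1 polling cycle out of 15 (4/60, about 6.7%), so the alarm is missed far more often than it is raised.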
This behavior is a known limitation of the current alarm framework regarding transient states caused by rapid network flapping.
The Engineering team is aware of this limitation and may implement more aggressive alarm collection in a future release.
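Because no alarm is raised, a transient failover of this kind can only be confirmed from the Edge log entries themselves. The sketch below pairs eventState="On" and eventState="Off" entries per entId and reports how long each failover lasted. It assumes the log line layout shown in this article; the function name, the regular expression, and the trimmed sample lines are illustrative and not part of any NSX tooling.

```python
import re
from datetime import datetime

# Matches the monitoring entries shown in this article (layout assumed).
EVENT_RE = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z'
    r'.*?entId="(?P<ent>[^"]+)"'
    r'.*?eventState="(?P<state>On|Off)"'
    r'.*?eventType="(?P<type>tier[01]_gateway_failover)"'
)


def failover_durations(lines):
    """Return (entId, eventType, seconds) for each On -> Off pair."""
    opened = {}      # (entId, eventType) -> timestamp of the "On" entry
    durations = []
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue  # not a gateway failover monitoring entry
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        key = (m["ent"], m["type"])
        if m["state"] == "On":
            opened.setdefault(key, ts)
        elif key in opened:
            seconds = (ts - opened.pop(key)).total_seconds()
            durations.append((m["ent"], m["type"], seconds))
    return durations


# Sample entries trimmed from the log excerpt in this article.
_FMT = ('{ts} edge_hostname NSX 1 - [nsx@6876 comp="nsx-edge" '
        'entId="########-####-####-####-########{ent}" tid="1" level="ERROR" '
        'eventState="{state}" eventFeatureName="high_availability" '
        'eventSev="error" eventType="tier1_gateway_failover"]')
SAMPLE = [
    _FMT.format(ts="2025-12-16T11:26:22.276Z", ent="aedc", state="On"),
    _FMT.format(ts="2025-12-16T11:26:22.277Z", ent="63ac", state="On"),
    _FMT.format(ts="2025-12-16T11:26:23.472Z", ent="aedc", state="Off"),
    _FMT.format(ts="2025-12-16T11:26:26.837Z", ent="63ac", state="Off"),
]

if __name__ == "__main__":
    for ent, event_type, seconds in failover_durations(SAMPLE):
        print(f"{ent} {event_type} lasted {seconds:.3f}s")
```

For the sample entries, the two gateways were in the failed state for 1.196 and 4.560 seconds respectively, both far shorter than the 60-second sampling interval.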