This issue occurs due to an error in alarm event handling.
The fix ensures that an Edge reboot no longer generates the second log line shown under Symptoms (the "Down to Standby" event) and that the alarm clears after the reboot.
Symptoms:
The following conditions are seen:
- When an Edge VM is rebooted or its maintenance mode changes, the following events may be logged to /var/log/nsx-event.log:
<29>1 2019-12-12T10:57:58.483338+02:00 Edge NSX 17 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-nsxa.ha_cluster" level="INFO" eventId="vmwNSXClusterFailoverStatus"] {"event_state":0,"event_external_reason":"Service router switches over from Down to Active","event_src_comp_id":"########-####-####-####-##########","event_sources":{"id":"########-####-####-####-##########"}}
<29>1 2019-12-12T10:58:46.506940+02:00 Edge NSX 17 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-nsxa.ha_cluster" level="INFO" eventId="vmwNSXClusterFailoverStatus"] {"event_state":1,"event_external_reason":"Service router switches over from Down to Standby","event_src_comp_id":"########-####-####-####-##########","event_sources":{"id":"########-####-####-####-##########"}}
- This triggers an alarm, which is visible when querying via the API or as an alert from vRealize Network Insight (vRNI):
NSX-T System Event - 1 events
Edge node ########-####-####-####-########## fails to failover, Reason: Service router switches over from Down to Standby.
Severity: Critical
- The alarm does not clear after the reboot.
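
To check whether an Edge has logged the event pattern described above, the log can be scanned for vmwNSXClusterFailoverStatus entries whose reason is "Down to Standby". The following is a minimal Python sketch, not a VMware-provided tool. It assumes the log lines have the same layout as the samples above (the eventId attribute followed by a JSON payload), and the function names parse_event and find_standby_events are hypothetical.

```python
import json
import re

# Assumption: each event line ends with eventId="..."] followed by a JSON
# payload, as in the sample lines from /var/log/nsx-event.log shown above.
LINE_RE = re.compile(r'eventId="(?P<event_id>[^"]+)"\]\s*(?P<payload>\{.*\})\s*$')


def parse_event(line):
    """Return a dict with the event id, state and reason, or None if the
    line does not match the assumed layout."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    try:
        payload = json.loads(match.group("payload"))
    except json.JSONDecodeError:
        return None
    return {
        "event_id": match.group("event_id"),
        "state": payload.get("event_state"),
        "reason": payload.get("event_external_reason", ""),
    }


def find_standby_events(lines):
    """Return parsed events that match the second sample line: a
    vmwNSXClusterFailoverStatus event with event_state 1 and a
    'Down to Standby' reason."""
    hits = []
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        if (
            event["event_id"] == "vmwNSXClusterFailoverStatus"
            and event["state"] == 1
            and event["reason"].endswith("from Down to Standby")
        ):
            hits.append(event)
    return hits
```

On an Edge node, the function could be fed the lines of the file, for example by opening /var/log/nsx-event.log and passing the file object to find_standby_events. A non-empty result after a reboot or maintenance mode change suggests the node has hit this issue.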