SD-WAN Edge in HA goes offline after failover or software upgrade
Article ID: 345687
Products
VMware SD-WAN by VeloCloud
Issue/Introduction
Symptoms: At a customer enterprise site deployed in a High Availability (HA) topology where the HA Edges run a 5.2.0.x software version, either of two events may cause the site to show as offline on the VMware SASE Orchestrator: upgrading the HA Edge pair to any other Edge software version, or a standard fail-over from the Active Edge to the Standby Edge. Customer traffic continues to pass while the site is marked offline.
Environment
VMware SD-WAN by VeloCloud
Cause
This is caused by known issue 137279.
The issue is the result of the Standby Edge's certificate not being renewed and ultimately expiring. The Active Edge's certificate is properly updated so the issue only manifests upon an HA fail-over, when the promoted Standby Edge with the expired certificate causes the HA pair to go offline on the Orchestrator. This means the site cannot be reached, monitored, or managed through the Orchestrator.
When a customer has Certificate Enabled for their enterprise, the Active HA Edge includes the Standby Edge's certificate digest in every heartbeat it sends to the Orchestrator. The Orchestrator uses this certificate digest to renew the Standby Edge's certificate. A defect in the certificate digest generation process produces a digest that is an empty string. As a result, the Standby Edge's certificate is never renewed unless a manual renewal is done from the Orchestrator before an HA fail-over occurs and before the Standby Edge's certificate expires.
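The failure mode described above can be illustrated with a minimal sketch. It is not the Edge or Orchestrator implementation: the `Heartbeat` structure, the function names, and the use of SHA-256 for the digest are all assumptions made for illustration. It only shows why an empty digest string in the heartbeat leaves the Standby Edge's certificate without a renewal path.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Hypothetical heartbeat payload sent by the Active Edge (illustrative only)."""
    active_cert_digest: str
    standby_cert_digest: str


def cert_digest(cert_bytes: bytes) -> str:
    """Return a hex digest of a certificate. SHA-256 is an assumption."""
    return hashlib.sha256(cert_bytes).hexdigest()


def build_heartbeat(active_cert: bytes, standby_cert: bytes,
                    defective: bool = False) -> Heartbeat:
    """Build a heartbeat; defective=True mimics the empty-digest defect."""
    standby_digest = "" if defective else cert_digest(standby_cert)
    return Heartbeat(cert_digest(active_cert), standby_digest)


def standby_renewal_possible(hb: Heartbeat) -> bool:
    """An empty standby digest gives the renewal logic nothing to act on."""
    return bool(hb.standby_cert_digest.strip())


# Placeholder certificate bytes, not real certificates.
healthy = build_heartbeat(b"active-cert", b"standby-cert")
broken = build_heartbeat(b"active-cert", b"standby-cert", defective=True)
```

In this sketch `standby_renewal_possible(healthy)` is true and `standby_renewal_possible(broken)` is false, while the Active Edge's digest is populated in both cases. That matches the article's description of the Active Edge renewing normally while the Standby Edge does not.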
Resolution
This issue is resolved in VMware SD-WAN Edge version 5.2.2.0 and later.
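When auditing an inventory of HA sites, the version boundaries stated in this article can be checked with a small helper such as the sketch below. The function names are hypothetical, and the code assumes versions are supplied as plain dotted numeric strings (for example `5.2.0.3`). Any other format an Orchestrator export might use would need to be normalized first.

```python
from typing import Tuple

FIXED_IN = (5, 2, 2, 0)  # first release containing the fix, per this article


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version string, padded to four fields."""
    parts = [int(p) for p in version.strip().split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts)


def is_affected_release(version: str) -> bool:
    """True for 5.2.0.x builds, the releases this article names as affected."""
    return parse_version(version)[:3] == (5, 2, 0)


def has_fix(version: str) -> bool:
    """True for 5.2.2.0 and later."""
    return parse_version(version) >= FIXED_IN
```

For example, `is_affected_release("5.2.0.3")` is true and `has_fix("5.2.2.0")` is true. Releases that fall between those boundaries or predate 5.2.0.x are reported by neither function as affected or fixed, because this article does not state their status.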
Workaround: The only way to restore the site is to get physical access to the HA Edge pair and reboot the current Active Edge (the former Standby Edge) to trigger an HA fail-over. This can be done either through the Local UI (if so enabled) or by power cycling the Active Edge.
Once the site is back online with the Orchestrator, a manual renewal can be done to renew the certificates of both HA Edges.
Additional Information
Impact/Risks: A manual certificate renewal results in tunnel flaps, since the Active Edge's certificate is also renewed.