SD-WAN Edge in HA goes offline after failover or software upgrade

Article ID: 345687

Updated On:

Products

VMware SD-WAN by VeloCloud

Issue/Introduction

Symptoms:
At an enterprise site deployed in a High Availability (HA) topology where the HA Edges run a 5.2.0.x software version, the site may appear offline on the VMware SASE Orchestrator after either of two events: an upgrade of the HA Edge pair to any other Edge software version, or a standard fail-over from the Active Edge to the Standby Edge. Customer traffic continues to pass while the site is shown as offline.

Environment

VMware SD-WAN
VMware SD-WAN by VeloCloud

Cause

This is caused by known issue 137279.

The issue is the result of the Standby Edge's certificate not being renewed and ultimately expiring. The Active Edge's certificate is properly updated so the issue only manifests upon an HA fail-over, when the promoted Standby Edge with the expired certificate causes the HA pair to go offline on the Orchestrator. This means the site cannot be reached, monitored, or managed through the Orchestrator.

When a customer has Certificate Enabled for their enterprise, the Active HA Edge includes the Standby Edge's certificate digest in every heartbeat it sends to the Orchestrator, and the Orchestrator uses this digest to renew the Standby Edge's certificate. A defect in the digest generation process produces a certificate digest that is an empty string. As a result, the Standby Edge's certificate is never renewed automatically. It is renewed only if a manual renewal is performed from the Orchestrator before an HA fail-over occurs or before the Standby Edge's certificate expires.
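The failure mode described above can be illustrated with a minimal Python sketch. This is not the Edge or Orchestrator implementation: the `Heartbeat` structure, the field and function names, and the choice of SHA-256 as the digest are all assumptions made for illustration. The only behavior taken from this article is that the Active Edge sends the Standby Edge's certificate digest in each heartbeat, and that an empty digest leaves the Orchestrator unable to renew the Standby Edge's certificate.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Hypothetical heartbeat payload; field names are illustrative only."""
    active_edge_id: str
    standby_cert_digest: str


def build_heartbeat(active_edge_id: str, standby_cert_pem: str,
                    digest_defect: bool = False) -> Heartbeat:
    """Build a heartbeat carrying the Standby Edge's certificate digest.

    digest_defect=True models the defect described in the article:
    the digest comes out as an empty string.
    SHA-256 is an assumption; the real digest algorithm is not documented here.
    """
    if digest_defect:
        digest = ""
    else:
        digest = hashlib.sha256(standby_cert_pem.encode("utf-8")).hexdigest()
    return Heartbeat(active_edge_id, digest)


def standby_renewal_possible(heartbeat: Heartbeat) -> bool:
    """Model of the Orchestrator side: an empty digest gives it nothing to renew."""
    return bool(heartbeat.standby_cert_digest.strip())
```

In this model, a heartbeat built with `digest_defect=True` always fails `standby_renewal_possible`, no matter how many heartbeats are sent, which mirrors why the Standby Edge's certificate eventually expires without a manual renewal.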

Resolution

This issue is resolved in VMware SD-WAN Edge version 5.2.2.0 and later.

For information on how to upgrade, see the following article: VMware SD-WAN Software Upgrade FAQs
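When auditing an inventory of HA sites, the version ranges named in this article (5.2.0.x affected, 5.2.2.0 and later fixed) can be checked with a short script. The following is a sketch only: the function names are hypothetical, and it assumes the Edge version is available as a plain dotted numeric string such as "5.2.0.3". Versions that this article does not mention are reported as "unknown" rather than guessed.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version string and pad it to four fields."""
    parts = tuple(int(part) for part in version.strip().split("."))
    return (parts + (0, 0, 0, 0))[:4]


def classify_edge_version(version: str) -> str:
    """Classify an Edge version against the ranges stated in this article."""
    parsed = parse_version(version)
    if parsed[:3] == (5, 2, 0):
        return "affected"   # 5.2.0.x is named as affected
    if parsed >= (5, 2, 2, 0):
        return "fixed"      # resolved in 5.2.2.0 and later
    return "unknown"        # not covered by this article


if __name__ == "__main__":
    for candidate in ("5.2.0.3", "5.2.2.0", "5.2.1.0"):
        print(candidate, classify_edge_version(candidate))
```

Sites reported as "affected" are candidates for an upgrade or for a manual certificate renewal before the next fail-over.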

Workaround:
The only way to restore the site is to get physical access to the HA Edge pair and reboot the current Active Edge (the former Standby Edge) to trigger an HA fail-over. This can be done either through the Local UI (if so enabled) or by power cycling the Active Edge.

Once the site is back online with the Orchestrator, perform a manual renewal to renew the certificates of both HA Edges.

Additional Information

Impact/Risks:
A manual certificate renewal results in tunnel flaps because the Active Edge's certificate is also renewed.