Troubleshooting Active/Active panic issue in High Availability (HA) setup

Article ID: 323711

Products

VMware

Issue/Introduction

Events in the VCO UI show that the standby Edge in a High Availability (HA) pair repeatedly transitions to a "peer unknown" state.

 

Environment

All SD-WAN Edges in an HA setup running software version 4.5.2 or lower in the 4.5.x release cycle.

Cause

The active Edge fails to respond with heartbeats to the standby Edge within 700 ms because it is busy with other processes. The standby Edge then assumes that the active Edge is down and tries to take over the active role. As soon as the standby Edge becomes active, an Active/Active panic occurs, because the original active Edge is still up and running. To clear the panic, the standby Edge reboots itself and returns to the standby role.

An HA panic can impact business traffic in an Enhanced HA setup. In a Standard HA setup there should be minimal to no impact.
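The sequence described above can be sketched as a small Python model. This is only an illustration of the reasoning, not the Edge's actual implementation: the function name, the event strings, and the choice to treat a gap of exactly 700 ms as "in time" are all assumptions made for this example.

```python
from typing import List

FAILOVER_TIMEOUT_MS = 700  # default HA failover time quoted in this article


def standby_reaction(heartbeat_gap_ms: int,
                     timeout_ms: int = FAILOVER_TIMEOUT_MS) -> List[str]:
    """Return the events the standby goes through for one heartbeat gap.

    Hypothetical helper: models the case where the active Edge is busy
    (not down) and simply answers late.
    """
    if heartbeat_gap_ms <= timeout_ms:
        return ["heartbeat received in time; standby stays standby"]

    return [
        "heartbeat missed; standby assumes active is down",
        "standby promotes itself to active",
        # The original active Edge was only busy, so both Edges are now active.
        "active/active panic detected",
        "standby reboots and returns to standby role",
    ]


if __name__ == "__main__":
    for gap in (500, 900):
        print(gap, "ms:")
        for event in standby_reaction(gap):
            print("  -", event)
```

In this model, any heartbeat gap longer than the failover time ends in a standby reboot, which matches the repeated "peer unknown" events seen in the VCO UI.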

VMware SASE 4.5.2 Release Notes

Resolution

Several defects related to the HA panic issue have been identified in older versions. Most of them are fixed in the latest build of 4.5.2.

Upgrade the HA Edges to the latest build of 4.5.2 to avoid the HA panic issue.

If the upgrade cannot be applied, the workaround is to increase the HA failover time from 700 ms to 7000 ms.
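To illustrate why the longer failover time helps, the following Python sketch counts how many heartbeat gaps would trigger a takeover under each setting. The gap values are invented sample data, and `false_failovers` is a hypothetical helper, not part of any VMware tool.

```python
DEFAULT_FAILOVER_MS = 700      # default value quoted in this article
WORKAROUND_FAILOVER_MS = 7000  # workaround value quoted in this article


def false_failovers(gaps_ms, failover_ms):
    """Count heartbeat gaps that exceed the configured failover time."""
    return sum(1 for gap in gaps_ms if gap > failover_ms)


# Hypothetical gaps (in ms) between heartbeats seen by the standby
# while the active Edge is busy but still running.
sample_gaps_ms = [120, 850, 300, 1500, 6900, 200]

print(false_failovers(sample_gaps_ms, DEFAULT_FAILOVER_MS))     # 3
print(false_failovers(sample_gaps_ms, WORKAROUND_FAILOVER_MS))  # 0
```

With the sample data, three gaps exceed 700 ms and none exceed 7000 ms. The trade-off implied by this model is that a genuine failure of the active Edge would also take longer to detect with the larger value.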


 

Additional Information

Some known HA-related issues fixed in software release 4.5.2:

Issue #85369
Issue #103662
Issue #112115
Issue #112131
Issue #118333
Issue #122988
Issue #123128
Issue #126458

Note: Apart from those issues, bug 119446, which is present in all 4.5.x releases, can also cause an Active/Active panic. The workaround is to set the HA failover timer to 7000 ms, and the fix is to upgrade to any 5.x or higher release.