This article introduces common HA switchover scenarios and how to fix them. The information applies to all edge hardware models and supported software versions.
VeloCloud SD-WAN
In this article, we will look at common scenarios that cause an HA failover and the logs seen on the VCO and Edge CLI during such events.
An HA failover is an event that forces the current active edge to become the standby while the standby edge takes over as the new active.
An HA failover can occur for multiple reasons. The following events and configuration changes cause an edge service restart and, consequently, an HA failover.
Irrespective of the cause of the HA failover, check the following logs during troubleshooting.
(1) Verify the current Active and Standby edge serial numbers in the drop-down.
To trigger a forced HA failover, on the VCO go to Diagnostics → Remote Actions → Force HA Failover.
(2) Check the VCO event logs during the HA failover
Check the events logged before the HA flip to determine the reason for the failover. Here, the VCO event list shows forceHAFailover, which indicates that the failover was forced.
The “Standby state ready for failover” log indicates that the standby edge has reached its final HA state, “Standby ready”.
Click any event containing the words “High Availability” to open a pop-up showing the serial numbers of the active and standby edges.
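When the event list is long, it can help to isolate only the events logged shortly before the HA flip. The following is a minimal Python sketch, assuming the events have been copied out of the VCO by hand into a list of (timestamp, message) pairs. That record layout, the function name events_before_flip, the 60-second window and the "unrelated event" entry are assumptions made for illustration; this is not a VCO export format or API.

```python
from datetime import datetime, timedelta

def events_before_flip(events,
                       flip_marker="High Availability Going Active",
                       window=timedelta(seconds=60)):
    """Return events logged within `window` before the first flip event.

    `events` is a hypothetical list of (datetime, message) tuples.
    """
    ordered = sorted(events, key=lambda e: e[0])
    # Timestamp of the first event whose message contains the flip marker.
    flip_time = next((ts for ts, msg in ordered if flip_marker in msg), None)
    if flip_time is None:
        return []
    return [(ts, msg) for ts, msg in ordered
            if flip_time - window <= ts < flip_time]

t0 = datetime(2024, 1, 1, 12, 0, 0)
sample = [
    (t0, "Edge remote action"),
    (t0 + timedelta(seconds=5), "High Availability Going Active"),
    (t0 - timedelta(minutes=10), "unrelated event (hypothetical)"),
]
for ts, msg in events_before_flip(sample):
    print(ts.isoformat(), msg)
```

The events returned are the candidates for the failover reason; the window can be widened if the trigger was logged earlier.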
(3) To verify that the Active and Standby serial numbers have flipped, check the details in the drop-down and compare them with the snapshot taken before the HA failover.
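The before-and-after comparison can be sketched as a small Python check. This assumes the serial numbers were noted manually from the drop-down; the HaSnapshot type, the roles_flipped function and the placeholder serial values are hypothetical and are not read from the VCO.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HaSnapshot:
    """Serial numbers noted from the drop-down at one point in time."""
    active_serial: str
    standby_serial: str

def roles_flipped(before: HaSnapshot, after: HaSnapshot) -> bool:
    """True when the two edges have exchanged their active/standby roles."""
    return (before.active_serial == after.standby_serial
            and before.standby_serial == after.active_serial)

# Placeholder serial numbers, for illustration only.
before = HaSnapshot(active_serial="SERIAL-A", standby_serial="SERIAL-B")
after = HaSnapshot(active_serial="SERIAL-B", standby_serial="SERIAL-A")
print(roles_flipped(before, after))
```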
Scenario #1
Heartbeat failure between the HA edges, leading to a dual-active scenario.
This occurs when the heartbeat between the edges fails and the standby transitions to active. In this case, you will see HA flap events on the VCO.
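One way to confirm flapping is to count how many role transitions fall inside a short time window. The sketch below is an illustration only: it assumes events are available as (timestamp, message) pairs, that a transition is any message containing "High Availability Going Active", and that several such events within one minute suggest a flap. The function name, the record layout and the one-minute window are assumptions, not VCO behavior.

```python
from datetime import datetime, timedelta

def max_transitions_in_window(events, window,
                              marker="High Availability Going Active"):
    """Largest number of marker events that fit inside any `window` span."""
    times = sorted(ts for ts, msg in events if marker in msg)
    best = 0
    start = 0
    for end in range(len(times)):
        # Shrink the window from the left until it spans at most `window`.
        while times[end] - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best

base = datetime(2024, 1, 1, 12, 0, 0)
flap_sample = [
    (base, "High Availability Going Active"),
    (base + timedelta(seconds=20), "High Availability Going Active"),
    (base + timedelta(seconds=40), "High Availability Going Active"),
    (base + timedelta(minutes=10), "High Availability Going Active"),
]
print(max_transitions_in_window(flap_sample, timedelta(minutes=1)))
```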
Scenario #2
Failover caused by an edge device settings configuration change
The VCO events indicate an edge device settings change, an edge restart, and the subsequent HA failover.
Scenario #3
The active edge crashes, causing an HA failover.
The VCO events indicate that the standby edge went active because the peer heartbeat was missed for the default hold timer of 400 ms. The VCO events also log the edge crash with a “Service edged failed” error.
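The hold-timer idea can be illustrated with a short Python sketch: the peer is treated as lost once no heartbeat has been seen for longer than the hold timer. Only the 400 ms default comes from this article; the function name, the millisecond timestamps and the strict greater-than comparison are assumptions, and the edge's real implementation may differ.

```python
DEFAULT_HOLD_TIMER_MS = 400  # default hold timer stated in this article

def peer_heartbeat_missed(last_heartbeat_ms: int, now_ms: int,
                          hold_timer_ms: int = DEFAULT_HOLD_TIMER_MS) -> bool:
    """True when the time since the last heartbeat exceeds the hold timer."""
    return (now_ms - last_heartbeat_ms) > hold_timer_ms

# Last heartbeat seen at t=1000 ms (illustrative values).
print(peer_heartbeat_missed(1000, 1300))  # 300 ms elapsed
print(peer_heartbeat_missed(1000, 1450))  # 450 ms elapsed
```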
Scenario #4
A WAN or LAN interface of the active HA edge goes down
Ideally, both the active and standby edges have the same number of WAN and LAN interfaces up. When a WAN or LAN interface on the active edge goes down, an HA failover occurs.
Consider a situation where WAN interface GE5 goes down on the active edge. This leads to the standby HA edge taking over the role of active.
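The GE5 example can be expressed as a simple comparison of how many interfaces are up on each edge. This is a sketch under stated assumptions: it assumes the takeover decision reduces to the standby having more interfaces up than the active, which is a simplification for illustration. The function name and the GE1 interface are hypothetical; only GE5 comes from the example above.

```python
def standby_should_take_over(active_up: set, standby_up: set) -> bool:
    """True when the standby edge has more interfaces up than the active edge.

    Assumption: the decision is based only on the count of up interfaces.
    """
    return len(standby_up) > len(active_up)

# Both edges start with the same interfaces up (GE1 is a hypothetical LAN port).
active_up = {"GE1", "GE5"}
standby_up = {"GE1", "GE5"}
print(standby_should_take_over(active_up, standby_up))

# WAN interface GE5 goes down on the active edge.
active_after_failure = active_up - {"GE5"}
print(standby_should_take_over(active_after_failure, standby_up))
```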
Scenario #5
Force Failover
When a force failover is performed from the VCO, the VCO events show "Edge remote action" and "High Availability Going Active".