Common scenarios causing VeloCloud SD-WAN Edge HA failover

Article ID: 323739

Products

VMware VeloCloud SD-WAN

Issue/Introduction

This article describes common scenarios that cause an HA switchover and how to identify and address them. The information applies to all Edge hardware models and supported software versions.

Environment

VeloCloud SD-WAN

Resolution

In this article, we look at common scenarios that cause HA failover and the logs seen on the VCO and the Edge CLI during such events.

An HA failover is an event that forces the current active Edge to become the standby, while the standby Edge takes over the role of the new active.

HA failover can occur for multiple reasons. The following events and configuration changes cause an Edge service restart and, consequently, an HA failover.

 

 

Regardless of the cause of an HA failover, review the following logs when troubleshooting.

(1) Verify the serial numbers of the current active and standby Edges in the drop-down.



  

To trigger a forced HA failover, on the VCO go to Diagnostics → Remote Actions → Force HA Failover.


(2) Check the VCO event logs around the time of the HA failover.

 


 

Check the events logged before the HA flip to determine the reason for the failover. In this example, the VCO event list shows forceHAFailover, indicating that the failover was forced.

The “Standby state ready for failover” log indicates that the standby Edge has reached its final HA state, “Standby ready”.

Click any event containing the words “High Availability” to open a pop-up showing the serial numbers of the active and standby Edges.
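Picking out the events logged shortly before the flip can also be done offline once the events are copied out of the VCO. The following is a minimal sketch, assuming each event is a hypothetical `(timestamp, text)` pair (the real VCO export format may differ) and an arbitrary 5-minute look-back window. The helper `events_before_flip` is illustrative only; it keys on the “High Availability Going Active” text quoted in this article.

```python
from datetime import datetime, timedelta

# Hypothetical event record: (timestamp, event text) copied from the VCO Events page.
# The actual VCO export format may differ.
Event = tuple[datetime, str]

# Event text that marks the HA flip, as quoted in this article.
FLIP_MARKER = "High Availability Going Active"


def events_before_flip(events: list[Event],
                       window: timedelta = timedelta(minutes=5)) -> list[Event]:
    """Return events logged within `window` before the first HA going-active event.

    The 5-minute default window is an arbitrary assumption for illustration.
    """
    ordered = sorted(events, key=lambda e: e[0])
    for flip_time, text in ordered:
        if FLIP_MARKER in text:
            return [(t, m) for t, m in ordered if flip_time - window <= t < flip_time]
    return []  # no flip found in the supplied events


# Illustrative sample events, not real VCO output.
events = [
    (datetime(2024, 1, 1, 10, 0, 0), "forceHAFailover"),
    (datetime(2024, 1, 1, 10, 0, 2), "High Availability Going Active"),
    (datetime(2024, 1, 1, 10, 1, 0), "Standby state ready for failover"),
]
for ts, text in events_before_flip(events):
    print(ts, text)  # 2024-01-01 10:00:00 forceHAFailover
```

Events after the flip, such as “Standby state ready for failover”, are excluded because they describe the outcome of the failover rather than its cause.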



  

(3) To verify that the active and standby serial numbers have flipped, check the details in the drop-down and compare them with the snapshot taken before the HA failover.
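The comparison in step (3) amounts to checking that the two serial numbers swapped places. A minimal sketch, assuming the serials are recorded by hand from the drop-down into a hypothetical `HASnapshot` record; the serial values below are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HASnapshot:
    """Serial numbers of the HA pair as read from the VCO drop-down (hypothetical record)."""
    active_serial: str
    standby_serial: str


def roles_flipped(before: HASnapshot, after: HASnapshot) -> bool:
    """True if the former standby is now active and the former active is now standby."""
    return (after.active_serial == before.standby_serial
            and after.standby_serial == before.active_serial)


# Placeholder serials, not real Edge serial numbers.
before = HASnapshot(active_serial="SERIAL-A", standby_serial="SERIAL-B")
after = HASnapshot(active_serial="SERIAL-B", standby_serial="SERIAL-A")
print(roles_flipped(before, after))  # True
```

Checking both fields, rather than only the active serial, also catches a snapshot where the standby slot is empty or shows an unexpected Edge.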

 


Scenario #1

  Heartbeat failure between the HA Edges, leading to a dual-active scenario

 This occurs when heartbeats between the Edges fail and the standby transitions to active while the original active still holds the active role. In this case, HA flap events appear on the VCO.
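In a dual-active condition both Edges of the pair hold the active role at the same time. The sketch below replays role changes in time order and reports the moments when that happens. It assumes hypothetical `(timestamp, serial, state)` records; the `"ACTIVE"`/`"STANDBY"` state names and the placeholder serials are illustrative, not VCO event strings.

```python
from datetime import datetime

# Hypothetical role-change record: (timestamp, Edge serial, new HA state).
RoleChange = tuple[datetime, str, str]


def dual_active_times(changes: list[RoleChange]) -> list[datetime]:
    """Return timestamps at which more than one Edge was in the ACTIVE state."""
    state: dict[str, str] = {}  # latest known state per Edge serial
    hits = []
    for ts, serial, new_state in sorted(changes):
        state[serial] = new_state
        if sum(1 for s in state.values() if s == "ACTIVE") > 1:
            hits.append(ts)
    return hits


t0 = datetime(2024, 1, 1, 10, 0, 0)
t1 = datetime(2024, 1, 1, 10, 0, 1)  # heartbeats lost, standby goes active
t2 = datetime(2024, 1, 1, 10, 0, 5)  # original active steps down
changes = [
    (t0, "SERIAL-A", "ACTIVE"),
    (t0, "SERIAL-B", "STANDBY"),
    (t1, "SERIAL-B", "ACTIVE"),
    (t2, "SERIAL-A", "STANDBY"),
]
print(dual_active_times(changes))  # [datetime.datetime(2024, 1, 1, 10, 0, 1)]
```

Repeated hits over a short period would correspond to the HA flap events described above.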

 

 

Scenario #2

Failover caused by an Edge device settings configuration change

  VCO events show the Edge device settings change, the Edge restart, and the subsequent HA failover.

 

 

Scenario #3

  Active Edge crash causing an HA failover

VCO events show the standby Edge going active because the peer heartbeat was missed for the default hold timer of 400 ms. The VCO events also log an Edge crash event with the “Service edged failed” error.
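The hold timer works as a deadline: each heartbeat from the peer resets it, and if no heartbeat arrives within the hold time the standby treats the peer as failed and goes active. A minimal sketch, assuming the 400 ms default from this article; `HoldTimer` and the 100 ms heartbeat interval in the example are hypothetical and do not describe the actual edged implementation.

```python
from typing import Optional


class HoldTimer:
    """Hypothetical peer-liveness check based on a heartbeat hold time."""

    def __init__(self, hold_ms: int = 400):
        self.hold_ms = hold_ms
        self.last_heartbeat_ms: Optional[int] = None

    def heartbeat(self, now_ms: int) -> None:
        """Record a heartbeat received from the peer at time now_ms."""
        self.last_heartbeat_ms = now_ms

    def peer_dead(self, now_ms: int) -> bool:
        """True once more than hold_ms has passed since the last heartbeat.

        Returns False if no heartbeat has ever been seen.
        """
        if self.last_heartbeat_ms is None:
            return False
        return now_ms - self.last_heartbeat_ms > self.hold_ms


timer = HoldTimer()
for t in range(0, 1000, 100):
    timer.heartbeat(t)  # peer heartbeats every 100 ms (illustrative interval)
# The active Edge crashes after its last heartbeat at t=900 ms.
print(timer.peer_dead(1200))  # False: only 300 ms since the last heartbeat
print(timer.peer_dead(1301))  # True: the standby would go active
```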

 

  

Scenario #4

WAN/LAN interface of the active HA Edge going down

  Ideally, the active and standby Edges should have the same number of WAN and LAN interfaces up. When a WAN or LAN interface on the active Edge goes down, an HA failover occurs.

For example, if WAN interface GE5 goes down on the active Edge, the standby HA Edge takes over the active role.
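The rule above can be modeled as a comparison of interface counts between the two Edges. This is a simplified sketch, assuming failover whenever the active Edge has fewer WAN or fewer LAN interfaces up than the standby; the Edge's actual decision logic is not described in this article and may differ. `InterfaceState`, `should_fail_over`, and all interface names other than GE5 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InterfaceState:
    """Interfaces currently up on one Edge (hypothetical model)."""
    wan_up: set[str]
    lan_up: set[str]


def should_fail_over(active: InterfaceState, standby: InterfaceState) -> bool:
    """True if the active Edge has fewer WAN or fewer LAN interfaces up than the standby."""
    return (len(active.wan_up) < len(standby.wan_up)
            or len(active.lan_up) < len(standby.lan_up))


# Illustrative interface names; only GE5 comes from the example above.
standby = InterfaceState(wan_up={"GE3", "GE4", "GE5"}, lan_up={"GE1", "GE2"})
active_before = InterfaceState(wan_up={"GE3", "GE4", "GE5"}, lan_up={"GE1", "GE2"})
active_after = InterfaceState(wan_up={"GE3", "GE4"}, lan_up={"GE1", "GE2"})  # GE5 down

print(should_fail_over(active_before, standby))  # False: counts match
print(should_fail_over(active_after, standby))   # True: active lost a WAN interface
```

Comparing WAN and LAN counts separately reflects the article's point that losing either kind of interface on the active Edge triggers the failover.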

 

 
The following event logs are seen during an HA failover triggered when a LAN interface goes down on the active HA Edge.


 

Scenario #5

Force Failover

When a forced failover is performed from the VCO, the VCO events show "Edge remote action" and "High Availability Going Active".
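Several scenarios in this article leave a recognizable event text in the VCO log before the flip. The sketch below maps the event texts quoted in this article to the scenario they suggest, assuming the events logged before the flip are available as plain strings. `SIGNATURES` and `likely_cause` are hypothetical helpers; scenarios whose exact event text is not quoted here (#1, #2, #4) are deliberately left out rather than guessed.

```python
# Event text quoted in this article, mapped to the scenario it suggests.
# The first matching entry wins.
SIGNATURES = [
    ("forceHAFailover", "Scenario #5: Force Failover"),
    ("Edge remote action", "Scenario #5: Force Failover"),
    ("Service edged failed", "Scenario #3: Active Edge crash"),
]


def likely_cause(prior_events: list[str]) -> str:
    """Return the first scenario whose event text appears in the supplied events."""
    for marker, scenario in SIGNATURES:
        if any(marker in text for text in prior_events):
            return scenario
    return "Unknown: review the VCO events manually"


print(likely_cause(["Edge remote action", "High Availability Going Active"]))
# Scenario #5: Force Failover
```

A match is only a hint; confirm it against the full event list and the active/standby serial numbers before drawing conclusions.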