1. What is the failover mechanism, and what triggers failover, in an active/backup-only scenario?
- The backup link becomes active when there is no active link present. That is, if all overlay paths to the Primary Gateway on the active link(s) go QUIET, the backup link initiates tunnels to the VeloCloud endpoints added to the interface.
QUIET state:
- When the link is passing real-time traffic, the overlay heartbeat interval is 100 milliseconds.
- With all other traffic, the interval is 500 milliseconds.
- The overlay path is marked QUIET after 700 milliseconds of silence on the path (i.e., no packet received from the peer), regardless of the heartbeat interval.
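The QUIET rule can be sketched in a few lines of Python. This is only an illustration of the timing described here, not the edged implementation: the `OverlayPath` class, its method names, and the choice to treat exactly 700 ms of silence as QUIET are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable

REALTIME_HEARTBEAT_MS = 100  # heartbeat interval while carrying real-time traffic
DEFAULT_HEARTBEAT_MS = 500   # heartbeat interval for all other traffic
QUIET_THRESHOLD_MS = 700     # silence after which a path is marked QUIET


@dataclass
class OverlayPath:
    """Hypothetical model of one overlay path (not an actual edged structure)."""
    last_rx_ms: int  # timestamp of the last packet received from the peer

    def on_packet(self, now_ms: int) -> None:
        # Any packet from the peer resets the silence timer.
        self.last_rx_ms = now_ms

    def is_quiet(self, now_ms: int) -> bool:
        # Same threshold whichever heartbeat interval is in use.
        # Assumption: exactly 700 ms of silence already counts as QUIET.
        return now_ms - self.last_rx_ms >= QUIET_THRESHOLD_MS


def all_paths_quiet(active_paths: Iterable[OverlayPath], now_ms: int) -> bool:
    """True when every overlay path on the active link(s) is QUIET.

    Also true when there are no active paths at all, which matches
    "no active link present".
    """
    return all(path.is_quiet(now_ms) for path in active_paths)
```

Because the threshold is fixed, 700 ms of silence corresponds to roughly seven missed heartbeats at the 100 ms interval, but only a single missed heartbeat at the 500 ms interval.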
2. Is there any way to delay the failover time for the backup link?
- With Release 3.2.2 and later, you can set the `backup_up_only_on_dead` argument in /etc/config/edged to Yes so that the backup link becomes active only when the active link(s) are in a DEAD state, versus a QUIET state. Changing this adds ~1 second of delay to the failover time, since the edged process now waits until the link is DEAD (versus QUIET) before failing over to the backup link.
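The effect of the flag on the failover decision can be sketched as follows. This models only the decision logic described in this answer; the `PathState` enum, the `UP` state name, and the `should_fail_over` function are hypothetical and do not reflect how edged reads or stores the setting.

```python
from enum import Enum
from typing import Iterable


class PathState(Enum):
    UP = "UP"        # hypothetical name for a healthy path
    QUIET = "QUIET"  # no packet from the peer for 700 ms
    DEAD = "DEAD"    # reached roughly 1 second after QUIET, per the text


def should_fail_over(active_states: Iterable[PathState],
                     backup_up_only_on_dead: bool = False) -> bool:
    """Decide whether the backup link should become active.

    Default behavior: fail over once every active path is QUIET (or worse).
    With backup_up_only_on_dead enabled: wait until every active path is DEAD.
    """
    states = list(active_states)
    if backup_up_only_on_dead:
        return all(s is PathState.DEAD for s in states)
    return all(s in (PathState.QUIET, PathState.DEAD) for s in states)
```

With two active paths both in QUIET, the default call returns `True`, while the same call with `backup_up_only_on_dead=True` returns `False` until both paths reach DEAD.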
3. Does the backup link carry user traffic when the primary link is not down?
- A backup link is not expected to forward any user traffic. The only traffic on a backup link consists of heartbeat probes to the primary and super gateways at 30-second intervals.
- There is a known issue, #29709, where a backup link on a spoke may continue to forward traffic even after it returns to STANDBY mode once the active link comes back. This issue will be resolved in an upcoming build.
4. What is the monthly bandwidth overhead of SD-WAN management/control traffic for a link in backup mode?
- The backup link is expected to use ~20-30 MB per month while in backup mode. This data usage comes from the periodic heartbeat probes.
- When the backup link becomes active, the amount of data usage depends on the user traffic and the number of overlay tunnels established.
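As a rough plausibility check on the monthly figure, the following back-of-the-envelope sketch converts it into a per-probe byte budget. Several inputs are assumptions rather than documented values: a 30-day month, exactly two gateways probed (one primary and one super gateway), decimal megabytes, and probes being the only traffic on the link.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month
PROBE_INTERVAL_S = 30               # from the text: one probe every 30 seconds
GATEWAYS_PROBED = 2                 # assumption: one primary + one super gateway
BYTES_PER_MB = 1_000_000            # assumption: decimal megabytes


def probes_per_month(gateways: int = GATEWAYS_PROBED) -> int:
    """Number of probe exchanges per month across all probed gateways."""
    return (SECONDS_PER_MONTH // PROBE_INTERVAL_S) * gateways


def implied_bytes_per_probe(monthly_mb: float,
                            gateways: int = GATEWAYS_PROBED) -> float:
    """Average bytes per probe exchange implied by a monthly usage figure."""
    return monthly_mb * BYTES_PER_MB / probes_per_month(gateways)


if __name__ == "__main__":
    print("probe exchanges per month:", probes_per_month())
    for mb in (20, 30):
        print(f"{mb} MB/month -> {implied_bytes_per_probe(mb):.0f} bytes per exchange")
```

Under these assumptions the 20-30 MB range works out to somewhere between 100 and 200 bytes per probe exchange, which is in line with a small keepalive packet and its reply.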
5. If the Backup link is unable to push Edge stats (flow/link), what is the buffer on the edge or how long is the data stored on the edge?
- In general, if the Edge is unable to communicate with the VCO, the Edge keeps up to 5 link and flow stats blobs in memory for retries.
- There is an open enhancement request to store the stats on disk in this situation, for a much longer period (at least for link stats), and then retry them in chunks when the heartbeats succeed again.
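The in-memory retention described above behaves like a small bounded buffer. The sketch below is illustrative only: the `StatsRetryBuffer` class and the `push` callback are hypothetical, and dropping the oldest blob when the buffer is full is an assumption, since the eviction policy is not stated here.

```python
from collections import deque
from typing import Any, Callable

MAX_PENDING_BLOBS = 5  # from the text: up to 5 stats blobs are kept in memory


class StatsRetryBuffer:
    """Hypothetical bounded buffer of stats blobs awaiting a successful push."""

    def __init__(self) -> None:
        # Assumption: when full, the oldest blob is discarded to make room.
        self._pending: deque = deque(maxlen=MAX_PENDING_BLOBS)

    def __len__(self) -> int:
        return len(self._pending)

    def add(self, blob: Any) -> None:
        self._pending.append(blob)

    def flush(self, push: Callable[[Any], bool]) -> int:
        """Retry pending blobs oldest-first; stop at the first failed push.

        `push` is a caller-supplied function returning True on success.
        Returns the number of blobs delivered.
        """
        sent = 0
        while self._pending:
            if not push(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent
```

With this model, an outage spanning more than five reporting intervals loses the earliest stats, which is the limitation the on-disk enhancement is meant to address.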
6. What is a Hot standby link?
- In the current design, tunnel creation for the backup link is initiated only after the active links have failed. The time required to create the tunnel and bring the link up causes degradation in data transfer and may even cause VoIP calls to drop.
- To overcome this issue and optimize link usage, there would be an additional mode for the links: hot standby.
- Hot standby links are ready-to-use links that avoid delays when the active links fail and enable sub-second switchover. Tunnels are created for the hot standby link, and a probe is sent to ensure the readiness of the tunnel.
- The probe interval is slow (5000 ms) compared to active links, and other data and control traffic is not permitted to use the hot standby tunnel. No user traffic passes through the hot standby link when it is not active, which helps optimize bandwidth utilization.
- When the active link(s) fail, the transition to the hot standby link is faster as the tunnels have already been established.
7. How often are heartbeats sent to the Gateway? How many have to be missed for the backup link to be declared dead?
- Backup links send a probe to the gateway every 30 seconds.
- If there is no incoming response from the gateway for 3 consecutive probes (i.e., 90 seconds), the link is declared dead. (You'll see a Link down event on the VCO.) On the next incoming response, you'll see a Link up event.
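The dead-link rule for a backup link amounts to a counter of consecutive missed probes. Here is a minimal sketch of that logic; the `BackupLinkMonitor` class and the `"LINK_DOWN"` / `"LINK_UP"` strings are hypothetical stand-ins for the events shown on the VCO.

```python
from typing import Optional

PROBE_INTERVAL_S = 30  # from the text: one probe every 30 seconds
MAX_MISSED_PROBES = 3  # from the text: 3 consecutive misses (90 seconds)


class BackupLinkMonitor:
    """Hypothetical tracker for backup-link liveness based on probe replies."""

    def __init__(self) -> None:
        self.missed = 0
        self.dead = False

    def on_probe_result(self, got_response: bool) -> Optional[str]:
        """Record one probe outcome; return an event name or None."""
        if got_response:
            self.missed = 0
            if self.dead:
                # First reply after being declared dead brings the link up.
                self.dead = False
                return "LINK_UP"
            return None
        self.missed += 1
        if not self.dead and self.missed >= MAX_MISSED_PROBES:
            self.dead = True
            return "LINK_DOWN"
        return None
```

Feeding the monitor three failed probes in a row yields a single `"LINK_DOWN"`; further failures produce no additional events, and the first successful probe afterwards yields `"LINK_UP"`.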
8. A backup link is not configurable on Edges acting as a Hub or as part of a Cluster. For details, see https://docs.vmware.com/en/VMware-SD-WAN/5.0/VMware-SD-WAN-Administration-Guide/GUID-634787D6-F223-4765-B157-ED96B81973BD.html#GUID-634787D6-F223-4765-B157-ED96B81973BD:~:text=Use%20this%20option,of%20a%20Cluster.