VeloCloud Gateway (VCG) Purpose & Resiliency

Article ID: 330700

Products

VMware SD-WAN by VeloCloud

Issue/Introduction

This article explains the purpose of the VeloCloud Gateway (VCG) and how Gateways provide resiliency.


Environment

VMware SD-WAN by VeloCloud

Resolution

For Edges running Release 3.2.x or earlier, three VeloCloud Gateways are assigned to each Edge in an Enterprise: Primary, Secondary, and Super Gateway. An Edge using Release 3.3.0 or later has four VCGs assigned to it, as a result of the addition of a second, redundant Super Gateway. Additional VCGs may be assigned depending on whether the Edge is also connected to a Non-VeloCloud site or uses a Cloud Security Service (e.g. Zscaler).
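
The release rule above can be written down as a small check. This is only an illustrative sketch: the function names and role labels are hypothetical, not part of any VeloCloud API, and it covers only the base assignment (no extra VCGs for Non-VeloCloud sites or Cloud Security Services).

```python
def parse_release(release: str) -> tuple[int, ...]:
    """Turn '3.3.0' into (3, 3, 0); short forms like '3.3' are padded with zeros."""
    parts = [int(p) for p in release.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def base_gateway_roles(release: str) -> list[str]:
    """Return the base Gateway roles assigned to an Edge on the given release.

    Role labels are hypothetical names used for illustration only.
    """
    roles = ["primary", "secondary", "super"]
    # Release 3.3.0 and later add a second, redundant Super Gateway.
    if parse_release(release) >= (3, 3, 0):
        roles.append("redundant-super")
    return roles
```

For example, `base_gateway_roles("3.2.1")` yields three roles and `base_gateway_roles("3.3.0")` yields four.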


Primary VCG: All Cloud (Internet) traffic uses this VCG. If the Primary VCG goes down, Cloud traffic is sent direct to the Cloud from the Edge WAN link. Once the Primary VCG is back up, existing flows continue to go direct so that they remain intact, and any new flow travels via the Primary VCG. If two Edges in the Enterprise are connected to the same Primary Gateway, that VCG is used for site-to-site VPN traffic as well.

Secondary VCG: This VCG is used for VPN traffic only when the Primary VCG goes down, and only if it is a common Gateway between Edges. Once the Primary VCG is back up, existing flows continue to go via the Secondary Gateway so that they remain intact, and any new VPN flow uses the Primary VCG.

Super VCG: Depending on the Edge release (see above), an Edge has either one or two Super Gateways. Edges that do not share a common Primary or Secondary Gateway use the Super Gateway for Edge-to-Edge communication.

The VeloCloud Orchestrator (now the VMware Edge Cloud Orchestrator) assigns the Gateways to the Edge when it comes up, based on the location of the Edge. By default, the location of the Edge is determined from its WAN IP address when the Edge comes up. The Orchestrator queries the MaxMind database to get the location and then calculates and assigns the closest Gateway for the Edge. You may change the VCG assignment by manually editing the address for an Edge from Configure → Edge Overview → Contact & Location. If you change the location of the Edge to one that results in different Gateways being assigned, the VeloCloud service will tear down all existing tunnels to the current VCGs and establish tunnels with the newly assigned VCGs.
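
To illustrate what "closest Gateway" can mean, the sketch below ranks Gateways by great-circle (haversine) distance from an Edge's coordinates. This is an assumption for illustration: the article does not state which distance calculation the Orchestrator uses, the Gateway names and coordinates are made up, and the MaxMind lookup itself is not shown (the Edge location is passed in directly).

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def rank_gateways(edge_loc: tuple[float, float],
                  gateways: dict[str, tuple[float, float]]) -> list[str]:
    """Return Gateway names sorted from nearest to farthest from the Edge."""
    return sorted(gateways, key=lambda name: haversine_km(*edge_loc, *gateways[name]))


# Hypothetical Gateway locations (latitude, longitude).
GATEWAYS = {
    "gw-newark": (40.74, -74.17),
    "gw-chicago": (41.88, -87.63),
    "gw-frankfurt": (50.11, 8.68),
    "gw-mumbai": (19.08, 72.88),
}

new_york_edge = (40.71, -74.01)
ranked = rank_gateways(new_york_edge, GATEWAYS)
primary, secondary = ranked[0], ranked[1]  # nearest and second nearest
```

Under this model, changing the Edge's configured location simply changes `edge_loc`, which can change the first two entries of the ranking and therefore the assigned Gateways.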

Example: There are 5 Edges in the Enterprise, 2 located in New York, 1 in Mumbai, and 2 in Frankfurt.

New York Edges: The Primary VeloCloud Gateway will be the closest Gateway to the Edge and is used for sending Cloud/Internet traffic. Most likely this Gateway would be hosted in, or close to, the New York region. Depending on the precise location of the two New York Edges, they would likely have the same Primary and Secondary Gateways. As a result, these two Edges would have redundancy between each other: if the Primary VCG goes down, they can still use the Secondary Gateway for VPN traffic. They would use the Super Gateway for VPN traffic to the Mumbai and Frankfurt Edges.

Mumbai Edge: Primary VCG would be the closest to this location with the Secondary VCG being the second nearest to the Mumbai location.

Frankfurt Edges: Primary VCG will be the Gateway closest to their location. Both Edges would likely have the same Primary and Secondary Gateways given they are in the same city.

Super Gateway(s):
The Orchestrator calculates the Super Gateway(s) assignments based on the location of all Edges in the enterprise.

Routing:

For resiliency purposes, VeloCloud SD-WAN Edges connect to multiple VeloCloud Gateways - each of which functions as a route reflector. To be truly resilient, a single Gateway going down must not affect user traffic when alternate VeloCloud Gateways are available.

If an Edge using Release 3.2.0 or earlier loses contact with a Gateway, there may be a route flush and re-advertisement. Where large route tables and/or dynamic routing are configured, this can significantly impact ongoing user traffic.

From Release 3.2.1 onwards, an Edge that loses connectivity to a VeloCloud Gateway marks the routes from that Gateway as stale and starts a timer to remove those routes after 60 seconds if the Gateway is still not connected. When the 60-second timer expires:

● If the Gateway is still down, routes are removed for this Gateway.
● If all Gateways are down, the routes are retained, and the timer is restarted.
● If the Gateway reconnects with the Edge, the Edge removes any stale routes no longer present on the Gateway.
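
The timer behavior described above can be modeled as a small state machine. This is a minimal sketch, not the Edge's actual implementation: the class and method names are hypothetical, time is passed in explicitly as seconds so the example is deterministic, and reconnection is simplified to replacing the Gateway's route set with what the Gateway currently advertises.

```python
STALE_TIMEOUT_S = 60  # stale-route timer from the article


class EdgeRoutes:
    """Toy model of per-Gateway routes held by one Edge (names are illustrative)."""

    def __init__(self, routes_by_gateway: dict[str, set[str]]):
        self.routes = {gw: set(r) for gw, r in routes_by_gateway.items()}
        self.connected = set(self.routes)
        self.stale_deadline: dict[str, float] = {}  # gateway -> expiry time

    def gateway_lost(self, gw: str, now: float) -> None:
        """Mark the Gateway's routes stale and start the 60-second timer."""
        self.connected.discard(gw)
        self.stale_deadline[gw] = now + STALE_TIMEOUT_S

    def gateway_reconnected(self, gw: str, current_routes: set[str]) -> None:
        """Cancel the timer; stale routes absent from the Gateway are dropped."""
        self.connected.add(gw)
        self.stale_deadline.pop(gw, None)
        self.routes[gw] = set(current_routes)

    def tick(self, now: float) -> None:
        """Process any timers that have expired by time `now`."""
        for gw, deadline in list(self.stale_deadline.items()):
            if now < deadline:
                continue
            if not self.connected:
                # All Gateways are down: retain routes and restart the timer.
                self.stale_deadline[gw] = now + STALE_TIMEOUT_S
            else:
                # This Gateway is still down but another is up: remove its routes.
                self.routes[gw].clear()
                del self.stale_deadline[gw]
```

In this model, losing one of two Gateways keeps its routes until the timer expires and then removes them, while losing both keeps every route and only restarts the timers.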

This offers full protection against any momentary loss of connectivity to any single Gateway, as well as ensuring that traffic between Edges is not impacted even if all Gateways are down for a prolonged period.

Example Topology:
Let's consider a Hub and Spoke scenario. Both the Hub and Spoke Edge have a Public and a Private link. Branch-to-Hub is configured, so both sites have two direct tunnels between them. There are no paths/tunnels from the Private link to any of the VeloCloud Gateways. Only the Public link has paths/tunnels to all the VeloCloud Gateways.


Case#1:
If the Public link on the Spoke goes down, the site loses connectivity to all VCGs. However, a stable path between the Hub and the Spoke remains on the Private link. VPN traffic between these sites is not impacted, because the routes are not revoked and traffic continues to use the Private path. If Internet Backhaul is configured on the Spoke via Business Policy, Internet traffic can use this Private path to reach the Hub and then the Internet. Any routes added or deleted in the meantime will not be advertised by, or removed from, the Spoke, because it has no connectivity to any of the VCGs. If you want the VCGs to be reachable from a Private link, you can enable SD-WAN Service Reachable under Device tab → WAN Overlay.

Case#2:
If the path from the Public link on the Spoke goes down for one VCG only, then routes from that VCG will be revoked. In other words, once the 60-second timer has expired, the Gateway that no longer has a path to the Spoke will remove the routes to this Spoke from all other sites that are connected to it. Once the Gateway is reconnected, the Edge will remove any stale routes no longer present on the Gateway.