This document describes how to identify and troubleshoot bi-directional connectivity issues when a VeloCloud Edge is used at both the Branch site and the Data Center, or between a Branch and a Hub. It assumes the following topology, in which a VeloCloud Hub is installed at each of two Data Centers acting as the primary and secondary DC. The Data Centers are connected via a Data Center Interconnect (DCI) running a separate OSPF process. Only the Data Center Hubs run OSPF with the peered firewall, while the Branch Edges are configured as ‘out of path’ and require route redirection (i.e. a static route) to send traffic from Branch to DC through the Branch Edge. The subnets of interest are 10.32.0.0/12 in Corporate and 10.231.239.0/24 at the MPLS/Internet Branch. The Corporate subnet 10.32.0.0/12 is part of the 10.0.0.0/8 summary route that is also advertised to the Branch sites.
No traffic is passing through the Branch Edge
No reachability from Branch to Corporate or vice versa
Asymmetric routing issue causing UDP traffic to go through but TCP traffic to fail
1. Verify Routing at Branch Site or Hub site in case of Hub-spoke topology
Verify route redirection on the Branch Router
Verify local static routes on the VCE
Verify routing table on the VCE
Traceroute from Branch Router
2. Verify Routing in Data Center
Validate routing table on the Hub
Verify Cloud VPN Configuration in Branch Profile
Traceroute from Data Center to Branch Site
Summary
When any of these symptoms is observed, the first step in troubleshooting is to verify traffic flow from the Branch to the Data Center/Corporate/Hub and in the reverse direction. The following steps go through the validation of routing at the Branch.
At the Branch offices, the VeloCloud Edge (VCE) is configured ‘out of path’ with its Overlay consisting of both the Internet and MPLS interfaces. Per the network topology, all user traffic to Corporate should follow the summary route 10.0.0.0/8 and be redirected to the VCE. The route to the subnet of the Hub’s internal interface should go through MPLS so that the Overlay from Branch to Hub on the MPLS side is properly set up.
On the Branch router, verify that traffic to the Corporate network 10.32.0.0/12 has its next hop pointing at the VeloCloud Edge, which has an interface IP address of 10.22.1.137. In this case the static route entry for 10.0.0.0/8 redirects traffic destined for Corporate towards the VCE.
Branch-ISR#show ip route vrf SF1_MPLS_INT 10.32.0.0
Routing Table: SF1_MPLS_INT
Routing entry for 10.0.0.0/8
Known via "static", distance 1, metric 0
Routing Descriptor Blocks:
* 10.22.1.137
Route metric is 0, traffic share count is 1
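The lookup above is a longest-prefix match: the router holds no more specific entry for 10.32.0.0/12, so the 10.0.0.0/8 static route is selected. The following is a minimal Python sketch of that check using only the standard ipaddress module. The route table contains just the single entry shown in the output above, and the function name lookup is illustrative, not part of any VeloCloud or Cisco tooling.

```python
import ipaddress

# Static route taken from the Branch router output above: prefix -> next hop.
ROUTES = {
    "10.0.0.0/8": "10.22.1.137",  # summary route redirecting Corporate traffic to the VCE
}

def lookup(destination, routes=ROUTES):
    """Return (prefix, next_hop) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        # Keep the match with the longest prefix length seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return (str(best[0]), best[1]) if best else None

# The Corporate host used later in this document sits inside 10.32.0.0/12 ...
assert ipaddress.ip_address("10.37.17.236") in ipaddress.ip_network("10.32.0.0/12")
# ... and 10.32.0.0/12 is itself covered by the 10.0.0.0/8 summary.
assert ipaddress.ip_network("10.32.0.0/12").subnet_of(ipaddress.ip_network("10.0.0.0/8"))

print(lookup("10.37.17.236"))
```

If a more specific route for the Corporate subnet were present with a different next hop, the same function would return that entry instead, which is one way Corporate traffic can end up bypassing the VCE.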
Since the VCE is inserted “out of path” without any dynamic routing protocol configured, the local subnets on the LAN side of the Branch Router must be configured as static routes on the VCE for return traffic from Corporate/DC to be properly routed. Under the “Configure > Edges > Device” tab on the VCO, verify that the subnets local to the site are configured with the interface connected to the Branch Router as ‘Next Hop’ and with the ‘Advertise’ flag checked.
On the VCO, under “Test & Troubleshoot > Remote Diagnostics”, Route Table Dump shows all the routes the VCE learned from the VeloCloud Controllers (VCC). Note that there are duplicate entries with different costs because multiple Hubs are deployed. If the subnet of the Corporate host is part of a known route in the Route Table (i.e. 10.32.0.0/12), the Branch VCE simply forwards traffic toward the originating VCE, which is the Hub in this case, provided the Overlay has been successfully set up.
The VPN Test under the same “Remote Diagnostics” page can be used to quickly verify the Overlay status and end-to-end connectivity between the Edges.
Traceroute from Branch Router to Validate Utilized DC Hub
If the aforementioned routing configuration is properly in place, perform a traceroute from the Branch router, sourced from the local LAN interface, to locate the potential source of the problem. The traceroute results show which Hub the traffic traverses from Branch to Corporate. A traceroute in the reverse direction should also be performed to verify that traffic from Corporate to Branch goes through the same Hub. In the following example, 10.37.17.236 (part of 10.32.0.0/12) is the IP address of the Corporate host, and 172.31.2.1 is the IP address of the inside interface on the DC1 Hub, which has OSPF peering with the internal firewall. The traffic clearly traverses DC1 in this case.
Branch-ISR#ping vrf SF1_MPLS_INT 10.37.17.236 source gig4
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.37.17.236, timeout is 2 seconds:
Packet sent with a source address of 10.231.239.33
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 32/36/40 ms
Branch-ISR#traceroute vrf SF1_MPLS_INT 10.37.17.236 source gig4
Type escape sequence to abort.
Tracing the route to 10.37.17.236
VRF info: (vrf in name/id, vrf out name/id)
1 24.6.180.138 0 msec 0 msec 1 msec
2 172.31.2.1 45 msec 38 msec 37 msec
3 172.31.2.2 40 msec 36 msec 38 msec
4 172.31.1.2 37 msec 40 msec 39 msec
5 172.31.0.1 39 msec 38 msec 39 msec
6 172.31.99.1 39 msec * 37 msec
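When many Branch sites need to be checked, reading the traceroute by eye can be replaced with a small script. The following Python sketch extracts the hop addresses from the output above and reports which Hub the path traverses. The only Hub address known from this document is 172.31.2.1 (DC1); the names HUB_INSIDE_IPS, hops and hub_traversed are illustrative, and addresses for any other Hub would have to be added for a real deployment.

```python
import re

# Inside-interface address of the DC1 Hub, as stated in the text above.
# No other Hub addresses are given in this document; add them as needed.
HUB_INSIDE_IPS = {"172.31.2.1": "DC1"}

# Hop lines copied from the Branch router traceroute above.
TRACEROUTE = """\
1 24.6.180.138 0 msec 0 msec 1 msec
2 172.31.2.1 45 msec 38 msec 37 msec
3 172.31.2.2 40 msec 36 msec 38 msec
4 172.31.1.2 37 msec 40 msec 39 msec
5 172.31.0.1 39 msec 38 msec 39 msec
6 172.31.99.1 39 msec * 37 msec
"""

# A hop line starts with the hop number followed by an IPv4 address.
HOP_RE = re.compile(r"^\s*(\d+)\s+(\d{1,3}(?:\.\d{1,3}){3})\b")

def hops(text):
    """Return responding hop addresses in order; silent hops ('* * *') are skipped."""
    result = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if match:
            result.append(match.group(2))
    return result

def hub_traversed(text, hubs=HUB_INSIDE_IPS):
    """Return the name of the first known Hub on the path, or None if none is seen."""
    for address in hops(text):
        if address in hubs:
            return hubs[address]
    return None

print(hub_traversed(TRACEROUTE))
```

A result of None would mean no known Hub address appears on the path, which suggests the traffic is not entering the Overlay toward a Hub at all.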
If the traffic is traversing the incorrect Hub, go to “Configure > Overlay Flow Control” and check the order of the Hubs under the “Preferred VPN Exits” column. The proper Hub (i.e. DC1 in our example) should be selected as the first choice. If the placement is incorrect, modify the order and ensure “Pin Learned Route Preferences” is checked. An incorrect order typically means that the cost of the same route, as advertised via OSPF to the Hubs, did not favor the correct DC Hub. In the following example, if the 10.0.0.0/8 route was advertised by the firewalls in each data center to the corresponding Hubs with the same cost, the order may be incorrect because the route advertisement gives no preference to the DC1 Hub and the outcome is decided by a race condition. Note that the Overlay Flow Control table only shows the routing direction from the SDWAN infrastructure to the destination subnet 10.0.0.0/8 and not the reverse path.
Similar to the steps described in the last section, the next step in troubleshooting is to verify traffic flow from the Data Center to the Branch. The following steps go through the validation of routing in the Data Center. Note that in this example since the DC Hub learns all its routes from the peer through OSPF, there is no need to configure any static routes on the Hub and no traffic redirection occurs.
On the VCO, under “Test & Troubleshoot > Remote Diagnostics”, Route Table Dump shows all the routes the VCE learned from the VeloCloud Controllers (VCC). The Branch subnet in question is 10.231.239.0/24 in our example. The “Cloud VPN” route with a cost of 0 is used because the Edge is directly reachable through the Overlay. Notice that there are entries where Reachable is set to “False”. These are routes that exist on the DC2 Hub and are not reachable because the Profile setup does not configure an Overlay between the Hubs. This is the expected behavior and simplifies routing, since there is a Data Center Interconnect between the two DCs.
Go to “Configure > Profile” and verify that the Branch Profile has the correct Hub placement. If the following Hub order is in place and the Hubs are configured with OSPF, both Hubs should advertise the same OSPF routes, with the LSA from the DC1 Hub carrying a lower cost than the one from DC2.
By default, the Hubs advertise their learned Branch routes as OSPF E1 routes. If the order of the Hubs is configured as shown in the above screenshot, the DC1 Hub advertises the Branch subnet 10.231.239.0/24 with an E1 metric of 10, while the DC2 Hub advertises the same subnet to its neighbor with an E1 metric of 11. The following outputs are from the MPLS CE routers in each data center. Notice that the DC1 router receives the route with a metric of 11 while the DC2 router has the route installed in its routing table with a metric of 12. This configuration intentionally routes traffic from Corporate to the Branch sites through DC1. Routing within the Data Centers must be designed carefully so that DC1 remains preferred in this case.
DC1 CE routing table:
O E1 10.231.239.0/24 [110/11] via 172.31.3.1, 06:27:03, GigabitEthernet6
O E1 10.231.240.0/24 [110/11] via 172.31.3.1, 04:27:14, GigabitEthernet6
DC2 CE routing table:
O E1 10.231.239.0/24 [110/12] via 172.29.0.3, 06:30:39, GigabitEthernet7
O E1 10.231.240.0/24 [110/12] via 172.29.0.3, 06:30:39, GigabitEthernet7
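An OSPF E1 metric is the external metric set by the advertising router plus the internal cost to reach it, which is why the CE routers show 11 and 12 rather than the 10 and 11 advertised by the Hubs. The following Python sketch parses the two outputs above, derives that implied internal cost, and reports which data center holds the lower metric for a Branch prefix. The function names are illustrative. Comparing the metrics on the two CE routers is only a rough indicator: the actual path choice depends on the total cost seen by each Corporate router.

```python
import re

# Route lines copied from the two CE routers above.
DC1_CE = """\
O E1 10.231.239.0/24 [110/11] via 172.31.3.1, 06:27:03, GigabitEthernet6
O E1 10.231.240.0/24 [110/11] via 172.31.3.1, 04:27:14, GigabitEthernet6
"""
DC2_CE = """\
O E1 10.231.239.0/24 [110/12] via 172.29.0.3, 06:30:39, GigabitEthernet7
O E1 10.231.240.0/24 [110/12] via 172.29.0.3, 06:30:39, GigabitEthernet7
"""

# E1 metrics advertised by each Hub, as stated in the text above.
HUB_ADVERTISED = {"DC1": 10, "DC2": 11}

# Matches: O E1 <prefix> [<admin distance>/<metric>] via <next hop>,
ROUTE_RE = re.compile(r"^O E1\s+(\S+)\s+\[(\d+)/(\d+)\]\s+via\s+(\S+?),")

def e1_metrics(text):
    """Map prefix -> OSPF metric for every 'O E1' line in the output."""
    metrics = {}
    for line in text.splitlines():
        match = ROUTE_RE.match(line.strip())
        if match:
            metrics[match.group(1)] = int(match.group(3))
    return metrics

def preferred_dc(prefix, tables):
    """Return the data center whose CE holds the lowest metric for prefix, or None."""
    candidates = {}
    for name, text in tables.items():
        metrics = e1_metrics(text)
        if prefix in metrics:
            candidates[name] = metrics[prefix]
    return min(candidates, key=candidates.get) if candidates else None

tables = {"DC1": DC1_CE, "DC2": DC2_CE}
prefix = "10.231.239.0/24"

# Internal cost implied by the outputs: metric seen on the CE minus metric advertised by the Hub.
for name, text in tables.items():
    print(name, "implied internal cost:", e1_metrics(text)[prefix] - HUB_ADVERTISED[name])

print("preferred:", preferred_dc(prefix, tables))
```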
If the proper configurations are in place and bi-directional connectivity issues still exist, perform a traceroute from the Corporate host to the Branch subnet to determine where the source of the problem lies. If UDP traffic works while TCP connections fail to establish, it is usually an indication of an asymmetric routing issue in which traffic in one direction is not passing through the VeloCloud Overlay. In the following example, the traceroute results show the unidirectional path going through DC1 and then reaching the destination through the Overlay.
Corporate-Router#traceroute vrf CORP 10.231.239.33
Type escape sequence to abort.
Tracing the route to 10.231.239.33
VRF info: (vrf in name/id, vrf out name/id)
1 172.31.99.2 1 msec 1 msec 1 msec
2 172.31.3.1 1 msec 2 msec 1 msec
3 12.16.196.83 2 msec 2 msec 2 msec
4 * * *
5 10.22.1.138 36 msec 35 msec *
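Asymmetry can be checked by comparing which data center each direction passes through. The following Python sketch does this for the two traceroutes in this document. The marker table is assembled from addresses mentioned above (the DC1 Hub inside interface and the next hops seen on the DC1 and DC2 CE routers). Treating those addresses as reliable per-site markers is an assumption made for illustration, and the names DC_MARKERS, data_centers and is_symmetric are hypothetical.

```python
# Addresses this document associates with each data center (assumed to be usable as markers).
DC_MARKERS = {
    "172.31.2.1": "DC1",  # inside interface of the DC1 Hub
    "172.31.3.1": "DC1",  # next hop for Branch routes on the DC1 CE
    "172.29.0.3": "DC2",  # next hop for Branch routes on the DC2 CE
}

# Hop addresses from the Branch-to-Corporate traceroute earlier in this document.
FORWARD = ["24.6.180.138", "172.31.2.1", "172.31.2.2",
           "172.31.1.2", "172.31.0.1", "172.31.99.1"]

# Hop addresses from the Corporate-to-Branch traceroute above; None marks the '* * *' hop.
REVERSE = ["172.31.99.2", "172.31.3.1", "12.16.196.83", None, "10.22.1.138"]

def data_centers(path, markers=DC_MARKERS):
    """Return the set of data centers whose marker addresses appear on the path."""
    return {markers[hop] for hop in path if hop in markers}

def is_symmetric(forward, reverse, markers=DC_MARKERS):
    """True when both directions cross the same, non-empty set of data centers."""
    seen_forward = data_centers(forward, markers)
    seen_reverse = data_centers(reverse, markers)
    return bool(seen_forward) and seen_forward == seen_reverse

print(data_centers(FORWARD), data_centers(REVERSE), is_symmetric(FORWARD, REVERSE))
```

If the reverse path showed a DC2 marker instead, or no marker at all, the check would return False. Stateful devices in that situation see only one direction of a TCP session, which matches the symptom of UDP working while TCP fails.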
From the VCO, you can check the flow created for the traffic of concern, regardless of whether it is Edge to Edge, Edge to Hub, Edge to Data Center, Edge to Gateway, or Edge to underlay direct (not using the Overlay).
The example below, from the Edge flow page on the VCO, verifies the flow’s route type and next hop and confirms that the traffic is 1 Way, which means the client or server behind the Edges must be checked to determine why there is no response:
A packet capture on the LAN side at the Branch, and the same at the Hub site, provides more clarity on how the traffic is actually passing. This can be done through the VCO by going to the diagnostic bundle page and requesting a PCAP Bundle.
Then choose the Branch site name and LAN interface, and the Hub site name and LAN interface, to capture from both sides at the same time. Below is an example of taking a PCAP in a Hub-spoke topology where the Branch is b1-edge1 and the Hub is b2-edge1.
This document summarizes the steps taken to troubleshoot connectivity issues between a VeloCloud site and a non-VeloCloud Corporate subnet advertised to the Hubs through OSPF. Routing is unidirectional, and several methods to validate the routing path were outlined.
Notice how the configuration of “Branch to VeloCloud Hubs” and its order within the Branch Profile dictate the costs and preference of the SDWAN routes advertised by the Hubs. This is done to prefer the traffic from Corporate subnets to the SDWAN Branch subnets through the Hub with the higher placement.
On the other hand, Overlay Flow Control for subnets learned via OSPF and the order of “Preferred VPN Exit” influence traffic from the SDWAN infrastructure to the non-SDWAN subnets. The order is arranged by the VeloCloud Gateway based on the costs of the learned routes and how they were advertised to the Hubs. To ensure proper connectivity, the order of the Hub placement within the Profile configuration should match the preferred VPN exit point for the non-SDWAN subnets.
If the root cause of a connectivity issue cannot be determined after following the troubleshooting steps above, open a Support case and include the information gathered during those steps, together with a Diagnostic bundle from the Edges in question and a packet capture from the interface carrying the traffic of interest.