Troubleshooting HCX Network Extension Datapath Failures
Article ID: 377288

Products

VMware HCX

Issue/Introduction

The HCX Network Extension service creates a Layer 2 network at the destination HCX site (utilizing NSX) and bridges that remote network to the source network. This datapath can fail for a variety of reasons. This article serves as a point of reference for reviewing possible causes of datapath failure, as well as the information required when opening a support request with Broadcom.

Environment

HCX

Resolution

Review the following documentation to ensure the HCX NE configuration is valid:

MON (Mobility Optimized Networking) is a feature of the HCX Network Extension service that allows a cloud/destination VM on segment A to communicate with another cloud VM on segment B without routing back to the on-prem/source gateway.

HCX NE appliances can be configured for High Availability (HA), which further protects extended networks from an NE appliance failure at either site.

Known issues with HCX Network Extension service:

Additional Information

Below are some common troubleshooting steps to perform:

  1. Source/on-prem VM on network A cannot communicate with the cloud/target VM on stretched network A.
    • Verify the IP configuration (subnet, gateway, etc.) is correct inside the guest OS of each affected VM at the cloud site.
  2. In the UI of both HCX Managers, verify the tunnel state. If the tunnel state shows green, proceed to step 3.
    1. If the tunnel state shows "red", check whether the NE uplink interface can reach its respective HCX Manager (NE-I with the HCX Connector, etc.).
      1. Check connectivity between NE uplink interfaces:
        1. Type "ccli".
        2. Type "list" to show HCX appliances.
        3. Type "go #", replacing # with the number of the appliance.
        4. Type "ssh" to change context to the CLI of the appliance.
        5. Ping the peer HCX-NE appliance uplink IP: "ping x.x.x.x".
          1. If this is not successful, check that the NE networks are "connected" in the VC UI.


        6. Verify the ICMP/UDP 4500 packets are leaving ESXi by following the steps below.
          1. Find the MAC of the NE uplink interface using the VC UI. (see above screenshot)
          2. SSH to the ESXi host the NE resides on.
          3. Run the command "netdbg vswitch instance list" and find the MAC corresponding to the uplink interface. Record which uplink vmnic it is using.
          4. Perform a packet capture from the ESXi host:
            1. pktcap-uw --capture UplinkSndKernel,UplinkRcvKernel --uplink vmnic# -o - | tcpdump-uw -r - -ean host uplink-IP
              1. Pay attention to two types of traffic:
                1. The ICMP traffic initiated via ping from the NE directly.
                2. The UDP 4500 packets the NE sends to its peer to establish the tunnel.
              2. If no ICMP reply or UDP traffic from the peer NE is seen, this indicates a physical networking issue. Remember, this trace shows packets leaving the physical ESXi NIC and therefore being handed off to the physical network.
              3. Press Ctrl-C to stop the capture.
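The capture in the steps above can be staged as a small helper script. The vmnic and uplink IP values below are placeholders (assumptions to be replaced with the values found via the VC UI and "netdbg vswitch instance list"); the pktcap-uw/tcpdump-uw pipeline is the one shown above. As a sketch, the script only assembles and prints the command (a dry run) so it can be reviewed before running it on the ESXi host:

```shell
# Placeholders - replace with values found via the VC UI and
# "netdbg vswitch instance list" (see the steps above).
VMNIC="vmnic2"          # physical uplink the NE uplink MAC maps to (assumed)
UPLINK_IP="192.0.2.10"  # NE uplink interface IP (example address)

# Capture frames at the uplink in both directions and decode them with
# tcpdump-uw, filtering on the NE uplink IP. -e prints link-level headers
# (useful for confirming source/destination MACs); -n skips name resolution.
CMD="pktcap-uw --capture UplinkSndKernel,UplinkRcvKernel --uplink $VMNIC -o - | tcpdump-uw -r - -ean host $UPLINK_IP"

# Dry run: print the command instead of executing it, since pktcap-uw
# and tcpdump-uw only exist on an ESXi host.
echo "$CMD"
```

Keeping the values in variables makes it easy to repeat the capture against a different vmnic or peer IP when more than one NE appliance is in play.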

If the above steps show connectivity is healthy but the HCX UI still reports connection issues, please file an SR with Broadcom Support and provide the information below.

  • This KB ID
  • HCX version
  • Brief problem description
    • Include Source VM name, IP/subnet and network segment/portgroup
    • Include Destination VM name, IP/subnet and network segment
    • OS type/version of the affected VMs used for testing.
    • VMware Tools version.
  • Is MON enabled?
  • Which NE appliance # is servicing this extension?