Intermittent reachability issues observed between on-prem (source) and cloud VM (target) when MON is enabled for HCX cloud-to-cloud deployments

Article ID: 401566

Updated On:

Products

VMware HCX

Issue/Introduction

  • When MON (Mobility Optimized Networking) is enabled, intermittent packet loss is observed from the on-prem VM (source) to the cloud VM (target).

  • On the ESXi host where the source VM is registered, the target IP address does not appear as a neighbor entry. First, determine the VDR UUID for the T1 Gateway by running the following command on that ESXi host:
     
    nsxcli -c get logical-routers
                                                  Logical Routers Summary
    ------------------------------------------------------------------------------------------------------------------
                   VDR UUID                LIF num   IPv4 Route num   IPv6 Route num  Max Neighbors  Current Neighbors
     9aab434f-####-####-####-############     2            13               22            50000              10
     221bf8a39-####-####-####-###########     5             6                9            50000              16

  • Querying the neighbor table with the VDR UUID of the T1 Gateway returns nothing for the target IP address:

     nsxcli -c get logical-router 221bf8a39-####-####-####-########### neighbor 192.168.100.10
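
When the T1 Gateway's VDR UUID is not known in advance, the two commands above can be combined into a loop over every VDR on the host. The following is a minimal sketch, not a supported tool: the function names are hypothetical, the UUID parsing assumes the summary layout shown above (UUID in the first column), and it assumes a neighbor entry is present only when the lookup output contains the target IP.

```shell
#!/bin/sh
# Sketch only: uses just the two nsxcli commands shown in this article.
# Function names and the output parsing are assumptions, not part of nsxcli.

# Print the VDR UUIDs from the "get logical-routers" summary
# (assumes the UUID is the first column of each data row).
list_vdr_uuids() {
    nsxcli -c get logical-routers |
        awk '$1 ~ /^[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+$/ { print $1 }'
}

# For the IP given as $1, report per VDR whether the neighbor lookup
# returns a line containing that IP (assumed to mean "entry present").
check_neighbor() {
    target_ip="$1"
    for uuid in $(list_vdr_uuids); do
        if nsxcli -c get logical-router "$uuid" neighbor "$target_ip" |
            grep -qwF "$target_ip"; then
            echo "$uuid: neighbor entry present"
        else
            echo "$uuid: no neighbor entry for $target_ip"
        fi
    done
}

# Example (run on the ESXi host where the source VM is registered):
#   check_neighbor 192.168.100.10
```

A "no neighbor entry" line for the T1 Gateway's VDR while packet loss is occurring matches the symptom described in this article.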

Environment

VMware HCX 

Cause

This issue occurs due to a timing mismatch between the ARP (Address Resolution Protocol) table entries held by the VDR instances on different ESXi hosts:

  1. ARP Table Synchronization Issue: NSX maintains separate ARP tables for each logical router on each ESXi host. Entries in these tables can expire at different times.
  2. Failed ARP Resolution Process:
    • When the ARP entry expires on the on-premises VM's host, that host sends a broadcast ARP request.
    • The HCX Network Extension (NE) appliance's host intercepts this request.
    • If the NE ESXi host's ARP table still has a valid entry, it converts the broadcast to a unicast ARP request.
    • With MON enabled, this unicast packet is rewritten with a special HCX MAC address.
  3. Packet Drop: The modified unicast ARP request is dropped by NSX's built-in security policy at the destination, preventing ARP resolution.
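
The chain of conditions above can be summarized as a small decision function. This is only an illustrative sketch of the sequence described in this article: the function name and its yes/no arguments are hypothetical, and combinations the article does not describe are reported as such rather than guessed.

```shell
#!/bin/sh
# Sketch only: encodes the failure chain described in this article.
# arp_outcome and its arguments are illustrative names, not an HCX or NSX API.
arp_outcome() {
    src_entry_expired="$1"   # yes/no: ARP entry expired on the source VM's host
    ne_entry_valid="$2"      # yes/no: NE appliance's host still holds a valid entry
    mon_enabled="$3"         # yes/no: MON enabled

    if [ "$src_entry_expired" != "yes" ]; then
        # Entry still cached on the source host, so no ARP request is sent.
        echo "no ARP request needed"
    elif [ "$ne_entry_valid" = "yes" ] && [ "$mon_enabled" = "yes" ]; then
        # Broadcast converted to unicast, rewritten with the HCX MAC, then dropped.
        echo "unicast ARP rewritten with HCX MAC and dropped at destination"
    else
        echo "outside the failure chain described in this article"
    fi
}

# Example:
#   arp_outcome yes yes yes
```

The drop requires all three conditions at once, which is why the packet loss appears intermittent.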

 

Resolution

  • This is a known issue impacting VMware HCX
  • This condition only exists in the following scenarios:
    • Short-lived connections which allow the VDR neighbor table to age out. 
    • When both the source and target sites use NSX and MON is enabled.

Workarounds:

  • Disable MON.
  • Run a continuous ping between the source and destination VMs, or establish another persistent connection between them that is not short-lived (less than 600 seconds).

  • Create VM affinity rules to keep the VMs on the ESXi host that runs the NE-I appliance.
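
The continuous-ping workaround can be sketched as a small loop run from the source VM. This is a minimal example rather than a recommended tool: the `keepalive` function and its parameters are hypothetical, it assumes a Linux-style `ping` that accepts `-c`, and the 60-second default is an arbitrary choice below the 600-second figure mentioned above, on the assumption that this figure is the idle period to stay under.

```shell
#!/bin/sh
# Sketch only: periodic ping to keep traffic flowing between the two VMs.
# keepalive and its arguments are illustrative, not part of HCX or NSX.
#   $1 = target IP
#   $2 = seconds between pings (default 60, assumed to be safely under 600)
#   $3 = number of pings to send (default 0 = run until interrupted)
keepalive() {
    target="$1"
    interval="${2:-60}"
    count="${3:-0}"
    sent=0
    while [ "$count" -eq 0 ] || [ "$sent" -lt "$count" ]; do
        # One echo request per cycle; log failures to stderr with a timestamp.
        if ! ping -c 1 "$target" >/dev/null 2>&1; then
            echo "$(date) ping to $target failed" >&2
        fi
        sent=$((sent + 1))
        sleep "$interval"
    done
}

# Example (run on the source VM, target IP taken from this article):
#   keepalive 192.168.100.10 60
```

The failure messages on stderr also give a rough record of when reachability was lost, which can help correlate with the neighbor-table check.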