Cloud VMs using HCX Extended Network (without MON) encounter abrupt loss of connectivity

Article ID: 344987


Updated On:

Products

VMware HCX

Issue/Introduction

To assist TSEs in identifying a known issue with NSX-T-enabled DVSs

Symptoms:
Cloud VMs using HCX Extended Network (without MON) encounter an abrupt loss of connectivity to the on-prem gateway/VMs when the on-prem NE appliance and/or the cloud VM is vMotioned.

Cause

This behavior is caused by the on-prem NSX-enabled DVS using multiple non-LAG uplinks while carrying VM ports with both 'mac-learning' and 'sink port' configurations. When a user extends a network using HCX, HCX first checks what type of port group it is extending. If it is an ESXi DV port group, the NE adapter servicing the L2E is created with the 'sink port' configuration.

If the network being extended is an NSX VLAN segment, the NE adapter is created with the 'mac-learning' configuration.

Because the DVS contains a mix of 'sink port' and 'mac-learning' enabled ports and has multiple non-LAG uplinks, the MAC address of the cloud workload VM is learned from the uplink, and the reply from the on-prem gateway/VM is not forwarded to the NE VM as expected.
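As a quick check of this signature, compare the two configurations on the NE appliance port using the commands detailed under Additional Information below (the DVPortID and DVS name here are placeholders). On an NE port servicing an extended ESXi DV port group, sink is expected to report enabled; on an NE port servicing an extended NSX VLAN segment, MAC Learning is expected to report True.

nsxdp-cli vswitch sink get -p <DVPortID> --dvs <DVS-name>
nsxdp-cli vswitch mac-learning port get -p <DVPortID> -dvs <DVS-name>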

Resolution

VMware is aware of this issue and is working on a fix for a future release.

Workaround:
There are a few ways to mitigate this issue:
  1. Remove one of the uplinks from the DVS.
  2. Instead of extending the non-NSX port group, create and extend an NSX VLAN segment (make sure the VLAN ID is the same as the original port group); see the sketch after this list.
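
For option 2, the VLAN segment can be created in the NSX Manager UI (Networking > Segments) or via the NSX Policy API. Below is a minimal sketch assuming the NSX-T Policy API; the manager address, segment name, VLAN ID, and transport zone path are placeholders for your environment:

curl -k -u admin -X PATCH \
  -H "Content-Type: application/json" \
  https://<nsx-manager>/policy/api/v1/infra/segments/hcx-extended-vlan100 \
  -d '{
        "display_name": "hcx-extended-vlan100",
        "vlan_ids": ["100"],
        "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/<vlan-tz-id>"
      }'

Once created, extend this segment with HCX Network Extension in place of the original port group.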


Additional Information

To check the sink port status on a VM port, you can use the commands below.

nsxdp-cli vswitch instance list

[root@esxi-70u3-1:~] nsxdp-cli vswitch instance list
DvsPortset-2 (HCX-NVDS)          46 6a 50 7c 88 3c 48 c7-b5 ca 2e 7d 13 53 94 7a
Total Ports:3840 Available:3813
  Client                         PortID          DVPortID                             MAC                  Uplink
  Management                     134217740                                            00:00:00:00:00:00    n/a
  vmnic6                         2281701392      7bd730a1-0948-42e1-85f5-32afb3c4e338 00:00:00:00:00:00
  Shadow of vmnic6               134217745                                            00:50:56:55:a5:dc    n/a
  vmnic7                         2281701394      64488cd9-6087-4d6d-a0de-bcbaaeecfd13 00:00:00:00:00:00
  Shadow of vmnic7               134217747                                            00:50:56:51:3c:fa    n/a
  vmk10                          134217762       c422dbef-fbf5-41ad-ac1b-e5b5e386cab2 00:50:56:61:78:f4    vmnic6
  vmk50                          134217764       5d71d732-762f-47de-aad6-9e4d81e0e517 00:50:56:61:24:58    void
  vdr-vdrPort                    134217765       vdrPort                              02:50:56:56:44:52    vmnic6
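
On hosts with many ports, the output can be filtered to locate the port for a specific VM adapter (the VM name is a placeholder):

[root@esxi-70u3-1:~] nsxdp-cli vswitch instance list | grep <vm-name>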


The following will show the mac-learning status:
[root@esxi-70u3-1:~] nsxdp-cli vswitch mac-learning port get -p 7bd730a1-0948-42e1-85f5-32afb3c4e338 -dvs HCX-NVDS
MAC Learning:                   False
Unknown Unicast Flooding:       False
MAC Limit:                      4096
MAC Limit Policy:               ALLOW

And now the sink port status:
[root@esxi-70u3-1:~] nsxdp-cli vswitch sink get -p 7bd730a1-0948-42e1-85f5-32afb3c4e338 --dvs HCX-NVDS
disabled

NOTE: Be sure to use the port ID associated with the VM's ethernet interface attached to the on-prem port group.



You may use the commands below to perform a packet capture on the ESXi physical uplink and then on the VM network adapter to confirm whether the ARP packet is forwarded to the VM correctly.
  • pktcap-uw --uplink vmnic1 --capture UplinkSndKernel,UplinkRcvKernel --ip x.x.x.x -o - | tcpdump-uw -enr -
Note: Verify which vmnic the VM adapter is using with the nsxdp-cli vswitch instance list command above.

The following performs a trace at the VM network adapter level. Confirm the switch port ID first; a sketch for doing so follows this command.
  • pktcap-uw --switchport 50331659 --capture VnicTx,VnicRx --ip x.x.x.x -o - | tcpdump-uw -enr -
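
To confirm the switch port ID of a VM network adapter, one option is the net-stats utility on the ESXi host; the PortNum column contains the value to pass to --switchport (the VM name is a placeholder):

net-stats -l | grep <vm-name>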


Impact/Risks:
The reply from the on-prem gateway/VM is not forwarded to the on-prem NE VM as expected. This can be observed via packet trace: the ARP response from the gateway reaches the physical ESXi vmnic, but a trace on the VM interface/port ID shows that the response is not forwarded from the physical NIC to the VM network adapter.