Cloud VMs using HCX Extended Network (without MON) encountering abrupt loss of connectivity

Article ID: 344987

Products

VMware HCX, VMware NSX

Issue/Introduction

Cloud VMs on an HCX extended network (without Mobility Optimized Networking, MON) abruptly lose connectivity to the on-prem gateway or on-prem VMs when the on-prem Network Extension (NE) appliance and/or the cloud VM is vMotioned.

Cause

This behavior is caused by the on-prem NSX-enabled DVS using multiple uplinks (non-LAG) while its VM ports are configured with a mix of ‘mac-learning’ and ‘sink port’ settings. When a user extends a network using HCX, HCX first checks what type of port group it is extending. If it is an ESX DV port group, the NE adapter servicing the L2E is created with the ‘sink port’ configuration.

If the network being extended is an NSX VLAN segment, the NE adapter is created with the ‘mac-learning’ configuration.

Because the DVS contains a mix of ‘sink port’ and ‘mac-learning’ enabled ports and has multiple uplinks (non-LAG), the MAC address of the cloud workload VM is learned from the uplink, and the reply from the on-prem gateway/VM is not forwarded to the NE VM as expected.
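The conditions described above can be written down as a small check. This is only a sketch: `dvs_matches_issue` is a hypothetical helper, and its inputs (uplink count, teaming type, and one mode per port) are assumed to have been collected by hand from the DVS configuration and the nsxdp-cli commands shown later in this article.

```shell
# Hypothetical check: does a DVS match the conditions described in this article?
# $1 = number of DVS uplinks, $2 = "lag" or "nolag",
# remaining arguments = one mode per port ("sink", "mac-learning", or other).
dvs_matches_issue() {
    uplinks=$1; teaming=$2; shift 2
    has_sink=0; has_learning=0
    for mode in "$@"; do
        case "$mode" in
            sink)         has_sink=1 ;;
            mac-learning) has_learning=1 ;;
        esac
    done
    # All four conditions must hold: more than one uplink, no LAG,
    # and at least one port of each type on the same DVS.
    [ "$uplinks" -gt 1 ] && [ "$teaming" = "nolag" ] &&
        [ "$has_sink" -eq 1 ] && [ "$has_learning" -eq 1 ]
}

if dvs_matches_issue 2 nolag sink mac-learning other; then
    echo "matches the conditions in this article"
else
    echo "does not match"
fi
```

A DVS with a single uplink, or one whose ports all use the same mode, would not match.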

Resolution

This issue is resolved in VMware NSX 3.2.4
This issue is resolved in VMware NSX 4.2.0

Workaround:
There are two ways to mitigate this issue. 

  1. Remove one of the uplinks from the DVS.

    OR

  2. Instead of extending the non-NSX port group, create and extend an NSX VLAN segment (make sure the VLAN ID is the same as on the original port group).

Additional Information

To check the sink port status for a VM, use the commands below. First, list the ports on the virtual switch to find the port IDs:

nsxdp-cli vswitch instance list

[] nsxdp-cli vswitch instance list
DvsPortset-2 (HCX-NVDS)          46 6a 50 7c 88 3c 48 c7-b5 ca ## ## ## ## ## ##
Total Ports:3840 Available:3813
  Client                         PortID          DVPortID                             MAC                  Uplink
  Management                     13421####                                            00:00:00:00:00:00    n/a
  vmnic6                         228170####      7bd730a1-####-####-####-32afb3c4e338 00:00:00:00:00:00
  Shadow of vmnic6               13421####                                            00:50:56:##:##:##    n/a
  vmnic7                         228170####      64488cd9-####-####-####-bcbaaeecfd13 00:00:00:00:00:00
  Shadow of vmnic7               13421####                                            00:50:56:##:##:##    n/a
  vmk10                          13421####       c422dbef-####-####-####-e5b5e386cab2 00:50:56:##:##:##    vmnic6
  vmk50                          13421####       5d71d732-####-####-####-9e4d81e0e517 00:50:56:##:##:##    void
  vdr-vdrPort                    13421####       vdrPort                              02:50:56:##:##:##    vmnic6
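On a host with many ports, the listing can be filtered by client name to pull out the two IDs used in the following commands. This is a minimal sketch, assuming the column layout matches the sample above and that the client name contains no spaces; `port_ids_for` is a hypothetical helper, not part of nsxdp-cli.

```shell
# Hypothetical helper: print "PortID DVPortID" for one client, reading the
# `nsxdp-cli vswitch instance list` output from stdin.
# Assumes the client name has no spaces (e.g. vmk10, vmnic6) and that the
# row has a DVPortID column (rows such as Management do not).
port_ids_for() {
    awk -v client="$1" '$1 == client { print $2, $3 }'
}

# Sample rows copied from the listing above (values masked as in the article).
sample='  vmnic6                         228170####      7bd730a1-####-####-####-32afb3c4e338 00:00:00:00:00:00
  vmk10                          13421####       c422dbef-####-####-####-e5b5e386cab2 00:50:56:##:##:##    vmnic6'

printf '%s\n' "$sample" | port_ids_for vmk10
```

On a live host the same filter would be fed directly, for example `nsxdp-cli vswitch instance list | port_ids_for vmk10`.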

The following command shows the mac-learning status of a port:

[] nsxdp-cli vswitch mac-learning port get -p 7bd730a1-####-####-####-32afb3c4e338 -dvs HCX-NVDS
MAC Learning:                   False
Unknown Unicast Flooding:       False
MAC Limit:                      4096
MAC Limit Policy:               ALLOW

The following command shows the sink port status of the same port:

[] nsxdp-cli vswitch sink get -p 7bd730a1-####-####-####-32afb3c4e338 --dvs HCX-NVDS
disabled

NOTE: Be sure to use the port ID associated with the VM_eth interface that is connected to the on-prem port group.
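The results of the mac-learning and sink checks can be combined into a single label per port, which makes a mixed DVS easier to spot. The sketch below works on the saved text of both commands. `port_mode` is a hypothetical helper, and the positive values `True` and `enabled` are assumptions: this article only shows the `False` and `disabled` cases.

```shell
# Hypothetical helper: summarize one port from the text of the two commands.
# $1 = output of "mac-learning port get", $2 = output of "sink get".
# ASSUMPTION: the positive values are "True" and "enabled".
port_mode() {
    learning=$(printf '%s\n' "$1" |
        awk -F': *' '/^MAC Learning:/ { gsub(/[ \t\r]+$/, "", $2); print $2 }')
    sink=$(printf '%s\n' "$2" | tr -d '[:space:]')
    if [ "$learning" = "True" ] && [ "$sink" = "enabled" ]; then
        echo "mac-learning+sink"
    elif [ "$learning" = "True" ]; then
        echo "mac-learning"
    elif [ "$sink" = "enabled" ]; then
        echo "sink"
    else
        echo "neither"
    fi
}

# Outputs copied from the examples above.
ml_out='MAC Learning:                   False
Unknown Unicast Flooding:       False
MAC Limit:                      4096
MAC Limit Policy:               ALLOW'

port_mode "$ml_out" "disabled"
```

For the sample port above, which has both features off, this prints `neither`.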



Use the commands below to run a packet capture on the ESXi physical uplink and then on the VM network adapter to confirm whether the ARP packet is being forwarded to the VM correctly.

pktcap-uw --uplink vmnic1 --capture UplinkSndKernel,UplinkRcvKernel --ip #.#.#.# -o - | tcpdump-uw -r - -ean

Note: Verify which vmnic the VM adapter is using with the nsxdp-cli command (the Uplink column in the output above).

The following command performs a capture at the VM adapter level. Confirm the switchport ID first (the PortID column in the nsxdp-cli output).

pktcap-uw --switchport 5033#### --capture VnicTx,VnicRx --ip #.#.#.# -o - | tcpdump-uw -r - -ean



Impact/Risks:

The reply from the on-prem gateway/VM is not forwarded to the on-prem NE VM as expected. This can be observed with a packet trace: the ARP response from the gateway reaches the physical ESXi vmnic, but a trace on the VM interface/port ID shows that the ARP response was not forwarded from the physical NIC to the VM network adapter.