VM on NSX VLAN segment can’t reach default gateway

Article ID: 407094

Updated On:

Products

VMware NSX

Issue/Introduction

  • A VM on an NSX VLAN segment is unable to reach its default gateway.
  • The ARP broadcast request is observed on the VM’s switchport but not on the host’s physical uplink.

To retrieve the switchport and uplink used by the VM, use the nsxdp-cli command:

nsxdp-cli vswitch instance list | grep -i <VM-name>

Switchport capture:

pktcap-uw --switchport <switchport> --capture VnicTx,VnicRx -o - | tcpdump-uw -enr - 

Uplink capture:

pktcap-uw --uplink vmnicx --capture UplinkSndKernel,UplinkRcvKernel -o - | tcpdump-uw -enr -

  • A packet trace on the host shows that the ARP broadcast packet is being dropped by the function 'Vmxnet3VMKDevTQDoTx'.

Packet trace:

pktcap-uw --trace --srcmac <src-vm-mac> --ethtype 0x0806

Vsock connection from port 1031 cid 2.
05:09:22.763657[1] Captured at PktFree point, Drop Reason 'Port Blocked'. Drop Function 'Vmxnet3VMKDevTQDoTx'. TSO not enabled, Checksum not offloaded and not verified, length 60.
        PATH:
          +- [05:09:22.763642] |                           VnicTx |  <portid> |
          +- [05:09:22.763645] |                        PortInput |  <portid> |
          +- [05:09:22.763656] |                             Drop |            |
          +- [05:09:22.763656] |                          PktFree |            |
        Segment[0] ---- 60 bytes:

  • The net-dvs -l output shows that the port is not in a blocked state:
       com.vmware.common.port.block = false ,  propType = POLICY
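
The symptom combines two observations: the trace reports a 'Port Blocked' drop, while the port policy says the port is not blocked. The following is a minimal sketch of how that comparison could be scripted from saved output. It assumes the trace line and the net-dvs property line have the same shape as the samples in this article; the variable names are illustrative only.

```shell
#!/bin/sh
# Sample trace line copied from the pktcap-uw --trace output in this article.
trace_line="05:09:22.763657[1] Captured at PktFree point, Drop Reason 'Port Blocked'. Drop Function 'Vmxnet3VMKDevTQDoTx'. TSO not enabled, Checksum not offloaded and not verified, length 60."

# Sample property line copied from the net-dvs -l output in this article.
dvs_line="com.vmware.common.port.block = false ,  propType = POLICY"

# Extract the quoted drop reason and drop function from the trace line.
drop_reason=$(printf '%s\n' "$trace_line" | sed -n "s/.*Drop Reason '\([^']*\)'.*/\1/p")
drop_func=$(printf '%s\n' "$trace_line" | sed -n "s/.*Drop Function '\([^']*\)'.*/\1/p")

# Extract the value of the port block policy (true or false).
port_block=$(printf '%s\n' "$dvs_line" | sed -n 's/.*port\.block = \([a-z]*\).*/\1/p')

echo "Drop reason:   $drop_reason"
echo "Drop function: $drop_func"
echo "Port block:    $port_block"

# The symptom in this article is a 'Port Blocked' drop on a port whose policy is not blocked.
if [ "$drop_reason" = "Port Blocked" ] && [ "$port_block" = "false" ]; then
    symptom_match=yes
else
    symptom_match=no
fi
echo "Matches the symptom described in this article: $symptom_match"
```

In practice the two sample strings would be replaced with lines taken from the host's own trace and net-dvs output.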

Environment

VMware NSX

Cause

The VM is unable to reach its default gateway because the ARP broadcast traffic is being dropped at the ESXi host level by the function “Vmxnet3VMKDevTQDoTx”, preventing the packet from leaving the virtual switch.

Resolution

Workaround steps: 

  • Put the host into Maintenance Mode.

  • Move the host out of the cluster in vCenter.

  • In the NSX UI, the host will appear under Other Nodes. Select the host and force delete the NSX configuration.

  • Once the host shows Not Configured, add it back to the cluster so it can receive the configuration from the Transport Node Profile (TNP).

  • Verify host status is healthy in NSX UI and VM connectivity is restored.

If the workaround does not help, open a support case with Broadcom Support and attach support bundles from the NSX Manager and the ESXi host, along with packet captures at the following points (switchport, uplink and trace):

  • pktcap-uw --switchport <switchport> --capture VnicTx,VnicRx -o - | tcpdump-uw -enr - 
  • pktcap-uw --uplink vmnicx --capture UplinkSndKernel,UplinkRcvKernel -o - | tcpdump-uw -enr -
  • pktcap-uw --trace --srcmac <src-vm-mac> --ethtype 0x0806
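
For a support case it can be more convenient to write each capture to a file than to decode it live with tcpdump-uw. The sketch below is a hypothetical dry-run helper: it only prints the three commands with the placeholders filled in, so they can be reviewed before being run on the host. The function name, its arguments and the output file names are assumptions made for illustration. The commands themselves are the ones listed above, with -o pointed at a file instead of standard output.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of NSX or ESXi): prints the capture
# commands from this article, writing to files instead of piping to tcpdump-uw.
# Arguments: <switchport> <uplink> <src-vm-mac> <output-directory>
build_capture_cmds() {
    switchport="$1"
    uplink="$2"
    vm_mac="$3"
    out_dir="$4"

    # Switchport capture, saved as a pcap file.
    echo "pktcap-uw --switchport ${switchport} --capture VnicTx,VnicRx -o ${out_dir}/switchport.pcap"
    # Uplink capture, saved as a pcap file.
    echo "pktcap-uw --uplink ${uplink} --capture UplinkSndKernel,UplinkRcvKernel -o ${out_dir}/uplink.pcap"
    # ARP packet trace; the text output is redirected to a file.
    echo "pktcap-uw --trace --srcmac ${vm_mac} --ethtype 0x0806 > ${out_dir}/trace.txt"
}

# Print the commands with the article's placeholders left in place.
build_capture_cmds "<switchport>" "vmnicx" "<src-vm-mac>" "/tmp"
```

The output directory /tmp is only an example; any location on the host with enough free space would do. Each printed command runs until it is interrupted with Ctrl+C.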