Pings fail between two VMs on different hosts across a logical switch

Article ID: 319491

Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:

  • VMs on the same ESXi host can ping each other reliably
  • VMs on separate ESXi hosts are unable to ping each other
  • All VMs can ping their default gateways
  • In Multicast and Hybrid mode NSX deployments, some Hypervisor VTEPs fail to join multicast groups
  • VMs fail to respond to ARP requests and appear non-responsive

Environment

VMware NSX for vSphere 6.0.x
VMware NSX for vSphere 6.1.x
VMware NSX for vSphere 6.2.x
VMware NSX for vSphere 6.3.x
VMware NSX for vSphere 6.4.x

Cause

This issue occurs when teaming policies are mismatched within an NSX environment. This typically happens when the vDS configuration is modified through vCenter Server after the NSX cluster has been prepared, which is not supported.

Resolution

Use the following procedure to validate whether the existing teaming policy configuration is consistent:

 
  1. Run the command below to verify the deployed teaming policy used by distributed routing from the netcpa log:

    cat /var/log/netcpa.log | grep 'teamingPolicy'

    The command outputs the DLR log entry, which applies only to the DLR port:

    2016-04-04T17:13:54.243Z [4E758B70 info 'Default'] Updated vdr dvs entry 23 e8 3a 50 68 d3 25 d4-81 xx xx xx xx xx xx xx:(updated by VSM):vxlanOnly:teamingPolicy = LOADBALANCE_SRCID:numUplink = 2:numActiveUplink = 2:uplinkPortNames = Uplink 2,Uplink 1

    Note: The above output is only available if distributed routing is enabled or configured. In the example above, the NSX cluster teaming policy is to load balance on source virtual port ID with 2 active uplinks, Uplink 1 and Uplink 2.
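
    Optionally, the same check can be run over SSH from a management workstation to compare the teaming policy reported on several prepared hosts at once; the hostnames below are placeholders, and SSH access to the ESXi hosts is assumed:

    ssh root@esxi-host-01 "grep teamingPolicy /var/log/netcpa.log | tail -1"
    ssh root@esxi-host-02 "grep teamingPolicy /var/log/netcpa.log | tail -1"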
     
  2. Run the command below to identify the VXLAN Tunnel End Point (VTEP) vmknic port ID and Endpoint ID for all VTEP interfaces:

    net-vdl2 -l -s vDS_NSX_Name

    Where: vDS_NSX_Name is the name of the virtual distributed switch that was prepared for VXLAN.

    The command outputs the following:

    vmknic count: 2
    VXLAN vmknic: vmk3
    VDS port ID: 45
    Switch port ID: 33554441
    Endpoint ID: 0
    VLAN ID: 602
    IP: x.x.x.x
    Netmask: 255.255.252.0

    ……
    VXLAN vmknic: vmk4
    VDS port ID: 52
    Switch port ID: 33554442
    Endpoint ID: 1
    VLAN ID: 602
    IP: x.x.x.x
    Netmask: 255.255.252.0
    Segment ID: x.x.x.x


    Where:
    • vmknic count: 2 indicates two VTEP interfaces, vmk3 and vmk4, in this example.
    • VDS port ID: 45 and VDS port ID: 52 are the reference ports used in step 3 below.
    • Endpoint ID: 0 and Endpoint ID: 1 are the reference vmknic IDs used in step 4 below. This ID increments sequentially from 0.
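
    As an optional cross-check, the IP configuration of each VTEP vmknic can also be confirmed with esxcli; vmk3 is the interface taken from the example output above:

    esxcli network ip interface ipv4 get -i vmk3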
       
  3. Run the command below to identify the teaming policy and active uplinks for all VTEP ports:

    net-dvs -l

    The command output is as follows:

    port 45:
    com.vmware.common.port.alias = , propType = CONFIG
    com.vmware.common.port.connectid = 530521281 , propType = CONFIG
    com.vmware.common.port.portgroupid = dvportgroup-208 , propType = CONFIG
    com.vmware.common.port.block = false , propType = CONFIG
    com.vmware.common.port.dvfilter = filters (num = 0):
    propType = CONFIG
    com.vmware.common.port.ptAllowed = 0x 0. 0. 0. 0
    propType = CONFIG
    com.vmware.etherswitch.port.security = deny promiscuous; deny mac change; deny forged frames
    propType = CONFIG
    com.vmware.etherswitch.port.txUplink = normal , propType = CONFIG
    com.vmware.net.vxlan.vmknic = 0x 1

    propType = CONFIG POLICY
    com.vmware.etherswitch.port.teaming:
    load balancing = source virtual port id
    link selection = link state up;
    link behavior = notify switch; best effort on failure; shotgun on failure;
    active = Uplink 2; Uplink 1;
    standby =
    propType = CONFIG

    port 52:
    com.vmware.common.port.alias = , propType = CONFIG
    com.vmware.common.port.connectid = 537062281 , propType = CONFIG
    com.vmware.common.port.portgroupid = dvportgroup-208 , propType = CONFIG
    com.vmware.common.port.block = false , propType = CONFIG
    com.vmware.common.port.dvfilter = filters (num = 0):
    propType = CONFIG
    com.vmware.common.port.ptAllowed = 0x 0. 0. 0. 0
    propType = CONFIG
    com.vmware.etherswitch.port.security = deny promiscuous; deny mac change; deny forged frames
    propType = CONFIG
    com.vmware.etherswitch.port.txUplink = normal , propType = CONFIG
    com.vmware.net.vxlan.vmknic = 0x 1
    propType = CONFIG POLICY
    com.vmware.etherswitch.port.teaming:
    load balancing = source virtual port id
    link selection = link state up;
    link behavior = notify switch; best effort on failure; shotgun on failure;
    active = Uplink 2; Uplink 1;
    propType = CONFIG


    Where:
    • port 45: and port 52: are the reference ports identified in step 2 above.
    • com.vmware.net.vxlan.vmknic = 0x 1 confirms, as a sanity check, that these ports are connected to a VTEP interface.
    • load balancing = source virtual port id is the teaming policy configured on the port; it must match the policy identified in step 1.
    • active = Uplink 2; Uplink 1 are the active uplinks; they must match the uplinks identified in step 1.
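
    Because the full net-dvs -l output is long, the configuration of a single VTEP port can be isolated with grep; port 45 is taken from step 2, and the example assumes the grep build on the host supports the -A (trailing context) option:

    net-dvs -l | grep -A 20 "port 45:"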
       
  4. Run the command below to verify that VM VXLAN traffic exits through a known VTEP interface:

    net-vdl2 -l -s vDS_NSX_Name -n 6007

    Where: 6007 is the VNI ID of the logical switch on which the VM resides.


    The command outputs the following:

    VXLAN Global States:
    Control plane Out-Of-Sync: No
    UDP port: 8472
    VXLAN network: 6007
    Multicast IP: 0.0.0.0
    Control plane: Disabled
    MAC entry count: 0
    ARP entry count: 0
    Port count: 1
    VXLAN port: vdrPort
    Switch port ID: 33554443
    vmknic ID: 1


    Where:
    • vmknic ID: 1 shows that traffic to and from the VMs on this virtual wire is passing through a known Endpoint ID, as identified in step 2. Valid entries in this example are 0 and 1. Any other value means the VMs are unable to communicate between ESXi hosts.
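
    If the vmknic ID is unexpected, basic VTEP-to-VTEP reachability can also be tested from the ESXi shell. In the example below, x.x.x.x is a placeholder for the remote host's VTEP IP address, and -d together with -s 1572 verifies that the transport network carries VXLAN-sized frames without fragmentation:

    vmkping ++netstack=vxlan -d -s 1572 -I vmk3 x.x.x.x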
       
  5. Run the command below to verify that VM VXLAN traffic is being encapsulated without errors:

    net-vdl2 -S -s vDS_NSX_Name

    The command outputs the following:

    tx.passThrough: 0
    tx.vxlanTotal: 0
    tx.clone: 0
    tx.tso: 0
    tx.csum: 0
    tx.drop.invalidFrame: 0
    tx.drop.guestTag: 0
    tx.drop.noResource: 0
    tx.drop.invalidState: 0
    rx.passThrough: 0
    rx.vxlanTotal: 0
    rx.clone: 0
    rx.drop.invalidFrame: 0
    rx.drop.notExist: 0
    rx.drop.noResource: 0
    forward.pass: 0
    forward.reject: 0
    forward.rpf: 0
    arpProxy.reply.total: 0
    arpProxy.reply.fail: 0
    arpProxy.request.total: 0
    arpProxy.request.fail: 0
    mcastProxy.tx.total: 0
    mcastProxy.tx.fail: 0
    mcastProxy.rx.total: 0
    mcastProxy.rx.fail: 0


    Where:
    • tx.drop.invalidState: 0 shows that the VM is on a VXLAN segment with the correct Endpoint ID. This counter is non-zero if the VM is on a VXLAN segment with an incorrect Endpoint ID.

    The same statistics can also be viewed for a specific VXLAN network by appending the VNI ID:

    net-vdl2 -S -s vDS_NSX_Name -n 6007

    The command outputs the following:

    tx.total: 0
    tx.nonUnicast: 0
    tx.crossRouter: 0
    tx.drop.total: 0
    rx.total: 0
    rx.mcastEncap: 0
    rx.crossRouter: 0
    rx.drop.wrongDest: 0
    rx.drop.invalidEncap: 0
    rx.drop.total: 0
    mac.lookup.found: 0
    mac.lookup.flood: 0
    mac.lookup.full: 0
    mac.update.learn: 0
    mac.update.extend: 0
    mac.update.full: 0
    mac.age: 0
    mac.renew: 0
    arp.lookup.found: 0
    arp.lookup.unknown: 0
    arp.lookup.full: 0
    arp.lookup.wait: 0
    arp.lookup.timeout: 0
    arp.update.update: 0
    arp.update.unkown: 0
    arp.update.notFound: 0
    arp.age: 0
    arp.renew: 0


    Where:
    • tx.drop.total: 0 shows that the VM is on a VXLAN segment with the correct Endpoint ID. This counter is non-zero if the VM is on a VXLAN segment with an incorrect Endpoint ID.
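
    To see whether the drop counters increment while the ping failure is reproduced, the statistics command can be repeated in a simple loop from the ESXi shell; the 5-second interval and VNI 6007 are examples:

    while true; do date; net-vdl2 -S -s vDS_NSX_Name -n 6007 | grep drop; sleep 5; done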

Inconsistencies cannot be corrected by modifying the NSX cluster teaming policy through the GUI.

The supported method to correct teaming policy inconsistencies is to create and prepare a new NSX cluster with the required teaming policy, and then to migrate the ESXi hosts to that cluster. Changing the teaming policy in the NSX Manager database by other means applies only to virtual wires created after the database change is made.
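
After the hosts are migrated to the newly prepared cluster, the VXLAN configuration that each host received can be reviewed from the ESXi shell; the esxcli vxlan namespace shown below is provided by the NSX VIBs on prepared hosts:

esxcli network vswitch dvs vmware vxlan list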

For more information on the supported configurations, see the Teaming Policy for Virtual Distributed Switches section in the NSX Installation and Upgrade Guide.