Newly deployed Transport Node (Host) added to NSX cluster: TEP never comes up

Article ID: 422063

Products

VMware NSX

Issue/Introduction

  • This may be observed when:
    • TEP tunnels on the new Transport Node do not come up
    • Virtual Machines running on overlay networks lose network connectivity
  • The NSX IP pool has multiple IP subnets and the previous subnet has become exhausted
    • The next transport node added then receives an IP from the next subnet in the pool
    • That subnet may not be configured on the upstream TOR interfaces, and there may be no gateway for it in the physical infrastructure
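
To confirm which pool subnet a new TEP address came from, the address can be compared against each subnet in the pool. The following is a minimal sketch in POSIX shell, not an NSX tool. The helper names ip_to_int and in_subnet are hypothetical, and the address and subnets are placeholders from the documentation range; substitute the TEP address shown by esxcfg-vmknic -l and the subnets defined in your IP pool.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell, so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Return 0 if address $1 lies inside CIDR block $2 (for example 192.0.2.0/25).
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Placeholder values: the TEP address of the new host and the pool subnets.
tep_ip=192.0.2.130
for subnet in 192.0.2.0/25 192.0.2.128/25; do
    if in_subnet "$tep_ip" "$subnet"; then
        echo "$tep_ip is in $subnet"
    fi
done
```

If the matching subnet is one that the TOR interfaces do not carry, that is consistent with the symptoms described above.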

Environment

VMware NSX

VMware vSphere ESX

Cause

TOR switch doesn't have the VLAN configured on the interface

  • When the new Transport node was added, the TOR switches did not have the VLAN added to the trunk group.
  • The following might be observed:
    • Cannot ping the TEP Gateway
    • A packet capture shows ARP broadcasts leaving one teamed NIC without any reply, and no ARP traffic at all on the other teamed NIC. This indicates an L2 issue.
      • [root@YourHost:~] pktcap-uw --uplink vmnic2 --capture UplinkSndKernel,UplinkRcvKernel --ethtype 0x0806 --mac 00:50:56:##:##:## -o - | tcpdump-uw -enr -

        The name of the uplink is vmnic2.
        {...}
        pktcap: Vsock connection from port 1326 cid 2.
        19:26:14.109180 00:50:56:##:##:## > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has ###.###.98.1 tell ###.###.99.2, length 46
        19:26:15.119153 00:50:56:##:##:## > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has ###.###.98.1 tell ###.###.99.2, length 46
        19:26:16.129131 00:50:56:##:##:## > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has ###.###.98.1 tell ###.###.99.2, length 46
        19:26:17.139113 00:50:56:##:##:## > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has ###.###.98.1 tell ###.###.99.2, length 46
        tcpdump-uw: pcap_loop: error reading dump file: Interrupted system call
        pktcap: Receive thread exiting ...
        pktcap: error: Error writing 28 bytes of pkt header to file.
        pktcap: Destroying session 300.
        pktcap:
        pktcap: Dumped 5 packet dropped 0 packets.
        pktcap: Done.

      • [root@YourHost:~] pktcap-uw --uplink vmnic0 --capture UplinkSndKernel,UplinkRcvKernel --ethtype 0x0806 --mac 00:50:56:##:##:## -o - | tcpdump-uw -enr -

        The name of the uplink is vmnic0.
        {...}
        NOTE: There are no ARP broadcasts on the other teamed vmnic of the host.
        pktcap: Vsock connection from port 1326 cid 2.
        tcpdump-uw: pcap_loop: error reading dump file: Interrupted system call
        pktcap: Receive thread exiting ...
        pktcap: error: Error writing 28 bytes of pkt header to file.
        pktcap: Destroying session 300.
        pktcap:
        pktcap: Dumped 0 packet dropped 0 packets.
        pktcap: Done.
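
When a capture runs for a long time, counting requests and replies is quicker than reading the output line by line. The sketch below assumes the tcpdump-uw text output has been saved to a file. The function name arp_summary is hypothetical, and the sample lines are copied from the capture above.

```shell
# Summarize ARP requests and replies in saved tcpdump-uw text output.
arp_summary() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    requests=$(grep -c 'Request who-has' "$1" || true)
    replies=$(grep -c 'Reply .* is-at' "$1" || true)
    echo "requests=$requests replies=$replies"
    if [ "$requests" -gt 0 ] && [ "$replies" -eq 0 ]; then
        echo "ARP requests were captured but no reply: check the VLAN trunk and gateway on the TOR"
    fi
}

# Demonstration with two lines taken from the vmnic2 capture in this article.
sample=$(mktemp)
cat > "$sample" <<'EOF'
19:26:14.109180 00:50:56:##:##:## > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has ###.###.98.1 tell ###.###.99.2, length 46
19:26:15.119153 00:50:56:##:##:## > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has ###.###.98.1 tell ###.###.99.2, length 46
EOF
arp_summary "$sample"
rm -f "$sample"
```

A result with requests but zero replies matches the vmnic2 capture above. Zero requests and zero replies matches the vmnic0 capture.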

Resolution

Validate:

  • The Transport Node (TN) can ping:
    • TEP to TEP
    • TEP to gateway
  • The TOR switch has all required VLANs on the trunk group for proper network functionality
  • The physical infrastructure has the gateway configured for routing
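
The ping checks above can be run from the ESXi shell with vmkping. The following is a sketch only. It assumes the TEP VMkernel interface is vmk10 and that the TEP interfaces use the vxlan netstack; confirm both with esxcfg-vmknic -l. The function name check_tep is hypothetical, and the gateway and remote TEP addresses are placeholders.

```shell
# Ping each target from the TEP VMkernel interface and report the result.
# Assumption: the TEP interfaces live in the "vxlan" netstack.
check_tep() {
    vmk="$1"; shift
    for target in "$@"; do
        if vmkping ++netstack=vxlan -I "$vmk" -c 3 "$target" >/dev/null 2>&1; then
            echo "OK   $target"
        else
            echo "FAIL $target"
        fi
    done
}

# Run only where vmkping exists (that is, on an ESXi host).
# Placeholder targets: the TEP gateway, then a remote TEP.
if command -v vmkping >/dev/null 2>&1; then
    check_tep vmk10 192.0.2.1 192.0.2.20
fi
```

A FAIL for the gateway from a newly added host, while existing hosts report OK, points back to the VLAN and gateway checks listed above.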

Additional Information

For further information, review:
Testing VMkernel network connectivity with the vmkping command