Duplicate VNI encountered with Logical Switches in VMware NSX

Article ID: 373063

Products

VMware NSX

Issue/Introduction

  • Logical Switches created in NSX may be assigned the VNI (VXLAN Network Identifier) of an existing segment, resulting in duplicate VNIs in the system.
  • Realization of a new Logical Switch on an Edge Transport Node may fail.
  • A VNI ID is associated with a Logical Switch, but the same VNI ID is still available in the VNI pool.
  • Logical Switches with the same VNI ID can be observed in the output of the "get logical-switches" command run on the NSX Manager:
    • In admin mode:
      get logical-switches


    • In root mode:

      # su admin -c get logical-switches | grep -v "^$\|VNI" | awk '{print $1, $3}' | sort | uniq -w 6 -c | sort -r | head -n 5

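      Note: increase the number in "head -n 5" to display more lines. Each output line shows a count followed by a VNI and a Logical Switch name; a count greater than 1 indicates a VNI shared by more than one Logical Switch.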
  • Intermittent network connectivity issues between VMs may be experienced.
  • IPv4 ARP resolution or IPv6 neighbour discovery may fail because broadcast or multicast packets are not transmitted to any of the vTEP interface IP addresses configured on the host where the target VM is running. This can be confirmed with the following steps:
    • On the source host, display the TEP table with the following commands and check it for an entry matching any of the destination host's TEP interface IP addresses. If no such entry exists, broadcast and multicast packets will not be forwarded to that host:

- esxcfg-vswitch -l (Use this command to get the vSwitch Name)

- net-vdl2 -M vtep -s <vSwitch Name> -n <VNI ID of the segment the destination VM is connected to>

    • Capture transmitted traffic on the pNIC of the source host and check whether the destination IP address in the outer GENEVE header matches any of the destination host's vTEP IP addresses:

- esxcfg-vmknic -l (Use this command on the destination host to get the vTEP IP addresses)

- pktcap-uw --uplink <vmnicX> --capture UplinkSndKernel --vni <VNI ID of the segment the destination VM is connected to>  -o - | tcpdump-uw -enr - | grep <dest vtep IP address>
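The root-mode pipeline shown for the "get logical-switches" symptom ranks VNIs by how many Logical Switches use them. As a minimal sketch, the shell function below prints only the VNIs that appear more than once, along with the names of the switches sharing each one. It assumes, as the original awk command does, that the VNI is the first column and the switch name is the third; find_duplicate_vnis is a hypothetical helper name, not an NSX command.

```shell
# Hypothetical helper (not part of NSX): read "get logical-switches" output
# on stdin and print "<VNI> <count> <name> <name> ..." for every VNI that is
# used by more than one Logical Switch, sorted numerically by VNI.
# ASSUMPTION: VNI is column 1 and the switch name is column 3, matching the
# awk '{print $1, $3}' used in the root-mode pipeline of this article.
find_duplicate_vnis() {
    awk '
        # Keep only data rows: the first field must be a numeric VNI.
        # This skips blank lines and the column header.
        $1 !~ /^[0-9]+$/ { next }
        {
            count[$1]++                      # switches seen per VNI
            names[$1] = names[$1] " " $3     # remember their names
        }
        END {
            for (vni in count)
                if (count[vni] > 1)
                    printf "%s %d%s\n", vni, count[vni], names[vni]
        }
    ' | sort -n
}
```

For example, after pasting the function into the NSX Manager root shell: `su admin -c get logical-switches | find_duplicate_vnis`. Empty output means no VNI appears on more than one Logical Switch in that listing.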

Environment

VMware NSX 4.x.
VMware NSX-T Data Center 3.x.

Cause

An update and a delete of the same Logical Switch (LS) issued on the Management Plane may be handled concurrently by two different Manager nodes.
As part of the delete operation, the NSX Manager releases the VNI back to the pool, but an update operation triggered a few milliseconds later prevents the Logical Switch and its VNI from being removed from the NSX Manager's Corfu DB.

As a result, the NSX Manager may hold stale entries for VNIs that are still associated with a Logical Switch, while the same VNIs are also available for assignment in the VNI pool. If one of these VNIs is later picked from the pool and assigned to a new Logical Switch, two Logical Switches end up sharing the same VNI.

The fix for this issue ensures that only one NSX Manager node handles CRUD operations for a given Segment.
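The stale state described in this section can be viewed as a set intersection: a VNI is at risk of duplicate assignment when it is both bound to a Logical Switch and listed as free in the VNI pool. The function below is a minimal illustrative sketch that assumes two hypothetical plain-text exports with one VNI per line; this article does not describe a command that produces either list, and find_assigned_but_free is a hypothetical name.

```shell
# Hypothetical helper: print the VNIs present in BOTH input files, one per
# line, sorted numerically and de-duplicated.
# ASSUMPTION: each file holds one VNI per line with no extra whitespace.
#   $1 = VNIs currently associated with Logical Switches (hypothetical export)
#   $2 = VNIs the pool still reports as free (hypothetical export)
find_assigned_but_free() {
    # -F: fixed strings, -x: whole-line match, -f: read patterns from the
    # free-pool file, so only assigned VNIs that are also free are printed.
    grep -Fxf "$2" "$1" | sort -nu
}
```

Any VNI this prints is one that the allocator could hand out again while it is still in use, which is the condition that leads to duplicate VNIs.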


Resolution

This issue is resolved in VMware NSX 4.1.1, 4.2.0 and VMware NSX-T Data Center 3.2.4, available at Broadcom downloads.

Upgrading does not fix VNIs that are already affected: existing duplicate VNIs are carried over by the upgrade and must be cleaned up manually by Broadcom Support.

Additional Information

If you contact Broadcom Support about this issue, provide the following to help ensure a timely response and resolution:

  • NSX version with build number.
  • NSX Manager log bundles.
  • ESXi host log bundles for hosts that are failing to configure as transport nodes.
  • Text and screenshots of any error messages relevant to the investigation, as seen in the NSX GUI or on the command line.

Handling Log Bundles for offline review with Broadcom support


Related Installation documentation: