NSX TEP/MAC Table Poisoning

Article ID: 404152

Products

VMware NSX

Issue/Introduction

When pinging between two VMs that are on different ESXi hosts but in the same segment, intermittent disconnects (lost pings) occur.

Environment

NSX 4.1.2.3

Cause

This issue can occur when the NSX Managers have long uptimes, which can cause synchronization issues in the control plane.

Resolution

To determine whether the TEP/MAC table is failing to sync with accurate data, follow the process below:

  1. Open a PuTTY (SSH) session to both ESXi hosts (the source VM's ESXi host and the destination VM's ESXi host).
  2. In each PuTTY session, run the command esxtop and then press "n" to switch to the networking view.
  3. Collect the following data:
    1. Source VM's vmnic#
    2. Destination VM's vmnic#
      • Below is an example of what the network view of esxtop looks like and where to gather the data needed.
      • The blue box is the VM's vmnic# (the vmnic that is in use by the VM).
    3. Once the data for both the source and destination VMs is recorded, press "q" to quit esxtop and return to the shell prompt.
    4. On the source VM's ESXi host, prepare the following command: pktcap-uw --uplink vmnic# --capture UplinkSndKernel -o - |tcpdump-uw -enr- | grep <destination IP>
      • Be sure to replace vmnic# with the vmnic in use by the source VM and <destination IP> with the destination VM's IP address (do not include the <> in the command).
      • Press ENTER to begin running the command.
    5. On the destination VM's ESXi host, prepare the following command: pktcap-uw --uplink vmnic# --capture UplinkRcvKernel -o - |tcpdump-uw -enr- | grep <source IP>
      • Be sure to replace vmnic# with the vmnic in use by the destination VM and <source IP> with the source VM's IP address (do not include the <> in the command).
      • Press ENTER to begin running the command.
    6. Once both packet capture commands are running in the ESXi host PuTTY sessions, begin pinging from the source VM to the destination VM. Let the ping run for a few minutes and confirm that disconnects are occurring. For example, successful pings may appear like the green boxes below and unsuccessful pings like the red boxes below (these are only examples; the specific message for the failed pings may differ):
    7. If the ping test shows unsuccessful packets while the packet captures are running, stop both captures by pressing CTRL+C and then proceed to step 9.
    8. If the ping test shows only successful packets, stop both packet captures (CTRL+C) and restart them (pressing the up arrow on the keyboard recalls the previous command). Let the new captures run for a few minutes and confirm that they were running while pings were failing. Repeat step 8 as needed while all pings are successful.
      ***Please note that packet captures are for troubleshooting purposes only and should not be left running for long periods of time, as they consume host resources.***
    9. Begin the review of the packet capture and focus on the following items:
      • VM IP addresses
      • VM MAC addresses  
      • TEP IP addresses 
      • TEP MAC addresses 
      • Sequence numbers 
        • Note that the source VM packet capture should show ICMP echo requests going out and the destination VM packet capture should show ICMP echo requests coming in, because the capture commands being run record only outgoing traffic on the source VM's host and only incoming traffic on the destination VM's host.
      • The data within the packet capture can be identified like so:
    10. If the packet captures show no change in TEP IP or MAC address while the unsuccessful pings are occurring, this confirms that there is no TEP/MAC table issue and that something else is disrupting connectivity between the VMs.
      Please refer to Troubleshooting NSX using Packet Captures to continue investigating the intermittent traffic issues.
    11. If the packet capture output shows the TEP IP/MAC changing, this can indicate a TEP/MAC table issue. Below is an example with the following data applied:
      • Packet capture on the source ESXi host 
      • Source VM IP 169.254.10.12 and MAC 00:00:5E:00:5E:00 
      • Destination VM IP 169.254.10.13 and MAC 00:00:5E:00:53:FF 
      • Source TEP IP 199.19.250.205 and MAC 00:50:56:00:00:01
      • Destination TEP IP 199.19.250.206 and MAC 00:50:56:00:00:02
      • Different destination TEP IP 199.19.250.207 and MAC 00:50:56:00:00:03 
      • The green boxes show the expected flow of traffic, where the source host's TEP sends the source VM's data to the destination host's TEP, which passes it to the destination VM. The red boxes show that the destination TEP IP and MAC addresses have changed. This indicates that another host's TEP/MAC table is retaining stale data about where the destination VM is actually located.
      • Identify the incorrect ESXi host from the IP present in the packet capture output. To verify which TEPs belong to that ESXi host, log in to the NSX Manager UI with admin credentials and go to System > Fabric > Hosts. Expand the cluster containing the hosts and review the TEP IP Addresses column.
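
Spotting a changed destination TEP by eye in a long capture can be tedious. The sketch below is one possible way to summarize it, assuming the tcpdump-uw text output was first saved to a file and that each line follows the usual tcpdump -e -n layout for an untagged IPv4 frame (timestamp, source MAC, ">", destination MAC, "ethertype IPv4 (0x0800), length N:", then source IP.port ">" destination IP.port). The function name list_outer_destinations and the file name are hypothetical, and the field positions will not match if the outer frame carries a VLAN tag.

```shell
#!/bin/sh
# list_outer_destinations FILE
# Prints "<count> <outer destination MAC> <outer destination IP>" for every
# distinct destination TEP MAC/IP pair found in FILE, most frequent first.
# ASSUMPTION: lines look like
#   <time> <src MAC> > <dst MAC>, ethertype IPv4 (0x0800), length N: <src IP.port> > <dst IP.port>: ...
# Lines that do not match this layout are skipped.
list_outer_destinations() {
    awk '$3 == ">" && $11 == ">" {
        mac = $4; sub(/,$/, "", mac)        # drop the trailing comma after the MAC
        ip = $12; sub(/:$/, "", ip)         # drop the trailing colon after IP.port
        sub(/\.[0-9]+$/, "", ip)            # drop the UDP port, keep the IP
        count[mac " " ip]++
    }
    END { for (k in count) print count[k], k }' "$1" | sort -rn
}

# When run as a script with a file argument, summarize that file.
if [ -n "${1:-}" ]; then
    list_outer_destinations "$1"
fi
```

With the example data above, a healthy capture from the source host would produce a single line for 199.19.250.206, while a second line for 199.19.250.207 would point to the stale entry described in this step.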

Once it is confirmed that there is a TEP/MAC table issue, the following process can be performed to resolve it:

  1. Restart the opsagent, proxy, and cfgagent on the affected hosts by running the following commands:
    • /etc/init.d/nsx-opsagent restart
    • /etc/init.d/nsx-proxy restart
    • /etc/init.d/nsx-cfgagent restart
  2. Perform the ping and packet capture tests again to confirm whether the TEP IP/MAC addresses still change.
    • If the TEP IP/MAC no longer changes and there are no more intermittent connectivity issues, the issue has been resolved.
    • If the TEP IP/MAC no longer changes but there are still intermittent connectivity issues, please open a Broadcom Support case. Review the section below (Support Case) to ensure all data and information is gathered prior to opening the ticket.
    • If the TEP IP/MAC is still changing, reboot all the NSX Managers (one at a time to avoid disconnections). If this still does not resolve the issue, please continue below with opening a Broadcom Support case.
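
The three restarts in step 1 can also be run as one small loop, which keeps the order fixed and stops at the first failure. This is only a sketch: the function name restart_nsx_agents and the INIT_DIR variable are hypothetical conveniences (INIT_DIR allows a dry run against stub scripts), and it assumes each init script returns a non-zero exit status when a restart fails.

```shell
#!/bin/sh
# Restart the three NSX agents from step 1, in the same order, and stop at
# the first one that reports a failure.
# INIT_DIR is a hypothetical override for dry runs; on an ESXi host it
# defaults to /etc/init.d, matching the commands listed in step 1.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

restart_nsx_agents() {
    for svc in nsx-opsagent nsx-proxy nsx-cfgagent; do
        echo "Restarting $svc"
        # ASSUMPTION: the init script exits non-zero if the restart fails.
        "$INIT_DIR/$svc" restart || {
            echo "Restart of $svc failed; not continuing" >&2
            return 1
        }
    done
}
```

Calling restart_nsx_agents on each affected host and then repeating the ping and packet capture tests follows the same sequence as steps 1 and 2.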

Support Case

To open a support case with Broadcom Support please refer to Creating and managing Broadcom support cases.

Please be sure to provide the following data:

  1. Time that the testing occurred (for example June 1st 2025 at 1:30PM Eastern Time).
  2. Reference this KB in the case description.
  3. VM names and IP/MAC addresses.
  4. ESXi host names (please identify which host is the source, destination, and incorrect host).
  5. TEP IP/MAC addresses.
  6. Screenshots of the outputs of the ping tests between the VMs and the packet capture outputs.
  7. Upload NSX Manager logs and all three ESXi host logs.