Intermittent ping loss for a VM as the ARP entry for the affected IP was unstable due to stale IP/MAC mapping


Article ID: 405813


Products

VMware NSX

Issue/Introduction

  • Pings to the affected IP are lost intermittently and ARP requests for it time out.
  • The affected IP is unreachable from the external network (N-S direction), although it might still be pingable from within the same NSX segment.
  • A packet capture taken on the Edge for the affected VM shows multiple MAC addresses mapped to the affected IP address.

Environment

VMware NSX
VMware NSX-T Data Center

Cause

  • This issue is caused by a stale ARP record (IP/MAC mapping) that remains in the Edge's nestdb. For example: 10.#.#.10 -> 00:50:56:#:#:ab
  • The CCP logs show multiple entries for this ARP record being processed by the L2 app:

/var/log/cloudnet/nsx-ccp.log

2025-04-08T23:52:25.946Z INFO O#l-work##-0 L2Table 76219 - [nsx@6876 comp="nsx-controller" level="INFO" subcomp="app-l2"] Last active record ArpRecord [id=LogicalSwitchId:[id=86######-4##6-4###-8###-6f4#########, vni=12345], ip=10.#.#.10, mac=00:50:56:##:##:ab, transportNodeId=######d4-8##6-4###-9##1-43c#########, timestamp=1970-01-01T00:00Z, isTnConnected=true, isLocalRecord=true, isProxyRecord=false, isIPAMBinding=false] is removed

  • However, even after the L2 app appears to have processed the removal of the ARP record 10.#.#.10/00:50:56:##:##:ab, the Edge still logs an ADD of the same ARP record 10.#.#.10/00:50:56:##:##:ab:

nsx_edge -> /var/log/syslog

2025-07-15T00:00:02.526Z ######.####.###.#### NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="lswitch" level="INFO"] add ip-mac to lswitch 86######-4##6-4###-8##4-6f4#########: 10.#.#.10/00:50:56:##:##:ab  ---------------> Wrong MAC

2025-07-15T00:00:02.526Z ######.####.###### NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="lswitch" level="INFO"] add ip-mac to lswitch 86######-4##6-4###-8##4-6f4#########: 10.#.#.10/00:50:56:##:##:bb ----------------> Correct MAC

  • The REMOVE of the old ARP record is never sent from the CCP to the nestdb on the transport node, so the stale ARP entry (IP/MAC 10.#.#.10/00:50:56:##:##:ab) remains in nestdb. This produces the log lines above and the ping loss for the IP.
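
The symptom above (one IP being added with more than one MAC) can be checked for in a saved copy of the Edge syslog. The following is a minimal sketch, not a supported tool: the function name `find_conflicting_bindings` is hypothetical, and the pattern assumes that unmasked log lines keep the same `add ip-mac to lswitch <id>: <ip>/<mac>` shape as the excerpts quoted above.

```python
import re
from collections import defaultdict

# Assumption: real (unmasked) Edge syslog lines follow the same
# "add ip-mac to lswitch <id>: <ip>/<mac>" shape as the excerpts above.
ADD_RE = re.compile(
    r"add ip-mac to lswitch (?P<lswitch>\S+): "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})/"
    r"(?P<mac>(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})"
)


def find_conflicting_bindings(lines):
    """Return {(lswitch, ip): sorted MACs} for each IP added with 2+ MACs."""
    seen = defaultdict(set)
    for line in lines:
        match = ADD_RE.search(line)
        if match:
            key = (match["lswitch"], match["ip"])
            # Normalize case so the same MAC is not counted twice.
            seen[key].add(match["mac"].lower())
    return {key: sorted(macs) for key, macs in seen.items() if len(macs) > 1}
```

To use it, pass any iterable of log lines, for example an open file handle on a copy of the Edge's `/var/log/syslog`. Each key in the result is a logical switch and IP pair that was added with more than one MAC, which is a candidate for the stale-entry condition described here. A legitimate MAC change (for example, a re-created VM) would also appear, so the output is a starting point for investigation rather than proof.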

Resolution

The root cause is a potential loss of data between the CCP and nestdb.

The fix is available in 3.2.4 and 4.1.1 onwards.

Workaround:

Restart the Central Control Plane (CCP) process on each NSX Manager (unified appliance) in the cluster.

Run the command below from the NSX Manager console or over SSH, on one cluster node at a time:

/etc/init.d/nsx-ccp restart

Alternatively, perform a rolling reboot of all NSX Managers in a cluster.

Additional Information