VMware NSX 3.2.3 and previous versions
This issue is caused by a bug in the Central Control Plane (CCP) Pigeon batching mechanism that can lose data between the CCP and NestDB on the Edge node. If publishing a batched message fails, the current message cursor is not rolled back, so the messages in the failed batch are never re-sent. Consequently, subsequent update or delete requests (such as removing a stale MAC address) are missed and not copied to the cache. This leaves duplicate or stale ARP records in the Edge's NestDB, causing traffic to be routed incorrectly.
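The failure mode described above can be illustrated with a small, simplified model. This is only a sketch and not NSX code: the class `FlakyPublisher`, the function `replicate`, the log format, and the sample addresses are all hypothetical, and the model assumes the cursor is advanced before the publish attempt. It shows how a cursor that is not rolled back after a failed publish silently drops a delete request and leaves a stale entry in the receiving cache.

```python
# Simplified illustration only; all names here are hypothetical, not NSX internals.

class FlakyPublisher:
    """Applies batches to a dict cache; fails on the given 1-based call numbers."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)
        self.calls = 0
        self.cache = {}  # stands in for the Edge-side ARP table: ip -> mac

    def publish(self, batch):
        self.calls += 1
        if self.calls in self.fail_on:
            raise IOError("publish failed")
        for op, ip, mac in batch:
            if op == "set":
                self.cache[ip] = mac
            elif op == "delete":
                self.cache.pop(ip, None)


def replicate(log, publisher, batch_size, rollback_on_failure, max_attempts=10):
    """Send the log in batches, tracking progress with a cursor."""
    cursor = 0
    attempts = 0
    while cursor < len(log) and attempts < max_attempts:
        attempts += 1
        start = cursor
        batch = log[start:start + batch_size]
        cursor = start + len(batch)  # cursor moves past the batch before publishing
        try:
            publisher.publish(batch)
        except IOError:
            if rollback_on_failure:
                cursor = start  # corrected behavior: retry the same batch
            # otherwise the failed batch is skipped and its messages are lost
    return publisher.cache


LOG = [
    ("set", "10.0.0.5", "aa:aa:aa:aa:aa:01"),
    ("set", "10.0.0.6", "aa:aa:aa:aa:aa:02"),
    ("delete", "10.0.0.5", None),  # removal of the stale MAC
    ("set", "10.0.0.7", "aa:aa:aa:aa:aa:03"),
]

# The second publish call fails in both runs.
buggy = replicate(LOG, FlakyPublisher(fail_on=[2]), 2, rollback_on_failure=False)
fixed = replicate(LOG, FlakyPublisher(fail_on=[2]), 2, rollback_on_failure=True)

print("without rollback:", buggy)
print("with rollback:   ", fixed)
```

In the run without rollback, the entry for 10.0.0.5 stays in the cache even though the log deleted it, which corresponds to the stale ARP record described above. In the run with rollback, the failed batch is retried and the cache matches the log.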
This issue is resolved in VMware NSX versions 3.2.4.0 and 4.1.1.0.
Workaround:
To temporarily restore stability and clear the stale NestDB entries, restart the tn-proxy (nsx-proxy) service as root on the affected Edge node.
# /etc/init.d/nsx-proxy restart