Intermittent connectivity issues between NSX manager and supervisor VMs after supervisor cluster upgrade

Article ID: 431481

Products

VMware NSX

Issue/Introduction

  • Pings from the Supervisor VM to the NSX Managers and vCenter fail intermittently.
  • When a ping is initiated from the NSX Manager, the Edge VM forwards the packet toward the Supervisor VM with a wrong or stale destination MAC address.
  • The packet is then dropped on the destination hypervisor because that MAC address does not exist there.
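
As a rough illustration of the symptom above, the following Python sketch flags ARP entries whose MAC address is not present on any hypervisor. The function name, the table layout, and the sample addresses are all hypothetical; this is not an NSX API and it does not read NestDB. It only shows the comparison an administrator would make by hand.

```python
def find_stale_entries(arp_table, live_macs):
    """Return the ARP entries whose MAC is not known to any hypervisor.

    arp_table: dict mapping IP address -> MAC address (hypothetical layout).
    live_macs: set of MAC addresses that actually exist on the hypervisors.
    """
    return {ip: mac for ip, mac in arp_table.items() if mac not in live_macs}


# Hypothetical sample data: the first entry still points at an old MAC.
edge_arp_table = {
    "192.0.2.10": "00:50:56:00:00:01",
    "192.0.2.11": "00:50:56:00:00:03",
}
macs_on_hypervisors = {"00:50:56:00:00:02", "00:50:56:00:00:03"}

stale = find_stale_entries(edge_arp_table, macs_on_hypervisors)
```

A packet sent to any address in `stale` would carry a destination MAC that no hypervisor owns, which matches the drop described in the last bullet.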

Environment

VMware NSX 3.2.3 and previous versions

Cause

This issue is caused by a bug in the Central Control Plane (CCP) Pigeon batching mechanism that can lose data between the CCP and NestDB on the Edge node. If publishing a batched message fails, the current message cursor is not rolled back. As a result, subsequent update or delete requests (for example, the removal of a stale MAC address) are missed and never copied to the cache. This leaves duplicate or stale ARP records in the Edge's NestDB, so traffic is forwarded to the wrong MAC address.
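
The failure mode can be pictured with a toy model. The Python sketch below is not the CCP or Pigeon implementation; the class, the message format, and the addresses are hypothetical and exist only to show how a cursor that is advanced but not rolled back after a failed publish can leave a stale record behind.

```python
IP = "192.0.2.10"
OLD_MAC = "00:50:56:00:00:01"
NEW_MAC = "00:50:56:00:00:02"


class BatchPublisher:
    """Toy cursor-based batch publisher (hypothetical, not the CCP code)."""

    def __init__(self, log, send, rollback_on_failure):
        self.log = log                      # ordered list of pending updates
        self.send = send                    # delivers one batch, may raise
        self.rollback_on_failure = rollback_on_failure
        self.cursor = 0                     # index of next unpublished message

    def publish_next(self, batch_size):
        start = self.cursor
        batch = self.log[start:start + batch_size]
        self.cursor = start + len(batch)    # cursor moves before the send
        try:
            self.send(batch)
        except ConnectionError:
            if self.rollback_on_failure:
                self.cursor = start         # retry the same batch next time
            return False
        return True


def replay(rollback_on_failure):
    """Apply a delete-then-add sequence where the first publish fails once."""
    cache = {(IP, OLD_MAC)}                 # receiver starts with the old record
    log = [("delete", IP, OLD_MAC), ("add", IP, NEW_MAC)]
    pending_failures = [True]               # only the first send attempt fails

    def send(batch):
        if pending_failures and pending_failures.pop():
            raise ConnectionError("simulated publish failure")
        for op, ip, mac in batch:
            if op == "add":
                cache.add((ip, mac))
            else:
                cache.discard((ip, mac))

    publisher = BatchPublisher(log, send, rollback_on_failure)
    while publisher.cursor < len(log):
        publisher.publish_next(batch_size=1)
    return cache


without_rollback = replay(rollback_on_failure=False)
with_rollback = replay(rollback_on_failure=True)
```

In this model, `without_rollback` ends up holding both the old and the new record for the same IP because the failed delete is never retried, while `with_rollback` holds only the new record.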

Resolution

This issue is resolved in VMware NSX versions 3.2.4.0 and 4.1.1.0.

Workaround:

To temporarily restore stability and clear the stale NestDB entries, restart the tn-proxy (nsx-proxy) service on the affected Edge Node by running the following command as root:

/etc/init.d/nsx-proxy restart