Transport Nodes stuck at 68% (Applying NSX switch configuration) after reboot of manager
Article ID: 374279
Products
VMware NSX
Issue/Introduction
In NSX deployments with a large number of transport nodes, ESXi hosts whose Configuration State previously showed as complete can revert to showing "Applying NSX switch configuration". This can happen after rebooting NSX Manager nodes or restarting services on Manager nodes. The progress stalls at 68% and never completes.
As the failing step is a re-application of the existing config, no changes are actually being made to VMs running on the impacted ESXi hosts. There should be no data plane impact if the host was previously configured successfully. However, any other services or products that rely on the host state may be impacted, because the host state appears to them as applying/incomplete. In these situations, messages may appear implying that the Transport Node (TN) config is in progress.
This issue can be confirmed by checking the management logs under /var/log/proton/nsxapi*. The following string is logged for the relevant TN ID, with no further log entries for that TN ID afterwards:
"TransportNodeStateAutoRectifier: Syncing TransportNode TransportNode/<transport_node-id> [Current config status = FAILED, Failure code = 8804]"
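The log check described above can be sketched as a short Python helper. This is a minimal illustration and not a VMware-provided tool: the function name find_stuck_transport_nodes is hypothetical, and the script assumes the log lines are plain text, supplied in chronological order, and contain the Auto Rectifier string exactly as quoted in this article. It reports every TN ID whose last appearance in the logs is the "Failure code = 8804" sync message.

```python
import re
from typing import Iterable, Set

# Pattern built from the log string quoted in this article.
# The capture group is the transport node ID following "TransportNode/".
MARKER = re.compile(
    r"TransportNodeStateAutoRectifier: Syncing TransportNode "
    r"TransportNode/(?P<tn_id>\S+) "
    r"\[Current config status = FAILED, Failure code = 8804\]"
)


def find_stuck_transport_nodes(lines: Iterable[str]) -> Set[str]:
    """Return TN IDs whose final log mention is the 8804 sync message.

    Hypothetical helper: assumes `lines` are in chronological order.
    """
    stuck: Set[str] = set()
    for line in lines:
        # Any mention of a candidate ID means logging continued for it,
        # so it no longer matches the "no further logging" symptom.
        stuck = {tn_id for tn_id in stuck if tn_id not in line}
        match = MARKER.search(line)
        if match:
            # The marker line itself (re)registers the ID as a candidate.
            stuck.add(match.group("tn_id"))
    return stuck


def scan_files(paths: Iterable[str]) -> Set[str]:
    """Scan uncompressed log files in the order given (oldest first)."""
    def all_lines():
        for path in paths:
            with open(path, errors="replace") as handle:
                yield from handle
    return find_stuck_transport_nodes(all_lines())
```

To use it, pass the nsxapi log files from /var/log/proton/ to scan_files, oldest first. Rotated logs that are compressed would need to be decompressed beforehand, since this sketch only reads plain text.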
Environment
VMware NSX 4.1.x
VMware NSX-T Data Center 3.x
Cause
Restarting NSX Manager services or rebooting Manager nodes can lead to a missed AppInit. This prevents the Auto Rectifier service from detecting and resolving the failed transport nodes, so the GUI continues to show the node status as "Applying...".