Transport Nodes stuck at 68% (Applying NSX switch configuration) after reboot of manager
Article ID: 374279
Products
VMware NSX Networking
Issue/Introduction
In NSX deployments with more than 100 transport nodes, after an NSX Manager node is rebooted, previously prepared transport nodes revert to the "Applying NSX switch configuration" status, where they remain stuck at 68% progress and never complete.
Because the failing step only re-applies the existing configuration, no changes are actually made, and the VMs on the impacted hosts see no data-plane impact provided the host was previously configured successfully. However, other services or products that rely on the host state may be impacted: the host state appears to them as applying/incomplete, and they may report that the transport node (TN) configuration is in progress.
This can be confirmed by checking the NSX Manager logs at /var/log/proton/nsxapi*. The following string is logged for the affected TN ID, after which there is no further logging for that TN ID:
"TransportNodeStateAutoRectifier: Syncing TransportNode TransportNode/<transportnode-id> [Current config status = FAILED, Failure code = 8804]"
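The log check described above can be sketched in Python. This is a minimal illustration, not a supported tool: the function name `find_stuck_transport_nodes` is hypothetical, and the regular expression assumes the message appears in the log exactly as quoted in this article. A node is reported only if the Auto Rectifier line is the last line that mentions its ID, matching the "no further logging" symptom.

```python
import re

# Pattern built from the log string quoted in this article (assumed to appear verbatim).
RECTIFIER_PATTERN = re.compile(
    r"TransportNodeStateAutoRectifier: Syncing TransportNode "
    r"TransportNode/(?P<tn_id>[^\s\]]+) "
    r"\[Current config status = FAILED, Failure code = 8804\]"
)


def find_stuck_transport_nodes(lines):
    """Return TN IDs whose last mention in `lines` is the Auto Rectifier FAILED/8804 message."""
    # tn_id -> True while the most recent line mentioning it is the rectifier message
    last_is_rectifier = {}
    for line in lines:
        match = RECTIFIER_PATTERN.search(line)
        matched_id = match.group("tn_id") if match else None
        # Any other line that mentions a known ID counts as "further logging".
        for tn_id in last_is_rectifier:
            if tn_id != matched_id and tn_id in line:
                last_is_rectifier[tn_id] = False
        if matched_id is not None:
            last_is_rectifier[matched_id] = True
    return [tn_id for tn_id, stuck in last_is_rectifier.items() if stuck]
```

To use it, pass the lines of a plaintext log file, for example `find_stuck_transport_nodes(open(path, errors="replace"))` for each file matching /var/log/proton/nsxapi*. The sketch assumes the files are read in chronological order and does not handle compressed rotated logs.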
Environment
VMware NSX 4.1.1, 4.1.2
VMware NSX-T Data Center 3.2.x
Cause
This issue can occur when the Proton service restarts on one of the NSX Manager nodes, whether through a reboot of the node or a restart of the service. The restart can hit a missed case in AppInit that prevents the Auto Rectifier from detecting and resolving the failed transport nodes. This leaves the node status in an applying state in the GUI.
Resolution
Fixed in NSX 4.2.0 and later.
To work around the issue, run the following API call for each impacted transport node to reset the status of the node:
"POST <NSXT-FQDN>/api/v1/transport-nodes/<transportnode-id>?action=resync_host_config"
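Because the call has to be repeated for every impacted transport node, it may be convenient to script it. The following Python sketch uses only the standard library. The function names are hypothetical, and it assumes the manager is reached over HTTPS with HTTP basic authentication and a certificate the client trusts; adjust these to match your environment.

```python
import base64
import urllib.error
import urllib.parse
import urllib.request


def build_resync_request(manager_fqdn, transport_node_id, username, password):
    """Build the POST request for the resync_host_config action (does not send it)."""
    node = urllib.parse.quote(transport_node_id, safe="")
    # Assumption: HTTPS scheme; the path and action come from this article.
    url = (
        f"https://{manager_fqdn}/api/v1/transport-nodes/"
        f"{node}?action=resync_host_config"
    )
    # Assumption: HTTP basic authentication is enabled for the API user.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=b"",  # empty body, the action is carried in the query string
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


def resync_transport_nodes(manager_fqdn, transport_node_ids, username, password):
    """Send the resync request for each ID and return {tn_id: HTTP status code}."""
    results = {}
    for tn_id in transport_node_ids:
        request = build_resync_request(manager_fqdn, tn_id, username, password)
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                results[tn_id] = response.status
        except urllib.error.HTTPError as exc:
            # Record the error status and continue with the remaining nodes.
            results[tn_id] = exc.code
    return results
```

Calling `resync_transport_nodes("<NSXT-FQDN>", ["<transportnode-id>"], user, password)` returns a status code per node, so any node whose request was rejected can be retried individually. Connection-level failures (for example, an untrusted certificate) are not caught and will raise.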