Transport Nodes stuck at 68% (Applying NSX switch configuration) after reboot of manager

Article ID: 374279

Updated On:

Products

VMware NSX

Issue/Introduction

  • In NSX deployments with a large number of transport nodes, ESXi hosts whose Configuration State previously showed as completed can revert to "Applying NSX switch configuration". This can happen after rebooting NSX Manager nodes or restarting services on the Manager nodes. The progress remains stuck at 68% and never completes.



  • Because the failing step is a re-application of the existing configuration, no changes are actually made to VMs running on the impacted ESXi hosts. There should be no data plane impact if the host was previously configured successfully. However, other services or products that rely on the host state may be impacted, as the host state will appear to them as applying/non-complete. In these situations, messages may appear implying that the Transport Node (TN) configuration is in progress.

  • This issue can be confirmed by checking the management plane logs under /var/log/proton/nsxapi* on the NSX Manager. The following string is logged for the relevant TN ID, with no further logging of that TN ID afterwards (a search sketch follows the excerpt below):

    "TransportNodeStateAutoRectifier: Syncing TransportNode TransportNode/<transport_node-id>
    [Current config status = FAILED, Failure code = 8804]"
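
  • If the logs span many rotated files, a short script can help surface the relevant entries. Below is a minimal sketch in Python, assuming the nsxapi log files are readable from a machine with Python 3; the path, keywords, and gzip handling are assumptions based on the excerpt above and may need adjusting for your environment.

    #!/usr/bin/env python3
    # Minimal sketch: scan /var/log/proton/nsxapi* for the AutoRectifier
    # failure entries quoted above. Path and keywords are assumptions
    # taken from this article; adjust as needed.
    import glob
    import gzip

    LOG_GLOB = "/var/log/proton/nsxapi*"   # assumed log location
    KEYWORDS = ("TransportNodeStateAutoRectifier", "Failure code = 8804")

    def open_log(path):
        # Rotated logs are often gzip-compressed; fall back to plain text.
        if path.endswith(".gz"):
            return gzip.open(path, "rt", errors="replace")
        return open(path, "r", errors="replace")

    for path in sorted(glob.glob(LOG_GLOB)):
        with open_log(path) as fh:
            for line in fh:
                if any(keyword in line for keyword in KEYWORDS):
                    print(f"{path}: {line.rstrip()}")

    Any transport node IDs appearing in the matched lines are candidates for the workaround described in the Resolution section below.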

Environment

VMware NSX 4.1.x
VMware NSX-T Data Center 3.x

Cause

Restarting NSX Manager services or rebooting Manager nodes can lead to a missed AppInit. This prevents the Auto Rectifier service from detecting and resolving the failed transport nodes, leaving the node status shown in the GUI in an "Applying..." state.

Resolution

This issue is resolved in VMware NSX 4.2.0, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.


To work around the issue, run the following API call against each impacted ESXi host transport node to reset the status of that node:


POST <NSXT-FQDN>/api/v1/transport-nodes/<transportnode-id>?action=resync_host_config
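
A minimal sketch of scripting this call across multiple nodes is shown below, using Python and the requests library. The Manager FQDN, credentials, and transport node IDs are placeholders to substitute with values from your environment; this is an illustration rather than a supported tool.

    #!/usr/bin/env python3
    # Sketch: issue the resync_host_config call above for each impacted
    # transport node. All values below are placeholders.
    import requests

    NSX_MANAGER = "nsx-manager.example.com"      # placeholder Manager FQDN
    USERNAME = "admin"                           # placeholder credentials
    PASSWORD = "changeme"
    TRANSPORT_NODE_IDS = [
        "11111111-2222-3333-4444-555555555555",  # placeholder impacted TN IDs
    ]

    for tn_id in TRANSPORT_NODE_IDS:
        url = (f"https://{NSX_MANAGER}/api/v1/transport-nodes/"
               f"{tn_id}?action=resync_host_config")
        # verify=False keeps the example short; point verify at a CA bundle
        # in practice.
        response = requests.post(url, auth=(USERNAME, PASSWORD), verify=False)
        # Print the status so a non-2xx response for any node is easy to spot.
        print(tn_id, response.status_code)
        response.raise_for_status()

If you prefer not to script this, the same POST can be issued per node from any REST client.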