Transport Nodes stuck on Applying NSX switch configuration after reboot of manager

Article ID: 374279

Products

VMware NSX

Issue/Introduction

  • Transport nodes whose Configuration State previously showed as completed now show "Applying NSX switch configuration" at System > Fabric > Hosts.
  • The progress becomes stuck at 60% or 68% and never completes.

  • The API output also shows "Applying NSX switch configuration" (excerpt):
    GET <NSXT-FQDN>/api/v1/transport-nodes/<tn-uuid>/state

    "maintenance_mode_state": "DISABLED",
    "node_deployment_state": {
        "state": "success",
        "details": []
    },
    "deployment_progress_state": {
        "progress": 60,
        "current_step_title": "Applying NSX switch configuration"
    },
    "state": "success",
    "details": [
    ...

  • NSX Manager nodes might have been rebooted. Verify the uptime from the NSX Manager CLI:
    nsx-mngr> get uptime
  • If no reboot was performed, services might have been restarted manually during troubleshooting, or restarted automatically, in which case you may see open "Application on NSX node has crashed" alarms.
  • In /var/log/proton/nsxapi* you might see the following entry for the affected Transport Node ID, with no further logging for that Transport Node ID afterward:
    TransportNodeStateAutoRectifier: Syncing TransportNode TransportNode/<transport_node-id> [Current config status = FAILED, Failure code = 8804]
  • Because the failing step only re-applies the existing configuration, no changes are made to VMs running on the impacted transport nodes, and there should be no data plane impact if the host was previously configured successfully. However, other services or products that rely on the host state may be affected, because to them the host will appear to be still applying its configuration. In these situations, they may report that the Transport Node (TN) configuration is in progress.
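
The state check above can be scripted. The following Python sketch assumes you have already retrieved the JSON body of GET <NSXT-FQDN>/api/v1/transport-nodes/<tn-uuid>/state for a node. Treating "below 100% on this step while node deployment reports success" as the stuck signature is an assumption drawn from the symptom described here, not a documented API contract, and the helper name is_stuck is hypothetical. A single snapshot cannot show that progress never completes, so poll over a period of time before acting.

```python
import json

# Step title as it appears in the API output quoted in this article.
STUCK_STEP = "Applying NSX switch configuration"

def is_stuck(state: dict) -> bool:
    """Hypothetical helper: True if a TN state response matches the symptom.

    Assumption: node deployment already succeeded, yet deployment progress
    is parked below 100% on the switch-configuration step.
    """
    progress = state.get("deployment_progress_state", {})
    deployment = state.get("node_deployment_state", {})
    return (
        deployment.get("state") == "success"
        and progress.get("current_step_title") == STUCK_STEP
        and progress.get("progress", 100) < 100
    )

# Sample response modeled on the excerpt above (array closed for parsing).
sample = json.loads("""
{
  "maintenance_mode_state": "DISABLED",
  "node_deployment_state": {"state": "success", "details": []},
  "deployment_progress_state": {
    "progress": 60,
    "current_step_title": "Applying NSX switch configuration"
  },
  "state": "success",
  "details": []
}
""")
print(is_stuck(sample))  # True
```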

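To find transport node IDs that match the log signature, a sketch like the following scans nsxapi log lines for the TransportNodeStateAutoRectifier entry with failure code 8804 and reports IDs that are never mentioned again afterward. The regular expression is built only from the log string quoted in this article; the rest of the log line format, and the assumption that transport node IDs consist of hexadecimal digits and hyphens, are not confirmed here. The function name silent_after_sync is hypothetical.

```python
import re
from typing import Iterable, Set

# Built from the log string quoted in this article; surrounding format assumed.
SYNC_RE = re.compile(
    r"TransportNodeStateAutoRectifier: Syncing TransportNode "
    r"TransportNode/(?P<tn>[0-9a-fA-F-]+) .*Failure code = 8804"
)

def silent_after_sync(lines: Iterable[str]) -> Set[str]:
    """Hypothetical helper: TN IDs whose last log mention is the 8804 sync entry.

    Lines must be supplied in chronological order.
    """
    last_sync = {}  # tn_id -> index of its most recent matching Syncing line
    last_seen = {}  # tn_id -> index of its most recent mention of any kind
    for i, line in enumerate(lines):
        m = SYNC_RE.search(line)
        if m:
            last_sync[m.group("tn")] = i
        # Only track IDs that have already produced a matching sync entry.
        for tn in last_sync:
            if tn in line:
                last_seen[tn] = i
    # Silent = nothing mentioned the ID after its last sync entry.
    return {tn for tn, i in last_sync.items() if last_seen.get(tn) == i}
```

Pass it the lines of the nsxapi* logs in chronological order; rotated or compressed files would need to be read oldest first (and decompressed) for the "no further logging" check to be meaningful.
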
Environment

VMware NSX 4.1.x
VMware NSX-T Data Center 3.x

Cause

Restarting NSX Manager services or rebooting Manager nodes can lead to a missed AppInit. This prevents the Auto Rectifier service from detecting and resolving the failed transport nodes, leaving their status in the "Applying..." state shown in the GUI.

Resolution

This issue is resolved in VMware NSX 4.2.0, available at Broadcom downloads.
If you are having difficulty finding and downloading software, please review the Download Broadcom products and software KB.

Workaround:
Run the following API call for each impacted ESX transport node to reset the status of that node:

POST <NSXT-FQDN>/api/v1/transport-nodes/<transportnode-id>?action=resync_host_config
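
When many transport nodes are affected, the call can be repeated per node. This minimal Python sketch only builds the POST requests with the standard library. HTTP basic authentication, the https:// scheme, and the helper name build_resync_requests are assumptions for illustration. Sending each request is left to the caller, for example with urllib.request.urlopen(req, context=ssl_context), after you have confirmed the transport node IDs and how your environment validates the Manager certificate.

```python
import base64
import urllib.request
from typing import Iterable, List

def build_resync_requests(manager_fqdn: str, tn_ids: Iterable[str],
                          user: str, password: str) -> List[urllib.request.Request]:
    """Hypothetical helper: one resync_host_config POST per transport node.

    Assumes basic authentication; requests are built but not sent.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    requests = []
    for tn_id in tn_ids:
        url = (f"https://{manager_fqdn}/api/v1/transport-nodes/"
               f"{tn_id}?action=resync_host_config")
        requests.append(urllib.request.Request(
            url,
            method="POST",  # the action takes no request body
            headers={"Authorization": f"Basic {token}"},
        ))
    return requests
```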

If the NSX configuration status still shows "Applying NSX switch configuration" after this, check /var/log/syslog on a Manager node to verify whether the configuration succeeded for the affected transport node. A successful configuration logs an entry like:
<timestamp> nsx-mngr NSX 797724 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Got deployment status INSTALL_SUCCESSFUL for node ########-####-###-####-######
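
To check several transport nodes at once, the syslog lines can be filtered for that message. In this sketch the pattern is taken from the sample entry above, the helper name split_by_install_status is hypothetical, and it assumes the node ID is the whitespace-free token that follows "for node".

```python
import re
from typing import Iterable, Set, Tuple

# Pattern taken from the sample syslog entry in this article.
INSTALL_OK_RE = re.compile(
    r"Got deployment status INSTALL_SUCCESSFUL for node (?P<node>\S+)"
)

def split_by_install_status(lines: Iterable[str],
                            affected: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: split affected TN IDs into (seen, not seen)."""
    succeeded = set()
    for line in lines:
        m = INSTALL_OK_RE.search(line)
        if m and m.group("node") in affected:
            succeeded.add(m.group("node"))
    # Nodes with no INSTALL_SUCCESSFUL entry still need investigation.
    return succeeded, set(affected) - succeeded
```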

If you see INSTALL_SUCCESSFUL in syslog, perform a rolling restart of the NSX Manager nodes to reset the status of the transport nodes.

Additional Information