Installing or Upgrading NSX on an ESXi host fails reporting the node already exists


Article ID: 319975


Products

VMware NSX

Issue/Introduction

  • Install Failure:
    • While preparing an ESXi host as a Transport Node, one of the following two errors is seen:
Node <node-name> with same ip <###.###.###.###> already exists
OR
Error: General error has occurred. Discovered node with id:<Discovered Node ID:host-###> is already prepared having fabric node id:<Transport Node UUID>.
  • Upgrade Failure:
    • During the host upgrade, the following error is seen:
Failed to get Host status for upgrade unit <Upgrade Unit ID> due to error Transport node <Transport Node UUID> not found
  • An ESXi host was removed from vCenter without first removing it from NSX.

  • The host is not visible in the NSX UI under System > Fabric > Hosts > Clusters or Standalone.

  • Reinstalling the ESXi OS does not resolve the issue, as the IP address or name already exists in NSX.
  • An upgrade pre-check may fail and can cause the upgrade process to pause.

  • Running the GET API '/api/v1/transport-nodes/########-25d7-4ff4-ba26-############/state' reveals the following results:

      "node_deployment_state" : {
        "state" : "failed",

        "details" : [ {
          "sub_system_id" : "########-25d7-4ff4-ba26-############",
          "state" : "failed",
          "failure_message" : "Failed to uninstall the software on host. Host OS version not found.\n",
          "failure_code" : 26020
        } ]
      },
      "deployment_progress_state" : {
        "progress" : 40,
        "current_step_title" : "Removing NSX bits"
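When reviewing a saved copy of this response, the following Python sketch pulls out the failed sub-systems and the step at which deployment stopped. It assumes the full response body is available as JSON text and reads only the `node_deployment_state.details` and `deployment_progress_state` fields visible in the excerpt above; the function name `summarize_tn_state` is hypothetical.

```python
import json


def summarize_tn_state(state_json):
    """Summarize failed sub-systems in a transport-node /state response."""
    doc = json.loads(state_json)
    lines = []
    deployment = doc.get("node_deployment_state", {})
    for detail in deployment.get("details", []):
        if detail.get("state") == "failed":
            # One line per failed sub-system: ID, failure code, message.
            message = detail.get("failure_message", "").strip()
            lines.append(
                f"{detail.get('sub_system_id')}: "
                f"code {detail.get('failure_code')} - {message}"
            )
    progress = doc.get("deployment_progress_state")
    if progress:
        # Where the deployment stopped, e.g. 40% at "Removing NSX bits".
        lines.append(
            f"progress {progress.get('progress')}% at step "
            f"'{progress.get('current_step_title')}'"
        )
    return lines
```

With the response shown above, this yields the sub-system ID with failure code 26020 and its message, followed by `progress 40% at step 'Removing NSX bits'`.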

Note: A transport node is a host prepared with NSX VIBs.

Environment

VMware NSX
VMware NSX-T Data Center

Cause

When an ESXi host is removed directly from vCenter without first uninstalling NSX-T from it, entries for that host can remain in the NSX-T database.

The correct procedure to remove NSX-T from a single host is described in:

Uninstall NSX-T Data Center from a Managed Host in a vSphere Cluster

If the cluster is a security-only cluster and is prepared using vLCM or has service insertion installed, it is not possible to detach the transport node profile.
In such circumstances, to uninstall NSX-T, either remove NSX-T from the whole cluster or move the single host out of the NSX-T prepared cluster to the datacenter level in vCenter.
Once the host is no longer part of an NSX-T prepared cluster, NSX-T can be removed using the NSX-T GUI.

If an ESXi host (transport node) was removed from the vCenter inventory because of a catastrophic OS failure and needs to be reinstalled, skip to Option 3 of the workaround in the Resolution section.

Resolution

This is a known issue impacting VMware NSX.


Workaround:

Environments may have reached this state by following different steps, therefore, there are a number of possible workaround options that need to be tried.

If the issue is encountered during an upgrade, follow Options 1 to 3 in order until one resolves the issue.

 

Install Failure Prerequisite

Before proceeding with the options below, move the host that failed to install or upgrade out of the vSphere cluster and make it a standalone host in vSphere.

Then work through Options 1 to 3 below, in order.

 

Upgrade Failure Prerequisite

No action is required; proceed through Options 1 to 3 below, in order.

 

Option 1

  1. In the NSX UI, check whether the impacted host appears on any of the following pages:

     System > Fabric > Hosts > Cluster

     System > Fabric > Hosts > Other hosts

     System > Fabric > Hosts > Standalone

  2. If the ESXi host appears on any of these pages, select it, click Delete NSX, and select Force Delete.

  3. Once the force delete is complete, the ESXi host can be re-added to vCenter.

  4. If the host is not present, proceed to Option 2. If you have already completed Option 2, are retrying Option 1 after reindexing, and the issue is still not resolved, proceed to Option 3.
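When the host cannot be found in the UI, the transport-node list returned by the NSX Manager API may still show the stale entry together with the UUID that Option 3 needs. The Python sketch below searches a saved response of `GET https://<NSX Mgr IP>/api/v1/transport-nodes` by host name or IP address. The field names it reads (`results`, `id`, `display_name`, `node_deployment_info.ip_addresses`) are assumptions to verify against the NSX API reference for your version, and `find_stale_transport_nodes` is a hypothetical helper name.

```python
import json


def find_stale_transport_nodes(list_json, host_name=None, host_ip=None):
    """Return (id, display_name) pairs whose name or IP matches the host.

    Assumed response shape: {"results": [{"id", "display_name",
    "node_deployment_info": {"ip_addresses": [...]}}, ...]}.
    """
    doc = json.loads(list_json)
    matches = []
    for node in doc.get("results", []):
        name = node.get("display_name", "")
        ips = node.get("node_deployment_info", {}).get("ip_addresses", [])
        # Host names are compared case-insensitively; IPs must match exactly.
        name_hit = host_name is not None and name.lower() == host_name.lower()
        ip_hit = host_ip is not None and host_ip in ips
        if name_hit or ip_hit:
            matches.append((node.get("id"), name))
    return matches
```

A match gives the UUID to substitute for `<UUID>` in the Option 3 API calls.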

 

Option 2

In some cases, the host may not appear in the NSX-T UI due to search indexing failures.

  1. On all three NSX-T manager nodes, log in as the admin user and run the following two commands:

    start search resync policy

    start search resync manager

  2. After running the above commands, allow time for the reindexing to complete. The time required depends on the size of the environment; allow at least 10 minutes.
    Note: While reindexing is in progress, the NSX-T UI may display an error related to indexing and ask you to try again later. This is expected.

  3. Once the reindexing is complete (after at least 10 minutes), return to Option 1 and repeat its steps.

 

Option 3

If you have completed Options 1 and 2 and the host still does not appear in the NSX-T UI for removal, use the following API steps to remove the transport node.

  1. Run the following API call:
    GET https://<NSX Mgr IP>/api/v1/transport-nodes/<UUID>/state

    Note: Replace <UUID> with the Transport Node UUID, as reported in the error message (see Issue/Introduction section).
    Replace <NSX Mgr IP> with the IP address or FQDN of an NSX-T manager node.

  2. If the API response does not return "Object Not found", proceed to step 3.
    Note: "Object Not found" is the expected response once the host has been successfully removed.

  3. For NSX-T 3.2.x and 4.x, run the following API call:
    DELETE https://<NSX Mgr IP>/api/v1/transport-nodes/<UUID>?force=true&unprepare_host=false

    Note: Replace <UUID> with the Transport Node UUID, as reported in the error message (see Issue/Introduction section).
    Replace <NSX Mgr IP> with the IP address or FQDN of an NSX-T manager node.
  4. Wait 5 minutes, then periodically rerun the GET transport node state call from step 1 until "Object Not found" is returned.

  5. Once the GET call returns "Object Not found", move the host back into the original cluster to prepare it for NSX. If a Transport Node Profile is applied, host preparation should start automatically; otherwise, prepare the transport node as before.
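Steps 1 to 4 can be scripted. The following Python sketch uses only the standard library and HTTP basic authentication (an assumption; use whatever authentication your NSX Manager is configured for). It treats HTTP 404 or a body containing "Object Not found" as confirmation of removal, which is an assumption about how the manager reports a deleted transport node. The helper names, the 5-minute wait, and the number of polling attempts are illustrative choices, not NSX defaults.

```python
import base64
import time
import urllib.error
import urllib.parse
import urllib.request


def tn_url(manager, uuid, delete=False):
    """Build the state URL (step 1) or the force-delete URL (step 3)."""
    base = f"https://{manager}/api/v1/transport-nodes/{urllib.parse.quote(uuid)}"
    if delete:
        # Query parameters as given in step 3 (NSX-T 3.2.x and 4.x).
        query = urllib.parse.urlencode({"force": "true", "unprepare_host": "false"})
        return f"{base}?{query}"
    return f"{base}/state"


def call(method, url, user, password, context=None):
    """Send one request with HTTP basic auth (assumed); return (status, body)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url, method=method, headers={"Authorization": f"Basic {token}"}
    )
    try:
        with urllib.request.urlopen(req, context=context) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise; keep their status and body for inspection.
        return err.code, err.read().decode()


def is_removed(status, body):
    """Assumed removal signal: HTTP 404 or an 'Object Not found' message."""
    return status == 404 or "object not found" in body.lower()


def remove_transport_node(manager, uuid, user, password,
                          fetch=call, wait=300, attempts=6, sleep=time.sleep):
    """Steps 1-4: check state, force-delete if still present, poll until gone."""
    status, body = fetch("GET", tn_url(manager, uuid), user, password)
    if is_removed(status, body):
        return True  # Step 2: nothing left to delete.
    # Step 3: force-delete the stale entry without trying to uninstall
    # NSX from the host (unprepare_host=false).
    fetch("DELETE", tn_url(manager, uuid, delete=True), user, password)
    for _ in range(attempts):
        sleep(wait)  # Step 4: wait before re-checking the state.
        status, body = fetch("GET", tn_url(manager, uuid), user, password)
        if is_removed(status, body):
            return True
    return False
```

`remove_transport_node("<NSX Mgr IP>", "<UUID>", user, password)` returns `True` if removal is confirmed within the polling window and `False` otherwise; step 5 remains a manual action. The `fetch` and `sleep` parameters let the polling logic be exercised without a live manager, and a manager with a self-signed certificate can be handled by passing `fetch=functools.partial(call, context=ctx)` with a suitable `ssl.SSLContext`.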

If none of the options resolves the issue, collect the information listed in the Additional Information section below, open a technical support case with Broadcom Support for further investigation, and reference this KB article.

For more information, see Creating and managing Broadcom support cases.

Additional Information

If you are contacting Broadcom support about this issue, in order to aid a timely response and resolution, please provide the following:

  • NSX-T version.
  • Whether the issue was encountered during an upgrade or an install.
  • Whether all workaround options were completed and, if not, which options were not completed and why (for example, what issue prevented their completion).
  • NSX Manager log bundles.
  • ESXi host log bundles for hosts that are failing to configure as transport nodes.
  • The text and screenshots of any relevant error messages seen in the NSX GUI or on the command line.

For details, see Handling Log Bundles for offline review with Broadcom support.