Preparing a new ESXi transport node with VMware NSX VIBs fails with error "Node <node-name> with same ip xxx.xxx.xxx.xxx already exists"
Article ID: 322647


Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • You are preparing an ESXi host as a transport node and receive the ERROR:
Node <node-name> with same ip xxx.xxx.xxx.xxx already exists
  • Or you are upgrading and the host upgrade step fails with error:
Failed to get Host status for upgrade unit xxxxxxxx-25d7-4ff4-ba26-xxxxxxxxxxxx due to error Transport node xxxxxxxx-25d7-4ff4-ba26-xxxxxxxxxxx not found
  • The transport node was previously prepared for VMware NSX and was part of a vCenter cluster.
  • The transport node was removed from vCenter and re-imaged and the cluster was deleted.
  • The transport node in the error is not visible in the VMware NSX UI under System > Fabric > Hosts > Clusters or Standalone.
  • Running the GET API /api/v1/transport-nodes/xxxxxxxx-25d7-4ff4-ba26-xxxxxxxxxxxx/state returns:
  "node_deployment_state" : {
    "state" : "failed",
    "details" : [ {
      "sub_system_id" : "xxxxxxxx-25d7-4ff4-ba26-xxxxxxxxxxx",
      "state" : "failed",
      "failure_message" : "Failed to uninstall the software on host. Host OS version not found.\n",
      "failure_code" : 26020
    } ]
  },
  "deployment_progress_state" : {
    "progress" : 40,
    "current_step_title" : "Removing NSX bits"
  }

Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 4.x
VMware NSX-T Data Center 3.x

Cause

This issue occurs when a transport node with a Transport Node Profile (TNP) attached is removed from its cluster, typically because a problem such as a storage outage forced the host to be re-imaged.
The removal of the transport node in VMware NSX does not complete.
Normally such a transport node would end up as a standalone host (System > Fabric > Hosts > Standalone) and could be forcefully removed there.
However, due to an issue with the search criteria used to display the orphaned transport node, it is not shown under Standalone and you are unable to remove it from the UI.

Resolution

This issue is resolved in VMware NSX 4.1.2, available at VMware downloads.

Workaround:
To correct the search index and display the failed node, run the following indexing resync commands on all three NSX Manager nodes:
     start search resync policy
     start search resync manager
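
If you have SSH access to the managers, the resync commands can be run on each node in turn. A minimal sketch, assuming the default admin user (whose login shell is the NSX CLI) and placeholder hostnames:

```shell
# Placeholder FQDNs -- replace with your three NSX Manager nodes.
MANAGERS="nsx-mgr-01.example.com nsx-mgr-02.example.com nsx-mgr-03.example.com"

for MGR in $MANAGERS; do
  # The admin user's login shell is the NSX CLI, so the resync
  # commands can be passed directly over SSH.
  ssh "admin@${MGR}" "start search resync policy"
  ssh "admin@${MGR}" "start search resync manager"
done
```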


Then wait for the resync to complete; the time required depends on the size of the environment, but allow at least 10 minutes.
Then check for the orphaned host again under:
System > Fabric > Hosts > Cluster
System > Fabric > Hosts > Other hosts
System > Fabric > Hosts > Standalone

Now you can forcefully remove the transport node by selecting REMOVE NSX and checking the FORCE DELETE option.
The following API call can also be used to force delete the transport node:
DELETE /api/v1/transport-nodes/<TN UUID>?force=true&unprepare_host=false
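
As a sketch, the call can be issued with curl; the manager FQDN and admin credential below are placeholders, and the UUID should be that of the orphaned transport node:

```shell
# Placeholder values -- substitute your manager FQDN and the orphaned
# transport node's UUID (e.g. from the earlier GET .../state call).
NSX_MANAGER="nsx-mgr-01.example.com"
TN_UUID="<TN UUID>"

# force=true removes the stale object; unprepare_host=false skips the
# uninstall step, since the re-imaged host has no NSX bits to remove.
URL="https://${NSX_MANAGER}/api/v1/transport-nodes/${TN_UUID}?force=true&unprepare_host=false"
curl -k -u admin -X DELETE "${URL}"
```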

And confirm removal using the following API call:
GET /api/v1/transport-nodes/<TN UUID>/state

The above GET API call should return an "object not found" status once the transport node has been successfully removed.
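
A sketch of the confirmation check with curl, assuming the object-not-found condition surfaces as an HTTP 404 (hostname and credential are placeholders):

```shell
# Placeholder values -- substitute your manager FQDN and the UUID used
# in the DELETE call above.
NSX_MANAGER="nsx-mgr-01.example.com"
TN_UUID="<TN UUID>"

URL="https://${NSX_MANAGER}/api/v1/transport-nodes/${TN_UUID}/state"
HTTP_CODE=$(curl -k -s -o /dev/null -w "%{http_code}" -u admin "${URL}")

# A 404 here means the transport node object no longer exists,
# i.e. the forced removal completed.
if [ "${HTTP_CODE}" = "404" ]; then
  echo "Transport node removed"
fi
```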
In the case where the upgrade was in progress, you should now be able to proceed with the upgrade.