Failures when trying to delete powered-off, orphaned, or disconnected edge VMs, specifically those auto-deployed via NSX Manager.
An NSX edge node might be deleted from the NSX UI while its stale entry remains in the vCenter inventory.
The VTEP IP addresses assigned to an edge are released on the NSX side once the edge is deleted. These IPs might then be assigned to new edges, causing duplicate IPs and potential disruptions.
An edge node is marked as orphaned.
VMware NSX
VMware NSX-T Data Center
Attempting to delete, from the NSX-T Manager, edges that have been deleted from vCenter (VC), powered off, orphaned, or disconnected, whether they were auto-deployed via NSX Manager or deployed via OVF from VC.
The host or datastore on which the edge is deployed is corrupt, crashed, or unresponsive.
This can also result from incorrectly deleting the Edge transport node (TN), for example with a corfu command.
Initiating a delete operation on the NSX Edge node from the NSX UI causes the NSX manager to contact the edge directly for deletion.
If the NSX Manager fails to contact the edge node, it asks vCenter to perform the delete operation (this applies only when the edge node was deployed via the NSX UI and not via OVA).
If both steps fail, the edge gets stuck in the "Delete in progress" state.
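The fallback order described above can be sketched as a small decision function. This is only an illustration of the logic in this article, not NSX Manager's actual implementation; the function and enum names are hypothetical, and it assumes that a successful contact always leads to a successful delete.

```python
from enum import Enum


class DeleteOutcome(Enum):
    DELETED_VIA_EDGE = "deleted by contacting the edge directly"
    DELETED_VIA_VCENTER = "deleted by asking vCenter"
    DELETE_IN_PROGRESS = "stuck in Delete in progress"


def delete_outcome(edge_reachable: bool,
                   vcenter_reachable: bool,
                   deployed_via_nsx_ui: bool) -> DeleteOutcome:
    """Model the two-step deletion attempt described in this article."""
    # Step 1: the manager contacts the edge directly.
    if edge_reachable:
        return DeleteOutcome.DELETED_VIA_EDGE
    # Step 2: fall back to vCenter, only for edges deployed via the NSX UI.
    if deployed_via_nsx_ui and vcenter_reachable:
        return DeleteOutcome.DELETED_VIA_VCENTER
    # Both steps failed: the edge stays stuck until the user intervenes.
    return DeleteOutcome.DELETE_IN_PROGRESS
```

For example, an OVA-deployed edge that is powered off maps to `DELETE_IN_PROGRESS` even when vCenter is reachable, because the vCenter fallback does not apply to it.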
Make sure the edge to be deleted is not consumed by any edge cluster. If the edge is in use, remove it from the edge cluster first, using the steps in the documentation. The generic steps for deleting an edge VM or bare metal (BM) edge are given below.
NSX-T versions 3.2.x - 4.2:
Attempt Edge Deletion:
UI:
Go to System > Fabric > Nodes, select the target edge, and perform the Delete action.
API:
Use the DELETE https://<manager-ip>/api/v1/transport-nodes/<tn-id> API. This can be run using curl or a tool such as Postman; see the NSX API guide.
If you have deleted the edge from the NSX-T Manager (API or UI) but the UI shows it stuck in the Deletion in progress state for a long time (more than 15 minutes), the Edge VM is either unreachable or already deleted from VC. To confirm via the API, use GET https://<manager-ip>/api/v1/transport-nodes/<tn-id>/state and check that "node_deployment_state" is set to "DELETE_IN_PROGRESS" and "state" is set to "orphaned".
In that case, make sure the edge VM is deleted from vCenter, then call the following API:
POST https://<manager-ip>/api/v1/transport-nodes?action=clean_stale_entries
This API cleans up all stale edges from the NSX-T Manager. It can be run using curl or a tool such as Postman; see the NSX API guide.
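As a rough sketch of the check-then-clean sequence above, the following Python builds the cleanup request only when the state response shows the stuck, orphaned condition. The function names are hypothetical, basic authentication is an assumption, and the exact shape of the state response is also an assumption: the code accepts "node_deployment_state" either as a plain string or as an object with its own "state" field. Nothing is sent over the network here.

```python
import base64
import json
import urllib.request


def is_stale_delete(state_body: str) -> bool:
    """Return True if the state response shows a stuck, orphaned edge."""
    data = json.loads(state_body)
    deployment = data.get("node_deployment_state")
    # Assumption: the field may be a string or an object holding "state".
    if isinstance(deployment, dict):
        deployment = deployment.get("state")
    return (deployment == "DELETE_IN_PROGRESS"
            and str(data.get("state", "")).lower() == "orphaned")


def build_request(manager_ip, path, method, user, password):
    """Build (but do not send) an authenticated request to the manager."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{manager_ip}{path}",
        method=method,
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
    )


def cleanup_request_if_stale(manager_ip, state_body, user, password):
    """Return the clean_stale_entries request, or None if not applicable."""
    if not is_stale_delete(state_body):
        return None
    return build_request(
        manager_ip, "/api/v1/transport-nodes?action=clean_stale_entries",
        "POST", user, password)
```

The returned request could then be passed to `urllib.request.urlopen`, with an SSL context suited to the manager's certificate. Remember that this cleanup call affects all stale edges, so confirm the edge VM is gone from vCenter first.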
NSX-T versions > 4.2.0:
Attempt Edge Deletion
Use the DELETE https://<manager-ip>/api/v1/transport-nodes/<tn-id> API. This can be run using curl or a tool such as Postman; see the NSX API guide.
If you have deleted the edge from the NSX-T Manager (API or UI) but the UI shows it stuck in the Delete Failed state for a long time (more than 15 minutes), the Edge VM or BM is either unreachable or already deleted from VC.
Make sure you have removed the edge VM from vCenter or, for a bare metal edge (BME), run the "del nsx" nsxcli command.
Using UI
Clicking the Delete Failed state in the UI opens a pop-up.
Remove the edge by clicking Done, Remove from NSX.
Using API:
To clean up a specific stale edge VM, use the API below (run as root on one of the NSX Managers):
curl -l -k -u 'admin' -H 'Content-Type:application/json' -X POST https://<manager-ip>/api/v1/transport-nodes/<tn-id>/action/clean-stale-entries
NSX-T versions >= 9.0:
If you have already attempted to delete the edge from the NSX-T Manager but the deletion has been stuck for a long time, and the edge VM is deleted from vCenter or the edge is unreachable, you can call the API below to clean up that specific stale edge VM:
DELETE https://<manager-ip>/api/v1/policy/api/v1/infra/sites/<site-id>/enforcement-points/<enforcementpoint-id>/edge-transport-nodes/<edge-transport-node-id>?force=true
The steps below give a generic way to carry out and troubleshoot the edge deletion:
Attempt Edge Deletion
UI:
Go to System > Fabric > Nodes (called Edges in newer versions), select the target edge, and perform the Delete action.
API:
Use the DELETE https://<manager-ip>/api/v1/policy/api/v1/infra/sites/<site-id>/enforcement-points/<enforcementpoint-id>/edge-transport-nodes/<edge-transport-node-id> API.
This can be run using curl or a tool such as Postman; see the NSX API guide.
If you have deleted the edge from the NSX-T Manager (API or UI) but the UI shows it stuck in the Delete Failed state for a long time (more than 15 minutes), the Edge VM or BM is either unreachable or already deleted from VC.
To confirm via the API, call GET https://<manager-ip>/api/v1/policy/api/v1/infra/sites/<site-id>/enforcement-points/<enforcementpoint-id>/edge-transport-nodes/<edge-transport-node-id>/state and check that "state" is set to "orphaned".
Make sure you have removed the edge VM from vCenter or, for a bare metal edge (BME), run the "del nsx" nsxcli command.
Using UI
Clicking the Delete Failed state in the UI opens a pop-up.
Remove the edge by clicking Done, Remove from NSX.
Using API:
To clean up a specific stale edge VM, use the API below:
DELETE https://<manager-ip>/api/v1/policy/api/v1/infra/sites/<site-id>/enforcement-points/<enforcementpoint-id>/edge-transport-nodes/<edge-transport-node-id>?force=true
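Because the edge transport node URL is long and easy to mistype, a small helper can assemble it from its parts. The sketch below follows the path layout exactly as written in this article; the helper name and the example IDs ("default", "edge-01") are hypothetical placeholders, not values taken from a real deployment.

```python
from urllib.parse import quote, urlencode

# Path layout copied from this article; only the placeholders are filled in.
EDGE_PATH = ("/api/v1/policy/api/v1/infra/sites/{site}"
             "/enforcement-points/{ep}/edge-transport-nodes/{edge}")


def edge_url(manager_ip, site_id, enforcement_point_id, edge_id,
             suffix="", force=False):
    """Build the URL for an edge transport node, its state, or a forced delete."""
    path = EDGE_PATH.format(
        site=quote(site_id, safe=""),
        ep=quote(enforcement_point_id, safe=""),
        edge=quote(edge_id, safe=""),
    )
    url = f"https://{manager_ip}{path}{suffix}"
    if force:
        # force=true is only meaningful on the DELETE call for a stale edge.
        url += "?" + urlencode({"force": "true"})
    return url
```

Calling `edge_url("192.0.2.10", "default", "default", "edge-01", suffix="/state")` gives the URL for the state check, and passing `force=True` without a suffix gives the URL for the forced DELETE.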
Notes:
Wait up to 5 minutes for stale entities to be removed. If the cleanup API has still not removed the stale entries after that, retry the API.
For bare metal edges (BMEs), calling the cleanup API can sometimes return the error below. This is a known issue and is fixed in version 4.1.1.
{ "httpStatus": "BAD_REQUEST", "error_code": 16077, "module_name": "FABRIC", "error_message": "[Fabric] Refresh <#Edge_UUID#> placement references failed." }
If the Edge VM still exists in the NSX UI inventory after following the workaround, restart the nsx-proxy service on the host that was hosting the Edge VM:
/etc/init.d/nsx-proxy status
/etc/init.d/nsx-proxy restart
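The wait-then-retry advice in the notes above could be automated along these lines. This is a minimal sketch: `call_cleanup` and `entry_is_gone` are hypothetical callables that the caller supplies (for example, wrappers around the cleanup API and an inventory check), and the attempt count and polling interval are arbitrary choices rather than values from this article. Only the 5-minute wait comes from the note.

```python
import time
from typing import Callable, Optional


def cleanup_with_retry(call_cleanup: Callable[[], None],
                       entry_is_gone: Callable[[], bool],
                       attempts: int = 3,
                       wait_seconds: int = 300,
                       poll_seconds: int = 30,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Call the cleanup API, wait up to wait_seconds, and retry if needed.

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, attempts + 1):
        call_cleanup()
        waited = 0
        while True:
            if entry_is_gone():
                return attempt
            if waited >= wait_seconds:
                break  # waited the full window; retry the cleanup call
            sleep(poll_seconds)
            waited += poll_seconds
    return None
```

Injecting `sleep` keeps the function easy to exercise without real delays. If every attempt fails, the remaining notes (nsx-proxy restart, or a support request for bare metal edges) still apply.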
Enhancements in Versions 3.2.1.1 and 4.0.0.1
This KB article addresses challenges related to deleting powered-off, orphaned, or disconnected edge VMs, specifically those auto-deployed through NSX Manager.
Previously, as described in the "Symptoms" section, the process did not delete the edge if it was unreachable, due to concerns over potential duplicate VTEP issues. With the latest enhancements, the updated behavior is as follows:
1. Standard Deletion Workflow: If the edge is reachable and the host switch config is cleared from the edge, it's safe to delete the edge VM from NSX and VC.
2. However, if issues arise during the first step, primarily caused by connectivity problems between the edge and the manager, we examine the NSX inventory for the edge VM's presence:
a. If the Edge VM is in the NSX Inventory:
b. If the Edge VM isn't in the NSX Inventory: This might be due to NSX inventory discrepancies or the VM's deletion from VC. Users should refer to the workaround to delete the edge from NSX.
Important Note
Should a network disconnection occur between the manager and edges, and if the host switch configuration along with the edge gets deleted from the manager, VTEP resources will be freed. The subsequently released IP might be allocated to a new edge from the IP pool. Such actions can produce duplicate edge IP addresses, creating serious datapath disruptions.
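The duplicate-IP scenario can be illustrated with a toy IP pool. This is a simplified model written for this explanation, not NSX's real allocator: the `IpPool` class, the reuse-first allocation order, and the addresses are all assumptions chosen to make the collision visible.

```python
class IpPool:
    """Toy pool that hands out the most recently released address first."""

    def __init__(self, addresses):
        self._free = list(addresses)
        self._allocated = {}  # owner name -> IP address

    def allocate(self, owner):
        ip = self._free.pop(0)
        self._allocated[owner] = ip
        return ip

    def release(self, owner):
        # The manager frees the address when the owner is deleted.
        self._free.insert(0, self._allocated.pop(owner))


def find_duplicate_ips(live_tep_ips):
    """Group nodes by the IP they actually use and keep only collisions."""
    by_ip = {}
    for node, ip in live_tep_ips.items():
        by_ip.setdefault(ip, []).append(node)
    return {ip: nodes for ip, nodes in by_ip.items() if len(nodes) > 1}


pool = IpPool(["10.0.0.1", "10.0.0.2"])
live = {"edge-1": pool.allocate("edge-1")}  # edge-1 is running with its VTEP IP

# Manager loses contact with edge-1 and deletes it anyway: the IP is released,
# but edge-1 is still running and still using that address on the wire.
pool.release("edge-1")
live["edge-2"] = pool.allocate("edge-2")    # new edge receives the same IP

duplicates = find_duplicate_ips(live)
print(duplicates)
```

In this model both edges end up holding 10.0.0.1, which is the condition the manager's connectivity check is meant to prevent.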
To avoid such scenarios, NSX Manager attempts to establish connectivity with the edge/VC prior to the edge VM's deletion. It's imperative to understand that if the manager can't access the Edge or VC, it can't deduce that the edge has been deleted, warranting user intervention.
Further, with these improvements, even if the edge turns unreachable for NSX Manager, it remains trackable through VC and NSX inventory due to its auto-deployment on VC. This facilitates edge identification and cleanup from VC when needed.
Impact/Risks:
Leaving stale edge VM entries behind in the VC inventory can disrupt the datapath.
If the stale edge is a bare metal appliance, please create a Broadcom SR referencing this KB.