The issue is resolved in NSX versions 4.1.1 and 3.2.3.
Workaround:
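1. Attempt Edge Deletion: Delete the edge through the api/v1/transport-nodes/<tn-id> API.
2. Monitor Deletion Progress: Check the /state API output.
3. The /state API output will transition from “pending” -> “in_progress” -> “failed” -> “orphaned”. Query it through the https://<manager-ip>/api/v1/transport-nodes/<tn-id>/state API.
4. Identify Stuck Deletion: A stuck deletion remains in the “orphaned” state, with output similar to the following: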
{
  "details": [
    {
      "failure_code": 8804,
      "failure_message": "Host configuration: Failed to send the HostConfig message. [TN=TransportNode/<#Edge_UUID#>]. Reason: Failed to send HostConfig RPC to MPA node:<#Edge_UUID#>. Error: Unable to reach client <#Edge_UUID#>, application SwitchingVertical.",
      "state": "orphaned",
      "sub_system_id": "",
      "sub_system_type": "Host"
    }
  ],
  "failure_code": 8804,
  "failure_message": "Host configuration failed. Number of retries: 1298. Next retry attempt will be between [DATE-TIME] and [DATE-TIME] (UTC).",
  "maintenance_mode_state": "DISABLED",
  "node_deployment_state": {
    "state": "DELETE_IN_PROGRESS"
  },
  "state": "orphaned",
  "transport_node_id": "<#Edge_UUID#>"
}
5. If the deletion process is stuck due to network disruptions between the NSX Manager and the edge VM, manual intervention is needed. Follow these steps:
{ "httpStatus": "BAD_REQUEST", "error_code": 16077, "module_name": "FABRIC", "error_message": "[Fabric] Refresh <#Edge_UUID#> placement references failed." }
If the clean_stale_entries API doesn’t remove all stale entries, retry steps 4 & 5.
Note: This workaround is suitable for NSX-T releases version 3.2.1 and later.
If the Edge VM still exists in the NSX UI inventory after following the workaround, restart the nsx-proxy service on the host that was hosting the Edge VM:
/etc/init.d/nsx-proxy status | restart
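The check in step 4 can be sketched in a few lines of Python. This is a minimal illustration, not an official tool: the helper names `state_url` and `is_stuck_deletion` are hypothetical, and the rule it applies (a "state" of "orphaned" while "node_deployment_state" still reports "DELETE_IN_PROGRESS") is an assumption drawn only from the sample output above. It works on an already-fetched response and does not contact NSX Manager.

```python
from typing import Any, Dict

# State sequence quoted in step 3 of the workaround.
STATE_SEQUENCE = ["pending", "in_progress", "failed", "orphaned"]


def state_url(manager_ip: str, tn_id: str) -> str:
    """Build the /state URL shown in step 3 (hypothetical helper)."""
    return f"https://{manager_ip}/api/v1/transport-nodes/{tn_id}/state"


def is_stuck_deletion(state: Dict[str, Any]) -> bool:
    """Assumed heuristic: the node is 'orphaned' while its deployment
    state still reports DELETE_IN_PROGRESS, as in the sample output."""
    deployment = state.get("node_deployment_state") or {}
    return (
        state.get("state") == "orphaned"
        and deployment.get("state") == "DELETE_IN_PROGRESS"
    )


# Trimmed copy of the sample /state output from step 4.
sample = {
    "failure_code": 8804,
    "maintenance_mode_state": "DISABLED",
    "node_deployment_state": {"state": "DELETE_IN_PROGRESS"},
    "state": "orphaned",
    "transport_node_id": "<#Edge_UUID#>",
}

print(state_url("<manager-ip>", "<tn-id>"))
print(is_stuck_deletion(sample))
```

A response for which `is_stuck_deletion` returns true would be a candidate for the manual steps in step 5. Any other state in `STATE_SEQUENCE` suggests the deletion may still be progressing.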
Enhancements in Versions 3.2.1.1 and 4.0.0.1
This KB article addresses challenges related to deleting powered-off, orphaned, or disconnected edge VMs, specifically those auto-deployed through NSX Manager.
Previously, as described in the "Symptoms" section, the process did not delete the edge if it was unreachable, due to concerns over potential duplicate VTEP issues. With the latest enhancements, the updated behavior is as follows:
1. Standard Deletion Workflow: If the edge is reachable and the host switch config is cleared from the edge, it's safe to delete the edge VM from NSX and VC.
2. However, if issues arise during the first step, primarily caused by connectivity problems between the edge and the manager, we examine the NSX inventory for the edge VM's presence:
a. If the Edge VM is in the NSX Inventory:
b. If the Edge VM isn't in the NSX Inventory: This might be due to NSX inventory discrepancies or the VM's deletion from VC. Users should refer to the workaround to delete the edge from NSX.
Important Note
Should a network disconnection occur between the manager and edges, and if the host switch configuration along with the edge gets deleted from the manager, VTEP resources will be freed. The subsequently released IP might be allocated to a new edge from the IP pool. Such actions can produce duplicate edge IP addresses, creating serious datapath disruptions.
To avoid such scenarios, NSX Manager attempts to establish connectivity with the edge/VC prior to the edge VM's deletion. It's imperative to understand that if the manager can't access the Edge or VC, it can't deduce that the edge has been deleted, warranting user intervention.
Further, with these improvements, even if the edge turns unreachable for NSX Manager, it remains trackable through VC and NSX inventory due to its auto-deployment on VC. This facilitates edge identification and cleanup from VC when needed.
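The duplicate-address scenario in the note above can be illustrated with a toy model. Everything in this sketch is hypothetical: the `IpPool` class, the edge names, and the addresses (taken from the 192.0.2.0/24 documentation range) are assumptions made for illustration and do not reflect how NSX implements its IP pools.

```python
from typing import Dict, List


class IpPool:
    """Toy VTEP IP pool: hands out the first free address, and a
    released address returns to the front of the free list."""

    def __init__(self, addresses: List[str]) -> None:
        self.free = list(addresses)
        self.allocated: Dict[str, str] = {}  # owner -> IP

    def allocate(self, owner: str) -> str:
        ip = self.free.pop(0)
        self.allocated[owner] = ip
        return ip

    def release(self, owner: str) -> None:
        self.free.insert(0, self.allocated.pop(owner))


pool = IpPool(["192.0.2.10", "192.0.2.11"])

# Addresses actually configured on running edges (the datapath's view).
live_vteps = {"edge-a": pool.allocate("edge-a")}

# The manager loses contact with edge-a, and the edge is deleted
# from the manager only: the pool frees the address...
pool.release("edge-a")
# ...but edge-a is still running with its VTEP IP, so live_vteps is unchanged.

# A new edge is deployed and receives the freed address.
live_vteps["edge-b"] = pool.allocate("edge-b")

in_use = list(live_vteps.values())
duplicates = {ip for ip in in_use if in_use.count(ip) > 1}
print(duplicates)
```

In this model both edges end up holding the same address. That is the condition NSX Manager tries to avoid by confirming connectivity with the edge or VC before deleting the edge VM.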
Impact/Risks:
Leaving behind stale Edge VM entries in the VC Inventory can disrupt the datapath.