In VMware Cloud Foundation environments managed by SDDC Manager, NSX Edge cluster expansion may get stuck in the “EXPANDING” state when static TEP IPs are used and the required NSX-defined TEP IP Pool is not configured. This results in skipped validation checks and prevents the expansion process from completing or rolling back.
This article documents the scenario where certain NSX Edge nodes (specifically edge-node-5 and edge-node-6) remained in the EXPANDING state due to misconfiguration, and provides the procedure used to clean up the database and restore the NSX Edge cluster to a healthy state.
Note: edge-node-1, edge-node-2, edge-node-3, and edge-node-4 were also stuck in the EXPANDING state, but the request was to remove only edge-node-5 and edge-node-6.
SDDC Manager 5.x
The NSX Edge cluster expansion process got stuck in the “EXPANDING” state because the required NSX-defined TEP IP Pool was not configured (edgeTepIpPoolDefined=false). Static TEP IPs were used instead (edgeTepIp1Defined=true), which bypassed the EDGE_TEP_NEW_IP_POOL_* validation checks in SDDC Manager.
As a result:
Validation logic was skipped.
The expansion workflow could not complete.
No automatic rollback or timeout occurred, causing the edge nodes (edge-node-5 and edge-node-6) to remain in an incomplete state.
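The flag combination described above can be checked mechanically when reviewing captured workflow or log text. The following is a minimal Python sketch, not SDDC Manager code: the two flag names come from this article, while the helper names and the assumption that the flags appear as plain key=value pairs in the text are illustrative only.

```python
import re

# Flag names are taken from this article. The "key=value" layout of the
# input text is an assumption made for illustration.
FLAG_PATTERN = re.compile(
    r"\b(edgeTepIpPoolDefined|edgeTepIp1Defined)=(true|false)\b"
)


def parse_tep_flags(text: str) -> dict[str, bool]:
    """Collect the last seen value of each TEP flag found in the text."""
    flags: dict[str, bool] = {}
    for name, value in FLAG_PATTERN.findall(text):
        flags[name] = value == "true"
    return flags


def matches_stuck_expansion_condition(text: str) -> bool:
    """Return True when no TEP IP pool is defined but a static TEP IP is."""
    flags = parse_tep_flags(text)
    return (
        flags.get("edgeTepIpPoolDefined") is False
        and flags.get("edgeTepIp1Defined") is True
    )
```

A True result only indicates that the text shows the same flag combination as this scenario. It does not confirm that the workflow is actually stuck.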
The following steps were taken to resolve the issue:
Snapshot Taken: A snapshot of the SDDC Manager appliance was taken before making any changes to ensure rollback capability.
PostgreSQL Database Cleanup:
The mapping between the affected vCenter cluster and the NSX Edge cluster was removed.
The nsxt_edge_cluster table was updated to:
Remove the stuck edge nodes (edge-node-5, edge-node-6)
Set the NSX Edge cluster status to ACTIVE
Database Commands Executed:
1) -- Remove the mapping between the vCenter cluster and the NSX Edge cluster (id 6147 is the value from this environment)
DELETE FROM cluster_and_nsxt_edge_cluster WHERE id='6147';
2) -- Update the edge cluster status and the edge node list (list only the nodes that remain)
UPDATE nsxt_edge_cluster SET nsxt_edge_nodes='[{"vmManagementIpAddress":"mgmt-ip","vmHostname":"host-name","sourceId":"source-id","id":"id"}, {"vmManagementIpAddress":"mgmt-ip","vmHostname":"host-name","sourceId":"source-id","id":"id"}, {"vmManagementIpAddress":"mgmt-ip","vmHostname":"host-name","sourceId":"source-id","id":"id"}, {"vmManagementIpAddress":"mgmt-ip","vmHostname":"host-name","sourceId":"source-id","id":"id"}]', status='ACTIVE' WHERE id = 'id';
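Because the nsxt_edge_nodes value is a hand-edited JSON array, a single missing quote or bracket can leave the column with invalid data. One way to reduce that risk is to generate the new value from the current one instead of typing it. The following Python sketch assumes the current column value has been copied out of the database as text. The field name vmHostname comes from the statement above, while the function names and any hostnames passed in are hypothetical.

```python
import json


def build_remaining_nodes_json(
    current_nodes_json: str, hostnames_to_remove: set[str]
) -> str:
    """Return the edge node list without the given hostnames, as compact JSON.

    json.loads raises json.JSONDecodeError if the current value is malformed,
    so a broken list is caught before it reaches the UPDATE statement.
    """
    nodes = json.loads(current_nodes_json)
    if not isinstance(nodes, list):
        raise ValueError("expected a JSON array of edge node objects")

    # Refuse to continue if a node to remove is not in the current list,
    # which usually points at a typo in the hostname.
    present = {node["vmHostname"] for node in nodes}
    missing = hostnames_to_remove - present
    if missing:
        raise ValueError(f"hostnames not found: {sorted(missing)}")

    remaining = [n for n in nodes if n["vmHostname"] not in hostnames_to_remove]
    return json.dumps(remaining, separators=(",", ":"))


def as_sql_literal(value: str) -> str:
    """Wrap a string in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"
```

Passing the result of build_remaining_nodes_json through as_sql_literal produces a string that can be pasted as the nsxt_edge_nodes value in the UPDATE statement above. Review the generated list against the nodes that should remain before running the statement.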
Health Validation:
After the database updates, a health check confirmed the NSX Edge cluster was functional.
The expansion state was cleared, and the remaining edge nodes were operating as expected.