NSX Edge Nodes Stuck in "EXPANDING" State in SDDC Manager

Article ID: 403743

Updated On:

Products

VMware SDDC Manager

Issue/Introduction

In VMware Cloud Foundation environments managed by SDDC Manager, NSX Edge cluster expansion may get stuck in the “EXPANDING” state when static TEP IPs are used and the required NSX-defined TEP IP Pool is not configured. This results in skipped validation checks and prevents the expansion process from completing or rolling back.

This article documents a case in which NSX Edge nodes remained in the EXPANDING state because of this misconfiguration, and describes the procedure used to clean up the SDDC Manager database and restore the NSX Edge cluster to a healthy state.
Although edge-node-1, edge-node-2, edge-node-3, and edge-node-4 were also stuck in the EXPANDING state, the request was to remove only edge-node-5 and edge-node-6.

Environment

SDDC Manager 5.x

Cause

The NSX Edge cluster expansion process became stuck in the “EXPANDING” state because the required NSX-defined TEP IP Pool was not configured (edgeTepIpPoolDefined=false). Static TEP IPs were used instead (edgeTepIp1Defined=true), which bypassed the EDGE_TEP_NEW_IP_POOL_* validation checks in SDDC Manager.

As a result:

  • Validation logic was skipped.

  • The expansion workflow could not complete.

  • No automatic rollback or timeout occurred, causing the edge nodes (edge-node-5 and edge-node-6) to remain in an incomplete state.
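
As a quick way to recognize the condition described above, the following Python sketch checks a piece of text for the two flag values named in this article. It is not SDDC Manager code. It assumes the flags appear in `name=value` form, as they are written here, and the function names are hypothetical.

```python
import re


def parse_flag(text, name):
    """Return True/False for 'name=true' / 'name=false', or None if absent."""
    match = re.search(rf"\b{re.escape(name)}=(true|false)\b", text)
    return None if match is None else match.group(1) == "true"


def matches_stuck_expansion_condition(text):
    """True only when static TEP IPs are defined and no TEP IP Pool is defined."""
    pool_defined = parse_flag(text, "edgeTepIpPoolDefined")
    static_ip_defined = parse_flag(text, "edgeTepIp1Defined")
    return pool_defined is False and static_ip_defined is True


if __name__ == "__main__":
    sample = "edgeTepIpPoolDefined=false, edgeTepIp1Defined=true"
    print(matches_stuck_expansion_condition(sample))
```

If either flag is missing from the text, the check returns False rather than guessing.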

Resolution

The following steps were taken to resolve the issue:

  1. Snapshot Taken: A snapshot of the SDDC Manager appliance was taken before making any changes to ensure rollback capability.

  2. PostgreSQL Database Cleanup:

    • The mapping between the affected vCenter cluster and the NSX Edge cluster was removed.

    • The nsxt_edge_cluster table was updated to:

      • Remove the stuck edge nodes (edge-node-5, edge-node-6)

      • Set the NSX Edge cluster status to ACTIVE

  3. Database Commands Executed:


    -- 1) Remove the mapping between the vCenter cluster and the NSX Edge cluster
    --    (the id value shown is from this case)

    DELETE FROM cluster_and_nsxt_edge_cluster WHERE id='6147';

    -- 2) Update the edge cluster status and the edge node list (list only the nodes to keep)

    UPDATE nsxt_edge_cluster SET nsxt_edge_nodes='[{"vmManagementIpAddress":"mgmt-ip","vmHostname":"host-name","sourceId":"source-id","id":"id"}, {"vmManagementIpAddress":"mgmt-ip","vmHostname":"host-name","sourceId":"source-id","id":"id"}, {"vmManagementIpAddress":"mgmt-ip","vmHostname":"host-name","sourceId":"source-id","id":"id"}, {"vmManagementIpAddress":"mgmt-ip","vmHostname":"host-name","sourceId":"source-id","id":"id"}]', status='ACTIVE' WHERE id = 'id';

 
  4. Health Validation:

    • After the database updates, a health check confirmed the NSX Edge cluster was functional.

    • The expansion state was cleared, and the remaining edge nodes were operating as expected.
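
Hand-editing the nsxt_edge_nodes value for the UPDATE statement in step 3 is error-prone, because a single missing quote makes the JSON invalid. The following Python sketch is one possible way to generate that statement; it is not part of the documented procedure. It assumes the current nsxt_edge_nodes value has already been read from the nsxt_edge_cluster table, that it is a JSON array of objects carrying the vmHostname key shown in step 3, and that nodes are selected for removal by hostname. The function name and the sample values are hypothetical.

```python
import json


def build_edge_cluster_update(current_nodes_json, hostnames_to_remove, edge_cluster_id):
    """Return an UPDATE statement whose node list excludes the given hostnames."""
    # json.loads raises ValueError if the current value is not valid JSON.
    nodes = json.loads(current_nodes_json)
    remove = set(hostnames_to_remove)

    # Refuse to continue if a requested hostname is not in the list at all.
    missing = remove - {node["vmHostname"] for node in nodes}
    if missing:
        raise ValueError(f"hostnames not found in node list: {sorted(missing)}")

    kept = [node for node in nodes if node["vmHostname"] not in remove]
    nodes_json = json.dumps(kept, separators=(",", ":"))

    # Double single quotes so the values are safe inside SQL string literals.
    nodes_sql = nodes_json.replace("'", "''")
    cluster_sql = str(edge_cluster_id).replace("'", "''")
    return (
        "UPDATE nsxt_edge_cluster "
        f"SET nsxt_edge_nodes='{nodes_sql}', status='ACTIVE' "
        f"WHERE id = '{cluster_sql}';"
    )


if __name__ == "__main__":
    # Placeholder data only; real values come from the affected environment.
    current = json.dumps([
        {"vmManagementIpAddress": "mgmt-ip", "vmHostname": f"edge-node-{i}",
         "sourceId": "source-id", "id": "id"}
        for i in range(1, 7)
    ])
    print(build_edge_cluster_update(current, ["edge-node-5", "edge-node-6"], "id"))
```

Review the generated statement before running it, and run it only after the snapshot in step 1 has been taken.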