SDDC Manager workflows to Configure DNS failed due to NSX Edge Nodes Stuck in "EXPANDING" State

Article ID: 436461

Updated On:

Products

VMware SDDC Manager / VCF Installer

Issue/Introduction

  • SDDC Manager workflows to "Configure DNS/NTP" fail with the error: ENTITY_VALIDATION_FAILED.
  • The vcf-operational-manager.log contains errors similar to: ERROR [vcf_om,...] [c.v.v.s.a.ValidateEntitiesInSystemAction] Validation errors [EntityValidationResponse(name=nsxedgecl, ..., type=NSXT_EDGE_CLUSTER, errorCode=ENTITY_NOT_ACTIVE, entityValidationStatus=ENTITY_VALIDATION_FAILED, entityValidationErrorMessages=[Status of <nsxedgecl>is EXPANDING.], ...)]
  • In the SDDC Manager UI, the NSX Edge Cluster status remains stuck in an "EXPANDING" state.
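To confirm that a log excerpt contains this symptom, the validation entries can be extracted with a short script. The following is a minimal sketch in Python, not a supported tool: the function name `find_inactive_entities` is hypothetical, and the regular expression assumes the field order shown in the sample line above (name, then type, then errorCode, then entityValidationErrorMessages). Real log lines may carry extra fields, so treat the pattern as a starting point.

```python
import re

# Assumption: fields appear in the order seen in the sample log line above.
PATTERN = re.compile(
    r"EntityValidationResponse\(name=(?P<name>[^,]+),"
    r".*?type=(?P<type>\w+),\s*errorCode=(?P<error_code>\w+),"
    r".*?entityValidationErrorMessages=\[(?P<message>[^\]]*)\]"
)

# Sample line copied from the article; the "..." elisions are kept as-is.
SAMPLE = (
    "ERROR [vcf_om,...] [c.v.v.s.a.ValidateEntitiesInSystemAction] "
    "Validation errors [EntityValidationResponse(name=nsxedgecl, ..., "
    "type=NSXT_EDGE_CLUSTER, errorCode=ENTITY_NOT_ACTIVE, "
    "entityValidationStatus=ENTITY_VALIDATION_FAILED, "
    "entityValidationErrorMessages=[Status of <nsxedgecl>is EXPANDING.], ...)]"
)


def find_inactive_entities(log_text):
    """Return one dict per validation entry whose errorCode is ENTITY_NOT_ACTIVE."""
    hits = []
    for line in log_text.splitlines():
        for match in PATTERN.finditer(line):
            if match.group("error_code") == "ENTITY_NOT_ACTIVE":
                hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    for entry in find_inactive_entities(SAMPLE):
        print(entry["type"], entry["name"], entry["message"])
```

In practice the text of vcf-operational-manager.log would be passed in place of `SAMPLE`.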

Environment

VCF 5.x

 

Cause

This issue can occur when DNS or NTP settings are changed manually, outside of SDDC Manager orchestration, which can cause loss of connectivity or failed name resolution for management components.

The subsequent attempt to update these settings globally via SDDC Manager then fails because of an inconsistent entity state. If the SDDC Manager database records an NSX Edge Cluster in an intermediate state (e.g., EXPANDING), the global validation logic for the DNS/NTP update workflow fails, because it requires all entities to be in an ACTIVE state.
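The all-or-nothing nature of this check can be illustrated with a small Python sketch. This is not SDDC Manager's actual code: the function `validate_all_active`, the dictionary layout, and the second inventory entry are invented for illustration. Only the ACTIVE requirement and the EXPANDING edge cluster come from the description above.

```python
def validate_all_active(entities):
    """Return one error message per entity whose status is not ACTIVE."""
    return [
        f"Status of <{entity['name']}> is {entity['status']}."
        for entity in entities
        if entity["status"] != "ACTIVE"
    ]


# Hypothetical inventory: one healthy entity and one stuck edge cluster.
inventory = [
    {"name": "example-cluster", "type": "CLUSTER", "status": "ACTIVE"},
    {"name": "nsxedgecl", "type": "NSXT_EDGE_CLUSTER", "status": "EXPANDING"},
]

errors = validate_all_active(inventory)
# A single non-ACTIVE entity is enough to block the whole workflow.
workflow_may_proceed = not errors
```

Under this model, one entity left in EXPANDING produces a non-empty error list, so the workflow stops before any DNS/NTP change is pushed.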

Resolution

Step 1: Delete the failed expansion task from the domainmanager database

  • Take a snapshot of the SDDC Manager appliance.
  • SSH into the SDDC Manager appliance as vcf and switch to root.
  • Connect to the domainmanager database: psql -h localhost -U postgres -d domainmanager
  • List the failed tasks: SELECT id, name FROM execution WHERE execution_state='COMPLETED_WITH_FAILURE';
  • Delete the failed task whose name is Expansion workflow: DELETE FROM execution WHERE id='<INSERT_TASK_ID>';
  • Exit the database: \q
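Because a DELETE by ID is hard to undo, it can help to rehearse the list-then-delete logic before touching the appliance. The sketch below uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL domainmanager database. The table holds only the three columns named in the queries above, and the sample rows, task IDs, the `COMPLETED_WITH_SUCCESS` state, and the helper names are all invented. The delete is additionally guarded by the failure state so that a mistyped ID cannot remove a healthy task.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for the execution table (invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE execution (id TEXT PRIMARY KEY, name TEXT, execution_state TEXT)"
    )
    conn.executemany(
        "INSERT INTO execution VALUES (?, ?, ?)",
        [
            ("task-1", "Expansion workflow", "COMPLETED_WITH_FAILURE"),
            ("task-2", "Some other workflow", "COMPLETED_WITH_SUCCESS"),
        ],
    )
    conn.commit()
    return conn


def list_failed(conn):
    """Mirror the SELECT from the article: failed executions only."""
    return conn.execute(
        "SELECT id, name FROM execution "
        "WHERE execution_state = 'COMPLETED_WITH_FAILURE' ORDER BY id"
    ).fetchall()


def delete_failed_execution(conn, task_id):
    """Delete one row, but only if it exists and is in the failed state."""
    cur = conn.execute(
        "DELETE FROM execution "
        "WHERE id = ? AND execution_state = 'COMPLETED_WITH_FAILURE'",
        (task_id,),
    )
    if cur.rowcount != 1:
        conn.rollback()  # nothing matched: leave the table untouched
        return False
    conn.commit()
    return True
```

Adding the state condition to the WHERE clause is a design choice of this sketch, not part of the documented procedure; the snapshot taken at the start of the step remains the real safety net.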

Step 2: Change the NSX Edge cluster status to ACTIVE

  • Access the platform database: psql -h localhost -U postgres -d platform
  • Identify the affected NSX Edge cluster ID: SELECT id, name, status FROM nsxt_edge_cluster;
  • Update the status to ACTIVE for the cluster stuck in EXPANDING: UPDATE nsxt_edge_cluster SET status='ACTIVE' WHERE id='<cluster_id>';
  • Exit the database: \q
Step 3: Finalize the DNS/NTP update via SDDC Manager

  • Restart the domain manager and operations manager services.
  • Navigate to the SDDC Manager UI.
  • Retry the Configure DNS/NTP task.
  • The task should now pass entity validation and push the new settings to all remaining entities (ESXi hosts, NSX Managers, etc.).
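The status change in Step 2 can be rehearsed the same way before it is run against the appliance. The following Python sketch again uses the built-in sqlite3 module as a stand-in for the PostgreSQL platform database. The table contains only the id, name, and status columns used in Step 2, and the cluster IDs, the second cluster name, and the helper names are invented. The helper checks the current status first and refuses to touch a cluster that is not in EXPANDING.

```python
import sqlite3


def make_demo_inventory():
    """Build an in-memory stand-in for nsxt_edge_cluster (invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nsxt_edge_cluster (id TEXT PRIMARY KEY, name TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO nsxt_edge_cluster VALUES (?, ?, ?)",
        [
            ("ec-1", "nsxedgecl", "EXPANDING"),
            ("ec-2", "other-edge-cluster", "ACTIVE"),
        ],
    )
    conn.commit()
    return conn


def set_cluster_active(conn, cluster_id):
    """Set status to ACTIVE only if the cluster is currently EXPANDING."""
    row = conn.execute(
        "SELECT status FROM nsxt_edge_cluster WHERE id = ?", (cluster_id,)
    ).fetchone()
    if row is None or row[0] != "EXPANDING":
        return False  # unknown ID or unexpected state: change nothing
    conn.execute(
        "UPDATE nsxt_edge_cluster SET status = 'ACTIVE' WHERE id = ?",
        (cluster_id,),
    )
    conn.commit()
    return True
```

Checking the status before the UPDATE is an extra precaution in this sketch; the documented procedure simply updates the row identified by `<cluster_id>`.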