Unable to remove Workload Domain - SSO_RING_TOPOLOGY_FETCH_FAILED_FROM_NODES

Article ID: 391870


Products

VMware Cloud Foundation

Issue/Introduction

Attempting to remove a workload domain from SDDC Manager fails with an SSO_RING_TOPOLOGY_FETCH_FAILED_FROM_NODES error. The UI may show an error related to “Validate the Single Sign-On (SSO) Ring Topology” or “Failed to fetch topology from all known nodes”.

 

The following entries appear in /var/log/vmware/vcf/domainmanager/domainmanager.log:

XXXX-XX-XXTXX:XX:XX.XXX+0000 ERROR [vcf_dm,xxxxxxxxxxxxxxxx,ade5] [c.v.e.s.o.model.error.ErrorFactory,dm-exec-5]  [2KMHRA] SSO_RING_TOPOLOGY_FETCH_FAILED_FROM_NODES Failed to fetch topology from all known nodes: [vcenter-1, vcenter-2, vcenter-3, vcenter-4].
com.vmware.evo.sddc.orchestrator.exceptions.OrchTaskException: Failed to fetch topology from all known nodes: [vcenter-1, vcenter-2, vcenter-3, vcenter-4].
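The node list in this log entry shows which vCenter Servers were tried, and it is a useful starting point for the topology checks. As a minimal sketch, assuming the message keeps the wording shown above, the names could be pulled out of a log line like this (the function name `nodes_from_log_line` is hypothetical and not part of any VMware tooling):

```python
import re
from typing import List

# Matches the bracketed node list in the domainmanager.log message shown above.
# Assumption: the message wording is the same in your VCF build.
NODE_LIST = re.compile(
    r"Failed to fetch topology from all known nodes: \[([^\]]*)\]"
)


def nodes_from_log_line(line: str) -> List[str]:
    """Return the vCenter node names listed in a topology-fetch failure line."""
    match = NODE_LIST.search(line)
    if match is None:
        return []
    # Split the comma-separated list and drop surrounding whitespace.
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


sample_line = (
    "SSO_RING_TOPOLOGY_FETCH_FAILED_FROM_NODES Failed to fetch topology "
    "from all known nodes: [vcenter-1, vcenter-2, vcenter-3, vcenter-4]."
)
print(nodes_from_log_line(sample_line))
```

Each name returned is a vCenter Server whose topology API can then be queried as described under Cause.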

Environment

VCF 4.5.2

Cause

This is caused by stale entries in SSO for an old vCenter Server or PSC. To confirm, log in to each vCenter Server and use the topology API to check the current topology.

vCenter > Developer Center > API Explorer > topology/nodes > "VCSA_EMBEDDED / VCSA_EXTERNAL / PSC_EXTERNAL"

In this example, the query for external nodes returns an error that references a node (vcenter-5) that is not part of SDDC or the current ring topology:

curl -X GET 'https://vcenter-2/api/vcenter/topology/nodes?types=VCSA_EXTERNAL' -H 'vmware-api-session-id: XXXXXXXXXX'
 
 
{
  "error_type": "INTERNAL_SERVER_ERROR",
  "messages": [
    {
      "args": [
        "com.vmware.vapi.std.errors.Error"
      ],
      "default_message": "Provider method implementation threw unexpected exception: com.vmware.vapi.std.errors.Error",
      "id": "vapi.bindings.method.impl.unexpected"
    },
    {
      "args": [
        "Failed to get client affinity information for (vcenter-5). CDC Native platform error [code: 382312514][]"
      ],
      "default_message": "Internal Server Error (Failed to get client affinity information for (vcenter-5). CDC Native platform error [code: 382312514][])",
      "id": "com.vmware.vcenter.topology.error"
    }
  ]
}
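When several vCenter Servers and node types need checking, the same request can be repeated programmatically. The following is a rough sketch using only the Python standard library, not an official tool. It assumes you already hold a valid session ID for each vCenter Server, that the error text matches the response above, and that the vCenter certificates are trusted by the machine running it (requests fail otherwise). The function names `affinity_error_nodes`, `fetch_topology`, and `check_ring` are hypothetical.

```python
import json
import re
import urllib.error
import urllib.request
from typing import Dict, List

# Node types taken from the article; the regex mirrors the message text in the
# response above and may need adjusting for other vCenter builds (assumption).
NODE_TYPES = ("VCSA_EMBEDDED", "VCSA_EXTERNAL", "PSC_EXTERNAL")
AFFINITY_ERROR = re.compile(
    r"Failed to get client affinity information for \(([^)]+)\)"
)


def affinity_error_nodes(body: str) -> List[str]:
    """Return node names cited in client-affinity errors, in first-seen order."""
    try:
        payload = json.loads(body)
    except ValueError:
        return []
    # A successful call returns a JSON list of nodes, not an error object.
    if not isinstance(payload, dict):
        return []
    found: List[str] = []
    for message in payload.get("messages", []):
        for node in AFFINITY_ERROR.findall(message.get("default_message", "")):
            if node not in found:
                found.append(node)
    return found


def fetch_topology(vcenter: str, session_id: str, node_type: str) -> str:
    """Issue the same GET as the curl example and return the raw response body."""
    url = f"https://{vcenter}/api/vcenter/topology/nodes?types={node_type}"
    request = urllib.request.Request(
        url, headers={"vmware-api-session-id": session_id}
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # HTTP error responses still carry the JSON error body.
        return err.read().decode("utf-8")


def check_ring(
    vcenters: List[str], session_ids: Dict[str, str]
) -> Dict[str, List[str]]:
    """Map each queried vCenter to the suspect nodes its responses mention."""
    suspects: Dict[str, List[str]] = {}
    for vcenter in vcenters:
        names: List[str] = []
        for node_type in NODE_TYPES:
            body = fetch_topology(vcenter, session_ids[vcenter], node_type)
            for node in affinity_error_nodes(body):
                if node not in names:
                    names.append(node)
        suspects[vcenter] = names
    return suspects
```

Calling `check_ring(["vcenter-1", "vcenter-2"], {"vcenter-1": "...", "vcenter-2": "..."})` with real session IDs would return, for each vCenter Server, any node names reported in client affinity errors. A name that appears there but is not a known member of the ring is a candidate stale entry to verify before cleaning up.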

Resolution

See KB 371365 for removing the stale service registrations related to this problem node from the vCenter Server ELM group.

Script to remove stale service registrations from vCenter Server - https://knowledge.broadcom.com/external/article/371365/script-to-remove-stale-service-registrat.html