Host commission from SDDC Manager fails at "Acquire SDDC Manager Host(s) locks"

Article ID: 375250

Products

VMware SDDC Manager
VMware Cloud Foundation 5.x

Issue/Introduction

  • Host commission from SDDC Manager fails at "Acquire SDDC Manager Host(s) locks"
  • The logs show the error "Failed to get clusters information from resource aggregator"

/var/log/vmware/vcf/domainmanager/domainmanager.log

yyyy-mm-ddThh:mm:ss WARN  [vcf_dm,xxxxxxxxxxx,07fe] [c.v.v.c.s.i.ClusterManagerIsServiceImpl,http-nio-127.0.0.1-7200-exec-27]  Failed to get clusters information from resource aggregator. Cluster capacity information (CPU, Storage, Memory) won't be provided in the response.

yyyy-mm-ddThh:mm:ss ERROR [vcf_dm,xxxxxxxxxx,8f4c] [c.v.v.r.c.c.ResourceAggregatorServiceImpl,http-nio-127.0.0.1-7200-exec-43]  Failed to get clusters information from resource aggregator.
com.vmware.cloud.foundation.rest.operationsmanager.internal.runtime.ApiException: Bad Gateway

 

/var/log/vmware/vcf/commonsvc/vcf-commonsvcs.log

yyyy-mm-ddThh:mm:ss [common,xxxxxxxxxxxxx,58cf] [c.v.e.s.l.s.impl.LockingServiceImpl,http-nio-127.0.0.1-7100-exec-311] No topology path found with type HOST, id <host_id>, and name <ESXi host FQDN>

yyyy-mm-ddThh:mm:ss ERROR [common,xxxxxxxxxxxxx,58cf] [c.v.e.s.e.h.LocalizableRuntimeExceptionHandler,http-nio-127.0.0.1-7100-exec-311] [2BQQQO] INVALID_RESOURCE Resource with type HOST, and ID <host_id> or Name <ESXI FQDN> is not found.
com.vmware.evo.sddc.common.core.error.InvalidInputException: Resource with type HOST, and ID <host_id> or Name <ESXi_FQDN> is not found.
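
To confirm that an environment shows the same signatures, the two log files above can be searched for the fixed error strings. The following is a minimal sketch; the helper name count_signature is hypothetical and not part of SDDC Manager, and it assumes the log paths quoted in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of SDDC Manager): prints "<file>: <count>"
# for the number of lines in a log file that contain a fixed string.
count_signature() {
    # $1 = log file, $2 = fixed string to search for
    if [ -r "$1" ]; then
        # -c counts matching lines, -F treats the pattern as a literal string
        printf '%s: %s\n' "$1" "$(grep -cF -- "$2" "$1")"
    else
        printf '%s: not readable\n' "$1"
    fi
}

# Example (run on SDDC Manager, using the paths and messages from this article):
#   count_signature /var/log/vmware/vcf/domainmanager/domainmanager.log \
#       "Failed to get clusters information from resource aggregator"
#   count_signature /var/log/vmware/vcf/commonsvc/vcf-commonsvcs.log \
#       "No topology path found with type HOST"
```

A non-zero count in both files suggests the environment matches the symptoms described here.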

 

Environment

VMware Cloud Foundation 5.x

Cause

  • A manual database clean-up was performed on the environment and some inventory resources were deleted. As a result, errors such as a missing cluster-domain association appear in the tables
  • SDDC Manager is unable to find one of the hosts in the topology

Resolution

  1. Check whether any stale resource locks are held in SDDC Manager and release them
    1. Check for stale resource locks
      1. SSH to SDDC Manager as the vcf user, then su to root
      2. Run the following command to list the resource locks
        curl localhost/resource-locks | jq

        Sample output

        vcf@sddc-manager [ ~ ]$  curl localhost/resource-locks | jq
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed
        100   741    0   741    0     0   172k      0 --:--:-- --:--:-- --:--:--  180k
        {
          "elements": [
            {
              "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx",
              "resourceType": "cluster",
              "resourceId": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
              "resourceName": "vi-cluster1",
              "operationId": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb",
              "serviceId": "cccccccc-cccc-cccc-cccc-cccccccccccc"
            },
            {
              "id": "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy",
              "resourceType": "system",
              "resourceId": "SYSTEM",
              "resourceName": "SYSTEM",
              "operationId": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb",
              "serviceId": "cccccccc-cccc-cccc-cccc-cccccccccccc"
            },
            {
              "id": "42b2dcfb-a15d-4647-acc6-51f7e04d2a78",
              "resourceType": "domain",
              "resourceId": "aea9c2e4-8ce2-4b96-89aa-fa82c57f31f1",
              "resourceName": "nsxt-vi",
              "operationId": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb",
              "serviceId": "cccccccc-cccc-cccc-cccc-cccccccccccc"
            }
          ]
        }
    2. Check whether the operationId of the locks matches the workflow ID of the failed workflow
    3. Delete the locks held by that operationId
      curl --location --request DELETE 'localhost/resource-locks' \
      --header 'Content-Type: application/json' \
      --data '{
        "operationId": "<ID gathered from Step 1(a-ii) >",
        "serviceId": "cccccccc-cccc-cccc-cccc-cccccccccccc"
      }'
  2. If there are no stale resource locks, restart the commonsvcs service and retry the failed operation
    systemctl restart commonsvcs
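
When many locks are listed, matching them against the failed workflow by eye is error-prone. The lock check in step 1 can be sketched with jq as below. The function names are hypothetical and not part of SDDC Manager; the sketch assumes the JSON shape shown in the sample output above and builds a request body with the same two fields (operationId, serviceId) used in the DELETE call.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are illustrative only.

# Reads the JSON returned by `curl -s localhost/resource-locks` on stdin and
# prints one tab-separated line per lock held by the given operationId ($1):
# resourceType, resourceName, lock id.
locks_for_operation() {
    jq -r --arg op "$1" '
        .elements[]
        | select(.operationId == $op)
        | [.resourceType, .resourceName, .id]
        | @tsv'
}

# Reads the same JSON on stdin and prints one compact request body per
# distinct operationId/serviceId pair held by the given operationId ($1).
delete_payloads_for_operation() {
    jq -c --arg op "$1" '
        [.elements[] | select(.operationId == $op) | {operationId, serviceId}]
        | unique
        | .[]'
}

# Example (run on SDDC Manager; <workflow_id> is the ID of the failed workflow):
#   curl -s localhost/resource-locks | locks_for_operation "<workflow_id>"
#   curl -s localhost/resource-locks | delete_payloads_for_operation "<workflow_id>"
```

If locks_for_operation prints nothing, no lock is held by that workflow and step 2 applies. Otherwise, each line printed by delete_payloads_for_operation can be reviewed and then passed as the --data value of the DELETE request in step 1.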