VCF 9 Deployment fails at 'Retrieve the status for VCF Operations with VCF Operations collector Deployment request'


Article ID: 426652


Updated On:

Products

VMware SDDC Manager

Issue/Introduction

When deploying an additional VCF instance to join an existing fleet, you see the following error:

Request scaleout failed with error cause [{"messageId":"LCMVROPSYSTEM25127","message":"Error while configuring the adapter.","eventId":"##########","retry":true,"exceptionMessage":"Some SDDC Manager hosts are not configured. Failed SDDC Manager hosts are: [#########]","exceptionStackTrace":"com.vmware.vrealize.lcm.plugin.common.vrops.exceptions.AdapterConfigurationException: Some SDDC Manager hosts are not configured. Failed SDDC Manager hosts are: [########]\n\tat com.vmware.vrealize.lcm.plugin.core.vrops.tasks.VcfAdapterConfigurationTask.execute(VcfAdapterConfigurationTask.java:162)\n\tat com.vmware.vrealize.lcm.plugin.core.vrops.tasks.VcfAdapterConfigurationTask.retry(VcfAdapterConfigurationTask.java:210)\n\tat com.vmware.vrealize.lcm.automata.core.TaskThread.run(TaskThread.java:60)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)\n\tat java.base/java.lang.Thread.run(Unknown Source)\n","localizedMessageId":null,"parameters":null,"properties":{"skipTask":"false"}}] Reference Token:######

The vcf-collector logs show the same service account being created by more than one thread, followed by an error:

YYYY-MM-DDTHH:MM:SS.118Z INFO  collector 9015 [threadId="73" threadName="TasksManager-TaskHandler-4"] - Creating service account: SvcAccountInfo(client=ops, server=vc, serverFqdn=#########, accountName=svc-ops-vc-#########-417e-8528-5f5f02e4681e-b276450bca6949e0-1, ...)

YYYY-MM-DDTHH:MM:SS.169Z INFO  collector 9015 [threadId="70" threadName="TasksManager-TaskHandler-1"] - Creating service account: SvcAccountInfo(client=ops, server=vc, serverFqdn=#########, accountName=svc-ops-vc-#########-918b-417e-8528-5f5f02e4681e-b276450bca6949e0-1, ...)

YYYY-MM-DDTHH:MM:SS.408Z ERROR collector 9015 [threadId="70"] - Exception occurred while creating service account. Error code: 409 com.vmware.vrops.adapter.vcf.exception.VcfException: The requested resource already exists

The collector then reports that the account already exists on the vCenter Server:

YYYY-MM-DDTHH:MM:SS.412Z INFO  collector 9015 [threadId="70"] - The account svc-ops-vc-#########-417e-8528-5f5f02e4681e-b276450bca6949e0-1 already exists on the server #########

The collector then attempts to delete the account:

YYYY-MM-DDTHH:MM:SS.413Z INFO  collector 9015 [threadId="70"] - Start deleting service account svc-ops-vc-#########-918b-417e-8528-5f5f02e4681e-b276450bca6949e0-1 on the server #########

Meanwhile, the other thread's create request succeeds. The first thread then deletes the account, tries to create it again, and finally fails with a 500 error code:

YYYY-MM-DDTHH:MM:SS.686Z INFO  collector 9015 [threadId="73"] - Successfully created service account: SvcAccountInfo(client=ops, server=vc, serverFqdn=#########, accountName=svc-ops-vc-#########918b-417e-8528-5f5f02e4681e-b276450bca6949e0-1, ...)

YYYY-MM-DDTHH:MM:SS.787Z INFO  collector 9015 [threadId="70"] - Successfully deleted service account svc-ops-vc-#########-918b-417e-8528-5f5f02e4681e-b276450bca6949e0-1 on the server #########

YYYY-MM-DDTHH:MM:SS.787Z INFO  collector 9015 [threadId="70"] - Start creating service account svc-ops-vc-#########-918b-417e-8528-5f5f02e4681e-b276450bca6949e0-1 on the server #########

YYYY-MM-DDTHH:MM:SS.652Z ERROR collector 9015 [threadId="70"] - Exception occurred while creating service account. Error code: 500
com.vmware.vrops.adapter.vcf.exception.VcfException: API returned with unexpected status code 500
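
One way to spot this pattern in a collector log is to list every account name that appears in more than one "Creating service account" entry. The following is a minimal sketch, not a supported tool: the function name `find_duplicate_creates` is hypothetical, the log path is left as a placeholder because it is not given here, and the sketch assumes the `accountName=...` field format shown in the excerpts above.

```shell
# Hypothetical helper: reads collector log lines on stdin and prints each
# account name that has more than one "Creating service account" entry.
find_duplicate_creates() {
  grep -F 'Creating service account:' \
    | sed -n 's/.*accountName=\([^,)]*\).*/\1/p' \
    | sort \
    | uniq -d
}

# Usage (placeholder path; substitute the actual collector log file):
# find_duplicate_creates < /path/to/collector.log
```

An empty result means no account name was requested twice in the log that was scanned.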

To check whether this service account exists, run the following command on the vCenter Server:

 /usr/lib/vmware-vmafd/bin/dir-cli svcaccount list
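
If the list is long, the output can be narrowed to the accounts of interest. This is a sketch under two assumptions: each account is printed on its own line, and the relevant accounts carry the `svc-ops-vc-` prefix seen in the logs above. The function name `filter_ops_svcaccounts` is hypothetical.

```shell
# Hypothetical helper: keeps only the lines that mention an account with the
# svc-ops-vc- prefix. "|| true" keeps the exit status zero when nothing matches.
filter_ops_svcaccounts() {
  grep -F 'svc-ops-vc-' || true
}

# Usage on the vCenter Server (dir-cli may prompt for SSO administrator credentials):
# /usr/lib/vmware-vmafd/bin/dir-cli svcaccount list | filter_ops_svcaccounts
```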

Environment

VCF 9.0.1

Cause

The root cause is a race condition: parallel operations on the same resource run without mutual exclusion (locking). When multiple requests for the same VCF adapter are processed simultaneously, the error-recovery logic, which deletes an existing account when it encounters a 409 Conflict, can delete valid credentials that a competing thread has just created.
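
The general idea of mutual exclusion can be illustrated with a small shell sketch. This is a conceptual illustration only and not the product's code or its fix: a file stands in for the service account, an atomic `mkdir` stands in for a per-account lock, and the name `ensure_account` is hypothetical. With the lock held, the second worker sees the existing account and reuses it instead of deleting it.

```shell
# Illustration only: a file stands in for the service account, and mkdir
# (which fails atomically if the directory exists) stands in for a lock.
ensure_account() {
  store=$1
  name=$2
  lock="$store/$name.lock"
  # Wait until this worker holds the lock for this account name.
  until mkdir "$lock" 2>/dev/null; do
    sleep 1
  done
  if [ -e "$store/$name" ]; then
    echo "exists, reusing"   # "already exists" is treated as success, not deleted
  else
    : > "$store/$name"
    echo "created"
  fi
  rmdir "$lock"
}
```

Running two workers in parallel against the same name, for example `ensure_account "$dir" acct &` twice followed by `wait`, yields one "created" and one "exists, reusing".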

 

Resolution

  • To resolve the current state, the following actions are recommended:

    • Connect to the Aria Operations analytics node and run the following database commands to identify and remove the conflicting credential entry from the internal Postgres database.

      1. Verify the Conflicting Credential

      Confirm that the specific credential ID exists and review its details (replace the placeholder ID with the actual credential ID):

      su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -p 5433 -c \"SELECT credential_id, credential_name, adapter_key, credential_kind_key, fields FROM credential WHERE credential_id = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';\"" > /tmp/verify_test_credential.txt
      

      2. Check for Active References

      Check whether any existing adapter is currently using this credential, to avoid breaking existing monitoring connections:

      su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -p 5433 -c \"SELECT r.resource_id AS adapter_id, r.resource_name AS adapter_name, r.adapter_kind, c.credential_id, c.credential_name FROM resource r JOIN adapter a ON a.adapter_id = r.resource_id JOIN credential c ON c.credential_id = a.credential_id WHERE c.credential_id = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';\"" > /tmp/check_credential_references.txt
      

      3. Remove the Conflicting Entry

      Once the credential is confirmed as an orphan or manual conflict, delete it so that the VCF deployment can be retried:

      su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -p 5433 -c \"DELETE FROM credential WHERE credential_id = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee' AND NOT EXISTS (SELECT 1 FROM adapter WHERE credential_id = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee');\"" > /tmp/remove_cred_by_id.txt
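
Because the credential ID is pasted into a SQL statement, it can help to validate it before running the delete. The following is a minimal sketch, assuming the ID is a standard 36-character UUID like the placeholder above. The function names `is_uuid` and `build_delete_sql` and the variable `CRED_ID` are hypothetical. The generated statement is the same guarded DELETE shown in step 3.

```shell
# Hypothetical helper: succeeds only if the argument is a canonical UUID.
is_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Hypothetical helper: prints the guarded DELETE statement for a validated ID,
# or refuses with a non-zero status if the ID is malformed.
build_delete_sql() {
  if ! is_uuid "$1"; then
    echo "refusing: '$1' is not a UUID" >&2
    return 1
  fi
  printf "DELETE FROM credential WHERE credential_id = '%s' AND NOT EXISTS (SELECT 1 FROM adapter WHERE credential_id = '%s');\n" "$1" "$1"
}

# Usage sketch on the analytics node:
# SQL=$(build_delete_sql "$CRED_ID") && \
#   su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -p 5433 -c \"$SQL\""
```

The `NOT EXISTS` guard is kept so that the statement deletes nothing if an adapter still references the credential.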