The deployment of a vRealize product has failed in VMware Cloud Foundation and there is no option to uninstall it

Article ID: 316929

Products

VMware Cloud Foundation

Issue/Introduction

Symptoms:

  • The deployment of a vRealize product has failed in VMware Cloud Foundation and there is no option to uninstall it.
  • You see messages similar to the following in the /var/log/vmware/vcf/domainmanager.log file on the SDDC Manager VM:
2018-10-18 13:56:34.067 [Executor-4] INFO [ c.v.e.s.c.v.vrlcm.service.VrlcmServiceImpl] <########-####-####-####-########8e2d> Returned request progress details status = FAILED request ID = a90df022a0a848755787eec3dbdfb
2018-10-18 13:56:34.067 [Executor-4] INFO [c.vmware.evo.sddc.vrealize.vra.DeployVraContract] <########-####-####-####-########8e2d> Persisting vRSLCM vRealize operation details in inventory.
2018-10-18 13:56:34.072 [Executor-4] ERROR [c.vmware.evo.sddc.vrealize.vra.DeployVraContract] <########-####-####-####-########8e2d> vRA deployment failed. vRSLCM Request state is FAILED.
2018-10-18 13:56:34.072 [Executor-4] ERROR [c.vmware.evo.sddc.vrealize.vra.DeployVraContract] <########-####-####-####-########8e2d> vRSLCM vRealize operation has failed. Error code is VRA_IAAS_MANAGEMENT_AGENT_INSTALLATION_FAILED.
vRSLCM exception message is VM Not Found : vrealize-01.corp.local.
vRSLCM error message is vRA Iaas Management Agent Installation Failed..
2018-10-18 13:56:34.122 [Executor-4] INFO [ c.v.e.s.c.client.vmware.vsphere.VsphereClient] <########-####-####-####-########8e2d> Successfully logged in to https://vcenter.corp.local/sdk
2018-10-18 13:56:34.123 [Executor-4] INFO [ c.v.e.s.c.client.vmware.vsphere.VcManagerBase] <########-####-####-####-########8e2d> Toggle HA feature on cluster vcfbreda-mgmt-01 to true
2018-10-18 13:56:34.148 [Executor-4] DEBUG [c.v.e.s.c.client.vmware.vsphere.InventoryService] <########-####-####-####-########8e2d> No more results to retrieve
2018-10-18 13:56:34.167 [Executor-4] INFO [ c.v.e.s.c.client.vmware.vsphere.VsphereUtils] <########-####-####-####-########8e2d> Task: (MOR:task-2771) (Name:unknown) is started
2018-10-18 13:56:36.171 [Executor-4] INFO [ c.v.e.s.c.client.vmware.vsphere.VsphereUtils] <########-####-####-####-########8e2d> Task: (MOR:task-2771) (Name:reconfigureEx) Entity: (MOR:domain-c7) (Name:mgmt-01) is complete
2018-10-18 13:56:36.171 [Executor-4] INFO [ c.v.e.s.c.client.vmware.vsphere.VcManagerBase] <########-####-####-####-########8e2d> HA feature on cluster mgmt-01 is true
2018-10-18 13:56:36.174 [Executor-4] WARN [c.v.v.v.c.h.i.HttpConfigurationCompilerBase$ConnectionMonitorThreadBase] <########-####-####-####-########8e2d> Shutting down the connection monitor.
2018-10-18 13:56:36.174 [monitor-80] WARN [c.v.v.v.c.h.i.HttpConfigurationCompilerBase$ConnectionMonitorThreadBase] Interrupted, no more connection pool cleanups will be performed.
2018-10-18 13:56:36.174 [Executor-4] ERROR [c.v.e.sddc.orchestrator.model.error.ErrorFactory] [SJDNSH] VRA_IAAS_MANAGEMENT_AGENT_INSTALLATION_FAILED vRA Iaas Management Agent Installation Failed.
com.vmware.evo.sddc.common.vrealize.vrslcm.error.reporting.exceptions.VrslcmOrchTaskException: vRA Iaas Management Agent Installation Failed.
snip
2018-10-18 14:16:05.888 [ main] DEBUG [ c.v.v.v.s.VrealizeExecutionSubscriberHelper] Found a Vrealize VRA_DEPLOY workflow
2018-10-18 14:16:05.898 [ main] INFO [ c.v.v.v.s.VrealizeExecutionSubscriberHelper] Built the following extended resource operation status: {
"type": "DEPLOY",
"state": "FAILED",
"taskId": "########-####-####-####-########8e2d",
"retriable": true
}
2018-10-18 14:16:05.899 [ main] INFO [ c.v.v.v.s.VrealizeExecutionSubscriberHelper] Getting vRealize Edge ID
2018-10-18 14:16:06.084 [ main] INFO [ c.v.vcf.vault.services.impl.VaultServiceImpl] Getting secret with id ########-####-####-####-########8e2d
2018-10-18 14:16:06.102 [ main] WARN [ c.v.vcf.vrealize.subscriber.VraResourceHelper] Failed to update entity at the end of execution with ID ########-####-####-####-########8e2d.
com.vmware.evo.sddc.inventory.model.error.SddcManagerException: java.lang.NullPointerException
at com.vmware.vcf.vault.services.impl.VaultServiceImpl.getSecret(VaultServiceImpl.java:104)
at com.vmware.vcf.domainmanager.service.config.EncryptedVaultFsmContextStoreImpl.loadExecutionContext(EncryptedVaultFsmContextStoreImpl.java:31)

 
  • You see messages similar to the following in the /var/log/vmware/vcf/operationsmanager.log file on the SDDC Manager VM:
2018-10-18 14:43:14.988 [300-exec-9] DEBUG [ c.v.v.p.service.PasswordLookupService] Unable to get some resource details. reason: {}
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Retriable operation 'Inventory collection task for VRA' failed to complete after 3 retries.
snip
Caused by: java.lang.RuntimeException: Retriable operation 'Inventory collection task for VRA' failed to complete after 3 retries.
at com.vmware.evo.sddc.common.util.RetriableCallable.call(RetriableCallable.java:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 common frames omitted
Caused by: java.net.UnknownHostException: vrealize-vra-vip.corp.local

 
  • You see that the deploy task is still reported as IN_PROGRESS when you query the vRealize Automation status on the SDDC Manager VM:
curl http://localhost/commonsvcs/vrealize/internal/vra/status | json_pp

{
   "state" : "NOT_DEPLOYED",
   "operations" : {
      "uninstall" : {
         "state" : "NOT_AVAILABLE"
      },
      "deploy" : {
         "taskId" : "########-####-####-####-########8e2d",
         "state" : "IN_PROGRESS"
      }
   }
}


Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values will vary depending on your environment.
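
As a minimal sketch, the stuck state described in the last symptom can be detected with a small shell function. The name is_deploy_stuck is hypothetical and is not part of SDDC Manager. The function assumes the status JSON has the shape shown in the example above, and it matches on text rather than using a full JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SDDC Manager.
# Reads the status JSON on stdin. Returns 0 when the deploy operation is
# IN_PROGRESS and the uninstall operation is NOT_AVAILABLE, otherwise 1.
is_deploy_stuck() {
    local json deploy_re uninstall_re
    # Remove all whitespace so compact and pretty-printed JSON look the same.
    json=$(tr -d '[:space:]')
    # [^{}]* keeps each match inside a single JSON object.
    deploy_re='"deploy":\{[^{}]*"state":"IN_PROGRESS"'
    uninstall_re='"uninstall":\{[^{}]*"state":"NOT_AVAILABLE"'
    [[ $json =~ $deploy_re && $json =~ $uninstall_re ]]
}

# Example usage on the SDDC Manager VM (URL taken from the symptom above):
#   curl -s http://localhost/commonsvcs/vrealize/internal/vra/status \
#       | is_deploy_stuck && echo "deploy task is stuck"
```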

Environment

VMware Cloud Foundation 3.0.x

Resolution

This is a known issue affecting VMware Cloud Foundation 3.0. There is currently no resolution.



Workaround:

The following steps can be used to work around this issue:

Note: Take a snapshot of the SDDC Manager VM prior to starting. The snapshot should be removed once the steps are completed and the issue is resolved.

For a vRealize Automation Deployment:

  1. Use SSH to connect to the SDDC Manager VM as the vcf user.
  2. Issue the following command to collect information related to the failed workflow:
cqlsh --cqlversion=3.4.4 -e "expand on; select id,status,operationstatus from inventory.vra"
 
Note: You will see output similar to the following:
 
@ Row 1
-----------------+-------------------------------------------------------------------------------------------------------------------------------
id | ########-####-####-####-########7165
status | ACTIVATING
operationstatus | {"resourceStatus":"ACTIVATING","lastOperation":{"type":"DEPLOY","state":"IN_PROGRESS","taskId":"########-####-####-####-########2e97"}}
 
Note: Make a note of the id value (########-####-####-####-########7165 in this example) and the taskId value (########-####-####-####-########2e97 in this example) as they will be used in the next step.
  3. Issue a command similar to the following to update the status and state of the workflow:
cqlsh --cqlversion=3.4.4 -e "update inventory.vra set operationstatus= '{\"resourceStatus\":\"ERROR\",\"lastOperation\":{\"type\":\"DEPLOY\",\"state\":\"FAILED\",\"taskId\":\"########-####-####-####-########2e97\"}}' where id = '########-####-####-####-########7165'"
 
Notes:
  • The resourceStatus value of ACTIVATING is replaced with ERROR and the state value of IN_PROGRESS is replaced with FAILED.
  • Replace ########-####-####-####-########7165 with the id value returned in Step 2 and ########-####-####-####-########2e97 with the taskId value returned in Step 2.
  4. Issue the command from Step 2 again. You should see output similar to the following:
@ Row 1
-----------------+-------------------------------------------------------------------------------------------------------------------------------
id | ########-####-####-####-########7165
status | ACTIVATING
operationstatus | {"resourceStatus":"ERROR","lastOperation":{"type":"DEPLOY","state":"FAILED","taskId":"########-####-####-####-########2e97"}}

Note: It is normal for the status value to remain as ACTIVATING, because the update changes only the operationstatus column.
  5. Log out of the SDDC Manager UI and then log back in. Navigate to Administration -> vRealize Suite -> vRealize Automation. The option to uninstall will now be visible.
  6. Click the Uninstall button.
  7. Correct any issues that were related to the failed vRealize Automation deployment.
  8. Initiate a new vRealize Automation deployment.
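
The query and update steps above involve copying two values and escaping a JSON string by hand, which is easy to get wrong. The following is a minimal sketch of how those steps could be scripted. All function names are hypothetical and are not VMware tooling. The sketch assumes the query returns a single row and that the id and taskId values are UUID-formatted, as the masked examples above suggest.

```shell
#!/usr/bin/env bash
# Sketch only: these function names are hypothetical, not VMware tooling.

# Pattern for the id and taskId values (assumed to be UUID-formatted).
uuid_re='^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'

# Print the id column from expanded cqlsh output passed as $1.
extract_id() {
    printf '%s\n' "$1" | sed -n 's/^ *id *| *\([^ ]*\).*/\1/p' | head -n 1
}

# Print the taskId embedded in the operationstatus JSON (cqlsh output in $1).
extract_task_id() {
    printf '%s\n' "$1" | sed -n 's/.*"taskId":"\([^"]*\)".*/\1/p' | head -n 1
}

# Print the CQL UPDATE statement for record id $1 and taskId $2.
# Refuses to build a statement when either value is not a UUID.
build_update_cql() {
    local id=$1 task_id=$2 q="'"
    if [[ ! $id =~ $uuid_re || ! $task_id =~ $uuid_re ]]; then
        echo "id and taskId must both be UUIDs" >&2
        return 1
    fi
    printf 'update inventory.vra set operationstatus = %s{"resourceStatus":"ERROR","lastOperation":{"type":"DEPLOY","state":"FAILED","taskId":"%s"}}%s where id = %s%s%s\n' \
        "$q" "$task_id" "$q" "$q" "$id" "$q"
}

# Example usage on the SDDC Manager VM:
#   out=$(cqlsh --cqlversion=3.4.4 -e "expand on; select id,status,operationstatus from inventory.vra")
#   cql=$(build_update_cql "$(extract_id "$out")" "$(extract_task_id "$out")") &&
#       cqlsh --cqlversion=3.4.4 -e "$cql"
```

Passing the statement to cqlsh through a quoted variable means the double quotes inside the JSON do not need backslash escapes, because the shell does not re-parse the expanded value. Review the printed statement before running it, and take the snapshot described above first.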
 

For a vRealize Operations Deployment:

The process for correcting this issue in a vRealize Operations deployment is nearly identical to the process documented for vRealize Automation. The primary difference is the database table that holds the record: inventory.vra for vRealize Automation and inventory.vrops for vRealize Operations. Update every command that references the inventory.vra table so that it references the inventory.vrops table instead.
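
As a rough illustration of that substitution, the table name can be selected from a product keyword so that the same query works for either product. The function names and the vra and vrops keywords are hypothetical conveniences; only the two table names and the query text come from this article.

```shell
#!/usr/bin/env bash
# Sketch only: function names and product keywords are hypothetical.

# Map a product keyword to the inventory table named in this article.
inventory_table() {
    case "$1" in
        vra)   echo "inventory.vra" ;;
        vrops) echo "inventory.vrops" ;;
        *)     echo "unknown product: $1" >&2; return 1 ;;
    esac
}

# Print the read-only query used to inspect the failed workflow record.
select_cql() {
    local table
    table=$(inventory_table "$1") || return 1
    printf 'expand on; select id,status,operationstatus from %s\n' "$table"
}

# Example usage on the SDDC Manager VM:
#   cqlsh --cqlversion=3.4.4 -e "$(select_cql vrops)"
```

Before reusing the update command against inventory.vrops, compare the operationstatus value returned for that table with the vRealize Automation example, since this article does not show a vRealize Operations record.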