The purpose of this KB is to work around the issue described below and successfully add a VxRail host to the cluster in SDDC Manager.
Symptoms:
When adding a VxRail host to expand a WLD cluster in SDDC Manager, the task fails with errors similar to the following in the task record and the domainmanager logs:
domainmanager=# select * from task where id ='a4ff8635-####-####-####-##########6c';
id | a4ff8635-####-####-####-##########6c
resource_id | 93303e5d-####-####-####-##########62
resource_type | ESX_HOST
state | COMPLETED_WITH_FAILURE
description | Adding new host(s) to vxrail cluster
errors | [{"messageBundle":"com.vmware.evo.sddc.common.core.error.messages","errorCode":"VCF_ERROR_INTERNAL_SERVER_ERROR","arguments":[],
"message":"A problem has occurred on the server. Please retry or contact the service provider and provide the reference token.","cause":
[{"type":"com.vmware.evo.sddc.common.services.error.SddcManagerServicesIsException","message":"Error in getting workflow options for addition of host to cluster. Check logs"},
{"type":"com.vmware.evo.sddc.common.vxrail.error.VxRailManagerException","message":"Unable to fetch details for port groups managed by VxRail Manager vxrm.gsslabs.com"}],"referenceToken":"ONPAQ3"}]
timestamp | 1674144689474
completion_timestamp |
localizable_description | null
2023-01-19T20:52:50.118+0000 DEBUG [vcf_dm,63bb90392797462f,03ea] [c.v.v.secure.http.HttpClientService,dm-exec-5] Making request: GET https://vxrm.gsslabs.com:443/rest/vxm/v1/system/cluster-portgroups/esx07.gsslabs.com
... ...
2023-01-19T20:52:51.695+0000 ERROR [vcf_dm,2f2578c538a84ba1,559a] [c.v.v.v.h.w.VxRailHostWorkflowInitiator,dm-exec-6] Failed to start workflow for add host task a4ff8635-####-####-####-##########6c
com.vmware.evo.sddc.common.services.error.SddcManagerServicesIsException: Error in getting workflow options for addition of host to cluster. Check logs
    at com.vmware.evo.sddc.common.services.adapters.workflow.options.WorkflowOptionsAdapterImpl.getWorkflowOptionsForAddHostToVxRailCluster(WorkflowOptionsAdapterImpl.java:269)
    at com.vmware.vxrail.vcf.hostmanager.workflows.VxRailHostWorkflowInitiator.startWorkFlow(VxRailHostWorkflowInitiator.java:151)
    at com.vmware.vxrail.vcf.hostmanager.workflows.VxRailHostWorkflowInitiator$$FastClassBySpringCGLIB$$13eaaa4f.invoke(<generated>)
... ...
Caused by: com.vmware.evo.sddc.common.vxrail.error.VxRailManagerException: Unable to fetch details for port groups managed by VxRail Manager vxrm.gsslabs.com
    at com.vmware.evo.sddc.common.vxrail.VxRailManagerService.getVxRailSystemTrafficPortGroups(VxRailManagerService.java:1213)
    at com.vmware.evo.sddc.common.vxrail.VxRailManagerService.getVxRailSystemTrafficPortGroups(VxRailManagerService.java:1277)
... ...
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
This issue occurs when the VxRail Manager takes more than 1.5 minutes to respond to the API call that retrieves the cluster-portgroups.
The API in question (as reported in the logs above) is:
curl -k -X GET --user '<sso_user>:<sso_password>' https://<VxRail_Manager>/rest/vxm/v1/system/cluster-portgroups/<VxRail_Host_Name>
For example:
curl -k -X GET --user '<sso_user>:$ecretPa55' https://vxrm.gsslabs.com:443/rest/vxm/v1/system/cluster-portgroups/esx07.gsslabs.com
The timeout value configured in the domainmanager service is 1.5 minutes. So if the API takes longer than that to respond, the task fails with the errors reported above.
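To check whether the VxRail Manager response is actually exceeding this limit, the same call can be timed. The following is a minimal sketch, not part of the original procedure: the function names exceeds_timeout and time_portgroups_call are hypothetical, the 600-second curl limit is an arbitrary choice, and the script expects the VxRail Manager FQDN, the host name and the credentials (user:password) as its three arguments.

```shell
#!/bin/sh
# Sketch: time the cluster-portgroups call and compare it with the
# domainmanager timeout. Function names are hypothetical helpers.

TIMEOUT_MS=90000   # default domainmanager timeout stated in this article

# exceeds_timeout SECONDS [LIMIT_MS]
# Exit status 0 when SECONDS (may be fractional) is longer than the limit.
exceeds_timeout() {
    awk -v s="$1" -v limit="${2:-$TIMEOUT_MS}" \
        'BEGIN { exit !(s * 1000 > limit) }'
}

# time_portgroups_call VXRM_FQDN HOST_FQDN USER:PASSWORD
# Prints the total request time in seconds; the response body is discarded.
time_portgroups_call() {
    curl -k -s -o /dev/null --max-time 600 -w '%{time_total}\n' \
        --user "$3" \
        "https://$1/rest/vxm/v1/system/cluster-portgroups/$2"
}

# Runs only when all three arguments are supplied.
if [ "$#" -eq 3 ]; then
    elapsed=$(time_portgroups_call "$1" "$2" "$3")
    if exceeds_timeout "$elapsed"; then
        echo "Response took ${elapsed}s: longer than the ${TIMEOUT_MS} ms timeout"
    else
        echo "Response took ${elapsed}s: within the ${TIMEOUT_MS} ms timeout"
    fi
fi
```

If the measured time is consistently above 90 seconds, the add-host task can be expected to hit the read timeout shown in the logs.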
To resolve the issue, we need to address why the VxRail Manager is taking an extended amount of time to respond to the GET API call to return the cluster-portgroups.
On the SDDC Manager, we can work around this temporarily by increasing the timeout value for the domainmanager service. The steps for this are provided below.
0. Take a snapshot of the SDDC Manager VM.
1. SSH to the SDDC Manager as the vcf user, then switch to root with su.
2. Edit the file: /etc/vmware/vcf/domainmanager/application-prod.properties
vi /etc/vmware/vcf/domainmanager/application-prod.properties
3. Add the following entry to set the timeout value to 300000 ms (i.e., 5 minutes):
Note: The default value is 90000 ms (i.e., 1.5 minutes).
http.client.timeout.milis=300000
4. Save the file and quit:
Press ESC, then type :wq! and press Enter.
5. Restart the domainmanager service using the command:
systemctl restart domainmanager
6. Wait for the service to come up.
7. Retry adding the VxRail host to the cluster.
Reference Document: Add the VxRail Hosts to the Cluster in VMware Cloud Foundation
This time the task should progress, and its status, sub-tasks and additional details should be visible in the SDDC Manager UI.
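Steps 2 to 6 above can also be scripted. The following is a rough sketch under stated assumptions rather than a supported tool: the function names set_property and wait_for_service are hypothetical, the backup copy with a .bak suffix is an addition that does not replace the VM snapshot, and the 30 x 10 second polling interval is arbitrary. It must be run as root with the --apply argument to change anything.

```shell
#!/bin/sh
# Sketch of steps 2-6: set the timeout property, restart domainmanager,
# and wait for the unit to become active. Helper names are hypothetical.

PROPS=/etc/vmware/vcf/domainmanager/application-prod.properties
KEY=http.client.timeout.milis
VALUE=300000

# set_property FILE KEY VALUE
# Keeps a copy of FILE as FILE.bak, drops any existing KEY= line and
# writes KEY=VALUE as the last line of FILE.
set_property() {
    file=$1 key=$2 value=$3
    # Escape the dots so the key is matched literally by grep.
    key_re=$(printf '%s' "$key" | sed 's/[.]/\\./g')
    cp -p "$file" "$file.bak" || return 1
    {
        grep -v "^${key_re}=" "$file.bak" || true
        printf '%s=%s\n' "$key" "$value"
    } > "$file"
}

# wait_for_service UNIT [TRIES]
# Polls every 10 seconds until systemd reports UNIT as active.
wait_for_service() {
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        systemctl is-active --quiet "$1" && return 0
        tries=$((tries - 1))
        sleep 10
    done
    return 1
}

# Nothing is changed unless the script is called with --apply.
if [ "${1:-}" = "--apply" ]; then
    set_property "$PROPS" "$KEY" "$VALUE" || exit 1
    systemctl restart domainmanager
    wait_for_service domainmanager ||
        echo "domainmanager is not active yet" >&2
fi
```

Note that an active systemd unit only shows that the process has started; the service may need additional time before it accepts requests, so allow for that before retrying the add-host task.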