Unable to deploy vCenter server virtual appliance during creation of new Workload Domain

Article ID: 381532

Updated On: 12-18-2024

Products

VMware SDDC Manager

Issue/Introduction

When deploying a new Workload Domain, the workflow fails at the 'Deploy vCenter Server' subtask.

The vCenter OVF deploys successfully.

The vCenter powers up successfully.

Many required services fail to start.

SDDC Manager retries the deployment three times, then fails the task and deletes the appliance.

Because the appliance is deleted, the logs on the failed vCenter cannot be reviewed.

Environment

VCF 5.X

Cause

To determine the cause, update the /opt/vmware/vcf/domainmanager/config/application-prod.properties file on SDDC Manager so that it does not clean up the failed vCenter appliance after the task fails.

Proceed as below:

  • SSH to SDDC Manager as the vcf user and elevate to the root user.
  • Use the vi editor to add the following two lines to the /opt/vmware/vcf/domainmanager/config/application-prod.properties file:

    orchestrator.task.undoOnFailure=false
    orchestrator.task.retry.max=1

    (NOTE: the application-prod.properties file may be symlinked to the application.properties file.)
  • Restart the domainmanager service: systemctl restart domainmanager
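
The file edit in the steps above can also be scripted so that it is repeatable and keeps a backup. The following is a minimal sketch, not a VMware-supplied tool: the helper name `ensure_properties` and the `.bak` backup suffix are assumptions made for illustration. It only touches the two keys named in this article and leaves the service restart to the operator.

```python
import shutil
from pathlib import Path

# The two settings named in this article.
SETTINGS = {
    "orchestrator.task.undoOnFailure": "false",
    "orchestrator.task.retry.max": "1",
}


def ensure_properties(path, settings=SETTINGS):
    """Set each key=value in a properties file (hypothetical helper).

    Existing uncommented lines for a key are replaced; missing keys are
    appended. A copy of the original file is saved next to it with a
    .bak suffix. Returns True if the file content was changed.
    """
    path = Path(path).resolve()  # follow a symlink to the real file, if any
    original = path.read_text()
    lines = original.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # skip comments and lines that are not key=value
        key = stripped.split("=", 1)[0].strip()
        if key in settings:
            lines[i] = f"{key}={settings[key]}"
            seen.add(key)
    for key, value in settings.items():
        if key not in seen:
            lines.append(f"{key}={value}")
    updated = "\n".join(lines) + "\n"
    if updated == original:
        return False
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(updated)
    return True
```

Run as root, a call such as `ensure_properties("/opt/vmware/vcf/domainmanager/config/application-prod.properties")` would be followed by `systemctl restart domainmanager`, as in the manual steps.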

Now restart the deployment. When it fails, you should be able to SSH to the failed vCenter and review the logs.

The vcsa-cli-installer.log indicates that the sts service fails to start:

2024-11-06 02:55:01,067 - vCSACliInstallLogger - INFO - OVF Tool: Received IP address: xx.xx.xx.xx
2024-11-06 03:02:04,269 - vCSACliInstallLogger - DEBUG - Querying REST endpoint '/rest/vcenter/deployment' on appliance 'xx.xx.xx.xx' for deployment status
2024-11-06 03:02:04,269 - vCSACliInstallLogger - DEBUG - Requesting deployment status from target vCSA REST API endpoint 'https://xx.xx.xx.xx:5480/rest/vcenter/deployment'
2024-11-06 03:02:04,335 - vCSACliInstallLogger - INFO - ==========VCSA Deployment Progress Report==========
        Task: Install required RPMs for the appliance.(SUCCEEDED 100/100)       - Task has completed successfully.
        Task: Run firstboot scripts.(FAILED 27/100)     - Starting VMware Security Token Service...
                Error: Encountered an internal error.

Traceback (most recent call last):
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 1170, in main
    vmidentityFB.boot()
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 275, in boot
    self.configureSTS(self.__stsRetryCount, self.__stsRetryInterval)
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 791, in configureSTS
    self.startSTSService()
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 751, in startSTSService
    returnCode = self.startService(self.__sts_service_name)
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 80, in startService
    update_services_runstate("start", None, False, False, svc_names=[svc_name])
  File "/usr/lib/vmware/site-packages/cis/svcsController.py", line 1122, in update_services_runstate
    _update_services_runstate_svclist('start', svc_nodenames,
  File "/usr/lib/vmware/site-packages/cis/svcsController.py", line 883, in _update_services_runstate_svclist
    controller.start_svc(svc_id, explicit_op=explicit_op)
  File "/usr/lib/vmware/site-packages/cis/svcsController.py", line 516, in start_svc
    service_start(svc_id, quiet=_quiet,
  File "/usr/lib/vmware/site-packages/cis/utils.py", line 1173, in service_start
    raise ServiceStartException(svc_name)
cis.exceptions.ServiceStartException: {
    "detail": [
        {
            "id": "install.ciscommon.service.failstart",
            "translatable": "An error occurred while starting service '%(0)s'",
            "args": [
                "sts"
            ],
            "localized": "An error occurred while starting service 'sts'"
        }
    ],
    "componentKey": null,
    "problemId": null,
    "resolution": null
}
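
When the installer log is long, the failed step can be pulled out of the progress report programmatically. The sketch below assumes the "Task: name.(STATE done/total) - message" line layout seen in the excerpt above; the function name `failed_tasks` is hypothetical, and other vCenter versions may format these lines differently.

```python
import re

# Matches progress-report lines such as:
#   Task: Run firstboot scripts.(FAILED 27/100)     - Starting ...
TASK_LINE = re.compile(
    r"Task:\s*(?P<name>.+?)\((?P<state>[A-Z]+)\s+(?P<done>\d+)/(?P<total>\d+)\)"
    r"\s*-\s*(?P<message>.*)"
)


def failed_tasks(log_text):
    """Return (task name, progress, message) for each task reported as FAILED."""
    failures = []
    for line in log_text.splitlines():
        match = TASK_LINE.search(line)
        if match and match.group("state") == "FAILED":
            failures.append(
                (
                    match.group("name").strip(),
                    int(match.group("done")),
                    match.group("message").strip(),
                )
            )
    return failures
```

Passing the text of a saved vcsa-cli-installer.log to `failed_tasks` returns one tuple per failed task, which shows at a glance which firstboot stage stopped the deployment.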

The vmon log pinpoints the failure:

Service pre-start command's stdout:
Service pre-start command's stderr: Traceback (most recent call last):
File "/usr/lib/vmidentity/install/STS/installer/sts-prestart-script.py", line 551, in <module>
raise e
File "/usr/lib/vmidentity/install/STS/installer/sts-prestart-script.py", line 164, in sts_prestart_setup_service_account
create_sso_group("ActAsUsers", "Act-As Users")
File "/usr/lib/vmidentity/install/STS/installer/sts-prestart-script.py", line 137, in _create_sso_group
if sso_group(vsc.group_exists(group_name)) return True
File "/usr/lib/vmware/site-packages/cis/veecs.py", line 374, in group_exists
raise InvokeCommandAndException(error)
cis.exceptions.InvokeCommandException: {
"detail": [
{
"id": "install.ciscommon.command.errinvoke",
"translatable": "An error occurred while invoking external command : '%(0)s'",
"args": [
"Error 46 while finding SSO group "ActAsUsers".\n dir-cli failed. Error 1326: Operation failed with error ERROR LOGON FAILURE (1326) \n"
],
"localized": "An error occurred while invoking external command : 'Error 46 while finding SSO group "ActAsUsers".\ndir-cli failed. Error 1326: Operation failed with error ERROR LOGON FAILURE (1326) \n'"
},
"componentKey": null,
"problemId": null,
"resolution": null

 

The log shows that a dir-cli check of the ActAsUsers SSO group failed with a "LOGON FAILURE" (error 1326).

(The vCenter machine account is used for this check.)
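
To confirm that a failed appliance shows the same signature, a saved copy of the vmon log can be scanned for it. This is a minimal sketch that assumes the error text matches the excerpt above; the function name `find_dir_cli_logon_failures` is hypothetical.

```python
import re

# Pattern taken from the excerpt above: dir-cli reporting a logon failure.
LOGON_FAILURE = re.compile(r"dir-cli failed\. Error (?P<code>\d+):.*LOGON FAILURE")


def find_dir_cli_logon_failures(log_text):
    """Return (line number, error code) for each line reporting the failure."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = LOGON_FAILURE.search(line)
        if match:
            hits.append((number, int(match.group("code"))))
    return hits
```

An empty result suggests that the sts start failure has a different cause from the one described in this article.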

This issue can occur when the Local User password policy has been amended on the Management vCenter.

When the workflow deploys the new vCenter, SSO configuration settings, including the Local User password policy, are copied from the Management vCenter.

If the Minimum Length is set to a value greater than 20, vCenter ignores the Minimum Length value and always applies the Maximum Length value. If the Maximum Length value is greater than 32 (50, for example), all internal local user passwords will have that length (50 characters in this example).

A password of this length is too long for the dir-cli utility, which vCenter uses during firstboot to check the ActAsUsers group memberships.
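
The rule described above can be written down as a small check for reviewing a policy before starting the workflow. This is a sketch of the behavior as this article describes it, not vCenter's actual implementation. The thresholds 20 and 32 come from the text above; the function names are hypothetical. The article does not say what length is generated when the Minimum Length is 20 or less, so the sketch returns `None` in that case.

```python
MIN_LENGTH_THRESHOLD = 20  # above this, the article says Minimum Length is ignored
MAX_SAFE_LENGTH = 32       # lengths above this are described as too long for dir-cli


def effective_internal_password_length(min_length, max_length):
    """Return the length the article says is applied, or None if it is not stated."""
    if min_length > MIN_LENGTH_THRESHOLD:
        return max_length
    return None


def policy_at_risk(min_length, max_length):
    """True if the policy matches the failure condition described in this article."""
    length = effective_internal_password_length(min_length, max_length)
    return length is not None and length > MAX_SAFE_LENGTH
```

For example, `policy_at_risk(21, 50)` evaluates to `True`, while a policy with the default Minimum Length of 8 is not flagged regardless of the Maximum Length.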

Example of a non-default password policy: [screenshot not included]


Resolution

  • Delete the failed vCenter, or power it down and rename it.
  • Amend the Local User password policy on the Management vCenter.
  • Set the Minimum Length to a value between 8 and 20.
  • If there are other vCenters in the VCF environment, ensure that the change has been replicated to them.
  • Restart the workflow from the SDDC Manager UI. It should now succeed.

Additional Information

vCenter SSO Password Policy
https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.authentication.doc/GUID-B9C4409A-B053-40C3-96DE-232BB99AAA35.html


The password policy picks up the maximum length value only if the minimum length is greater than 20 characters. The behavior of the password policy is undefined or could result in failure of services when the minimum length value is greater than 20 characters and the maximum length is set to any value. To avoid a potential problem, leave the minimum length set to the default value of 8 characters, or no greater than 20 characters.