Unable to deploy vCenter server virtual appliance during creation of new Workload Domain

Article ID: 381532

Products

VMware SDDC Manager

Issue/Introduction

  • When deploying a new Workload Domain, the workflow fails at the 'Deploy vCenter Server' subtask. The vCenter OVF deploys and powers on successfully, but many required services fail to start.
  • In the vcsa-cli-installer.log on SDDC Manager, entries similar to the following are observed:

/var/log/vmware/vcf/domainmanager/ci-installer-#####-#####/workflow_######/vcsa-cli-installer.log

YYYY-MM-DD hh:mm:ss,067 - vCSACliInstallLogger - INFO - OVF Tool: Received IP address: xx.xx.xx.xx
YYYY-MM-DD hh:mm:ss,269 - vCSACliInstallLogger - DEBUG - Querying REST endpoint '/rest/vcenter/deployment' on appliance 'xx.xx.xx.xx' for deployment status
YYYY-MM-DD hh:mm:ss,269 - vCSACliInstallLogger - DEBUG - Requesting deployment status from target vCSA REST API endpoint 'https://xx.xx.xx.xx:5480/rest/vcenter/deployment'
YYYY-MM-DD hh:mm:ss,335 - vCSACliInstallLogger - INFO - ==========VCSA Deployment Progress Report==========
        Task: Install required RPMs for the appliance.(SUCCEEDED 100/100)       - Task has completed successfully.
        Task: Run firstboot scripts.(FAILED 27/100)     - Starting VMware Security Token Service...
                Error: Encountered an internal error.

Traceback (most recent call last):
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 1170, in main
    vmidentityFB.boot()
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 275, in boot
    self.configureSTS(self.__stsRetryCount, self.__stsRetryInterval)
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 791, in configureSTS
    self.startSTSService()
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 751, in startSTSService
    returnCode = self.startService(self.__sts_service_name)
  File "/usr/lib/vmidentity/firstboot/vmidentity-firstboot.py", line 80, in startService
    update_services_runstate("start", None, False, False, svc_names=[svc_name])
  File "/usr/lib/vmware/site-packages/cis/svcsController.py", line 1122, in update_services_runstate
    _update_services_runstate_svclist('start', svc_nodenames,
  File "/usr/lib/vmware/site-packages/cis/svcsController.py", line 883, in _update_services_runstate_svclist
    controller.start_svc(svc_id, explicit_op=explicit_op)
  File "/usr/lib/vmware/site-packages/cis/svcsController.py", line 516, in start_svc
    service_start(svc_id, quiet=_quiet,
  File "/usr/lib/vmware/site-packages/cis/utils.py", line 1173, in service_start
    raise ServiceStartException(svc_name)
cis.exceptions.ServiceStartException: {
    "detail": [
        {
            "id": "install.ciscommon.service.failstart",
            "translatable": "An error occurred while starting service '%(0)s'",
            "args": [
                "sts"
            ],
            "localized": "An error occurred while starting service 'sts'"
        }
    ],
    "componentKey": null,
    "problemId": null,
    "resolution": null
}

Resolution: This is an unrecoverable error, please retry install. If you encounter this error again, please search for these symptoms in the VMware Knowledge Base for any known issues and possible resolutions. If none can be found, collect a support bundle and open a support request.

YYYY-MM-DD hh:mm:ss,477 - vCSACliInstallLogger - INFO - The VCSA deployment has failed
VCSA Deployment Start Time: YYYY-MM-DDThh:mm:ss.msZ
VCSA Deployment End Time: YYYY-MM-DDThh:mm:ss.msZ

YYYY-MM-DD hh:mm:ss,477 - vCSACliInstallLogger - DEBUG - Ready to collect support bundle from deployed appliance, if applicable
YYYY-MM-DD hh:mm:ss,478 - vCSACliInstallLogger - DEBUG - Proceed with certificate check...
YYYY-MM-DD hh:mm:ss,570 - vCSACliInstallLogger - DEBUG - Successfully collected support bundle and stored at: /var/log/vmware/vcf/domainmanager/ci-installer-####-####/workflow_####/ci-conf-####/vc-support-bundle.tgz
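
The progress report in the excerpt above lists each task with a status and a completion count. As a minimal sketch, assuming the log keeps the `Task: <name>(<STATUS> <done>/<total>)` layout shown in this article, the failed tasks can be pulled out with a regular expression. The function name `failed_tasks` is hypothetical and is not part of any VMware tooling.

```python
import re

# Matches progress lines in the layout shown in the excerpt, for example:
#   Task: Run firstboot scripts.(FAILED 27/100)     - Starting VMware Security Token Service...
TASK_RE = re.compile(
    r"Task:\s*(?P<name>.+?)\((?P<status>[A-Z]+)\s+(?P<done>\d+)/(?P<total>\d+)\)"
)


def failed_tasks(log_text):
    """Return (task name, done, total) for every task reported as FAILED."""
    results = []
    for match in TASK_RE.finditer(log_text):
        if match.group("status") == "FAILED":
            results.append(
                (
                    match.group("name").strip(),
                    int(match.group("done")),
                    int(match.group("total")),
                )
            )
    return results
```

Passing the contents of vcsa-cli-installer.log to `failed_tasks` would, for the excerpt above, report the 'Run firstboot scripts.' task as failed at 27 of 100.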

Upon reviewing the vmon.log on the failed vCenter appliance, entries similar to the following are observed:

/var/log/vmware/vmon/vmon.log

Service pre-start command's stdout:
Service pre-start command's stderr: Traceback (most recent call last):
File "/usr/lib/vmidentity/install/STS/installer/sts-prestart-script.py", line 551, in <module>
raise e
File "/usr/lib/vmidentity/install/STS/installer/sts-prestart-script.py", line 164, in sts_prestart_setup_service_account
create_sso_group("ActAsUsers", "Act-As Users")
File "/usr/lib/vmidentity/install/STS/installer/sts-prestart-script.py", line 137, in _create_sso_group
if sso_group(vsc.group_exists(group_name)) return True
File "/usr/lib/vmware/site-packages/cis/veecs.py", line 374, in group_exists
raise InvokeCommandAndException(error)
cis.exceptions.InvokeCommandException: {
"detail": [
{
"id": "install.ciscommon.command.errinvoke",
"translatable": "An error occurred while invoking external command : '%(0)s'",
"args": [
"Error 46 while finding SSO group "ActAsUsers".\n dir-cli failed. Error 1326: Operation failed with error ERROR LOGON FAILURE (1326) \n"
],
"localized": "An error occurred while invoking external command : 'Error 46 while finding SSO group "ActAsUsers".\ndir-cli failed. Error 1326: Operation failed with error ERROR LOGON FAILURE (1326) \n'"
},
"componentKey": null,
"problemId": null,
"resolution": null
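
To decide quickly whether a failed appliance shows the same signature as this article, the vmon.log text can be checked for the three markers visible in the excerpt above. This is a rough sketch only: the function name `matches_known_signature` is hypothetical, and the marker strings are copied from the excerpt on the assumption that they appear verbatim in the log.

```python
def matches_known_signature(vmon_log_text):
    """Return True when the text contains all three markers from the article's excerpt."""
    required_markers = (
        "sts-prestart-script.py",       # the STS pre-start script raised the error
        "ActAsUsers",                   # the SSO group being looked up
        "ERROR LOGON FAILURE (1326)",   # the dir-cli logon failure
    )
    return all(marker in vmon_log_text for marker in required_markers)


# Example usage against a copy of the log collected from the failed appliance:
# with open("/var/log/vmware/vmon/vmon.log", errors="replace") as handle:
#     print(matches_known_signature(handle.read()))
```

If the function returns False, the failure may have a different cause from the one described in this article.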

Environment

VCF 5.X

Cause

As observed in the vmon.log, a dir-cli check of the ActAsUsers SSO group membership failed with a "LOGON FAILURE". The vCenter machine account is used to perform this check.

This issue can occur when the Local User password policy has been amended on the Management vCenter.

When the workflow deploys the new vCenter, the SSO configuration, including the Local User password policy, is copied from the Management vCenter.

If the Minimum Length is set to a value greater than 20, vCenter ignores the Minimum Length value and always applies the Maximum Length value. If the Maximum Length value is greater than 32 (50, for example), all internal local user passwords will be 50 characters long.

A password of this length is too long when vCenter uses the dir-cli utility during firstboot to check the ActAsUsers group memberships.
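
The rule described above can be expressed as a small check. This is a sketch of one reading of the article, not VMware's implementation: the thresholds 20 and 32 come from the text above, while the function name `policy_triggers_issue` and the treatment of a Maximum Length of exactly 32 as safe are assumptions.

```python
# Thresholds taken from the article text; how vCenter applies them internally is assumed.
MIN_LENGTH_IGNORED_ABOVE = 20   # Minimum Length above this value is ignored
MAX_LENGTH_THRESHOLD = 32       # Maximum Length above this value yields over-long passwords


def policy_triggers_issue(min_length, max_length):
    """Return True when the policy matches the failing condition described in the article."""
    min_is_ignored = min_length > MIN_LENGTH_IGNORED_ABOVE
    passwords_too_long = max_length > MAX_LENGTH_THRESHOLD
    return min_is_ignored and passwords_too_long
```

For example, `policy_triggers_issue(25, 50)` flags the policy, while `policy_triggers_issue(8, 20)` does not.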

[Figure: example of a non-default password policy]

Refer to vCenter SSO Password Policy for more details.

Resolution

To resolve the issue, follow the steps below:

  • Delete the failed vCenter, or power it off and rename it.
  • Amend the Local User password policy on the Management vCenter.
  • Set the Minimum Length to a value between 8 and 20.
  • If there are other vCenter servers in the VCF environment, ensure that the change has been replicated to them.
  • Restart the workflow from the SDDC Manager UI to deploy the vCenter; it should now succeed.

If the error seen in the vmon.log on the failed vCenter differs from the one shown in this article, create a case with Broadcom Support referencing this article.