WCP and vStats service fails to start with "Error 46 while finding SSO group"

Article ID: 379776

Products

VMware vCenter Server 7.0, VMware vCenter Server 8.0

Issue/Introduction

  • WCP (Workload Control Plane) service fails to start
  • vStats service fails to start (vSphere Automation API)
  • While the WCP service is stopped, administrators cannot place any ESXi host in maintenance mode using vCenter Server
  • Certificate renewal using the built-in certificate manager may fail, triggering a rollback of the certificates
  • Attempting to start the WCP service from an SSH session to vCenter Server fails with the following output:

    root@<FQDN of VC># service-control --start wcp

    Operation not cancellable. Please wait for it to finish...
    Performing start operation on service wcp...

    stderr: Error executing start on service wcp. Details {
    "detail": [
    {
    "id": "install.ciscommon.service.failstart",
    "translatable": "An error occurred while starting service '%(0)s'",
    "args": [
    "wcp"
    ],
    "localized": "An error occurred while starting service 'wcp'"
    }
    ],
    "componentKey": null,
    "problemId": null,
    "resolution": null
    }

  • vCenter - /var/log/vmware/vmon/vmon.log

    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03) host-2515 <wcp> Service pre-start command's stderr: Failed to configure HDCS. Err {
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515     "detail": [
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515         {
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515             "id": "install.ciscommon.command.errinvoke",
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515             "translatable": "An error occurred while invoking external command : '%(0)s'",
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515             "args": [
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515                 "Error 46 while finding SSO group \"vCLSAdmin\":\ndir-cli failed. Error 1326: Operation failed with error ERROR_LOGON_FAILURE (1326) \n"
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515             ],
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515             "localized": "An error occurred while invoking external command : 'Error 46 while finding SSO group \"vCLSAdmin\":\ndir-cli failed. Error 1326: Operation failed with error ERROR_LOGON_FAILURE (1326) \n'"
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515         }
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515     ],
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515     "componentKey": null,
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515     "problemId": null,
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515     "resolution": null
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-2515 }
    YYYY-MM-DDThh:mm:ss.XXXZ Er(02) host-2515 <wcp> Service pre-start command failed with exit code 1.

    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03) host-XXXXX <vstats> Service pre-start command's stderr: Traceback (most recent call last):
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX   File "/usr/lib/vmware-vstats/scripts/vstats_pre_start.py", line 175, in <module>
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX     patch_sso()
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX   File "/usr/lib/vmware-vstats/scripts/vstats_pre_start.py", line 160, in patch_sso
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX     sso_patch.ensure_groups_exist(VSTATS_GROUP, VSTATS_GROUP_DESCRIPTION)
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX   File "/usr/lib/vmware-vstats/scripts/vstats_pre_start.py", line 42, in ensure_groups_exist
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX     if self.sso_group.group_exists(group) == True:
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX   File "/usr/lib/vmware/site-packages/cis/vecs.py", line 374, in group_exists
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX     raise InvokeCommandException(error)
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX cis.exceptions.InvokeCommandException: {
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX     "detail": [
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX         {
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX             "id": "install.ciscommon.command.errinvoke",
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX             "translatable": "An error occurred while invoking external command : '%(0)s'",
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX             "args": [
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX                 "Error 46 while finding SSO group \"vStatsGroup\":\ndir-cli failed. Error 1326: Operation failed with error ERROR_LOGON_FAILURE (1326) \n"
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX             ],
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX             "localized": "An error occurred while invoking external command : 'Error 46 while finding SSO group \"vStatsGroup\":\ndir-cli failed. Error 1326: Operation failed with error ERROR_LOGON_FAILURE (1326) \n'"
    YYYY-MM-DDThh:mm:ss.XXXZ Wa(03)+ host-XXXXX         }

  • vCenter - /var/log/vmware/vmdird/vmdird-syslog.log

    YYYY-MM-DDThh:mm:ss.XXXZ err vmdird  t@140054890526464: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)
    YYYY-MM-DDThh:mm:ss.XXXZ err vmdird  t@140054890526464: VmDirSendLdapResult: Request (Bind), Error (LDAP_INVALID_CREDENTIALS(49)), Message ((49)(SASL step failed.)), (0) socket (127.0.0.1)
    YYYY-MM-DDThh:mm:ss.XXXZ err vmdird  t@140054890526464: Bind Request Failed (127.0.0.1) error 49: Protocol version: 3, Bind DN: "cn=<FQDN of VCENTER>,ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL

    YYYY-MM-DDThh:mm:ss.XXXZ info vmdird  t@140055721010944: VmDirGetAccountUPN success for AccountUPN (workload_storage_management-27789762-bca9-434f-810a-8c83b91b914b@VSPHERE.local)
    YYYY-MM-DDThh:mm:ss.XXXZ info vmdird  t@140055721010944: Srv_RpcVmDirGetAccountUPN success AccountUPN Length (79)
    YYYY-MM-DDThh:mm:ss.XXXZ info vmdird  t@140055419004672: Modify Entry (CN=workload_storage_management-27789762-bca9-434f-810a-8c83b91b914b,cn=ServicePrincipals,dc=vsphere,dc=local, EID 3237)(from )(by )(via Int)(USN 12319,0)
    YYYY-MM-DDThh:mm:ss.XXXZ info vmdird  t@140055419004672: Modify Entry (CN=workload_storage_management-27789762-bca9-434f-810a-8c83b91b914b,cn=ServicePrincipals,dc=vsphere,dc=local, EID 3237)(from )(by )(via Int)(USN 12320,0)
    YYYY-MM-DDThh:mm:ss.XXXZ info vmdird  t@140055419004672: User account control - (cn=workload_storage_management-27789762-bca9-434f-810a-8c83b91b914b,cn=serviceprincipals,dc=vsphere,dc=local): (800010) flag unset, new value=(0)
    YYYY-MM-DDThh:mm:ss.XXXZ info vmdird  t@140055419004672: Password Modification Successful (). Bind DN: "". Modified DN: "CN=workload_storage_management-27789762-bca9-434f-810a-8c83b91b914b,cn=ServicePrincipals,dc=vsphere,dc=local"
    YYYY-MM-DDThh:mm:ss.XXXZ info vmdird  t@140055419004672: VmDirSrvForceResetPassword (workload_storage_management-27789762-bca9-434f-810a-8c83b91b914b@VSPHERE.local)
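The two log signatures above can be checked in one pass with a small shell sketch. The function name find_sso_group_errors is hypothetical; it only runs grep for the strings shown in the excerpts against the log paths listed in this article (which can be overridden as arguments), and it does not change anything on the appliance. A match is a hint that this issue applies, not a full diagnosis.

```shell
# Hypothetical helper: look for the failure signatures shown in the log excerpts above.
# Optional arguments: path to vmon.log, then path to vmdird-syslog.log.
find_sso_group_errors() {
  local vmon_log="${1:-/var/log/vmware/vmon/vmon.log}"
  local vmdird_log="${2:-/var/log/vmware/vmdird/vmdird-syslog.log}"
  local found=0

  # WCP/vStats pre-start failure: dir-cli could not look up an SSO group.
  if grep -qF 'Error 46 while finding SSO group' "$vmon_log" 2>/dev/null; then
    echo "vmon: pre-start failed with 'Error 46 while finding SSO group'"
    found=1
  fi

  # vmdird rejected a bind because of invalid credentials.
  if grep -qF 'LDAP_INVALID_CREDENTIALS(49)' "$vmdird_log" 2>/dev/null; then
    echo "vmdird: bind failed with LDAP_INVALID_CREDENTIALS(49)"
    found=1
  fi

  if [ "$found" -eq 0 ]; then
    echo "no matching signature found"
  fi
  # Exit status 0 when at least one signature was found.
  [ "$found" -eq 1 ]
}
```

After pasting the function into a Bash session as root on the vCenter Server Appliance, run find_sso_group_errors with no arguments to scan the default log locations.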

Cause

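The issue occurs only when the machine account password is longer than 20 characters. In this state, the bind of the vCenter Server machine account to vmdird fails with LDAP_INVALID_CREDENTIALS (49), and the SSO group lookups run by the WCP and vStats pre-start scripts fail with ERROR_LOGON_FAILURE (1326). The condition can be reproduced by setting "vmwPasswordMinLength" to a value greater than 20.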

Resolution

IMPORTANT! Take offline (powered-off) snapshots of all PSCs and vCenter Servers in the same vSphere SSO domain (or Enhanced Linked Mode group) before attempting these steps. This is standard best practice before making any manual changes to the PSC vmdird database.

To resolve the issue, follow these steps:

  1. Connect to the affected vCenter Server over SSH as the root user.
    Note: Type "shell" to enter the Bash shell. To avoid issues with the commands in the following steps, set Bash as the default shell for root:

    chsh -s /bin/bash root

  2. Check the number of characters in "dcAccountPassword" with the following command:

    /opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\services\vmdir]' | egrep -i "Password|dcAccountDN"

    Example Output:

    + "dcAccountDN" REG_SZ "cn=<FQDN OF vCenter>,ou=Domain Controllers,dc=example,dc=local"
    + "dcAccountOldPassword" REG_SZ "<XXXXXXXXXXXXXXXXXXX>"
    + "dcAccountPassword" REG_SZ "<XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX>"

  3. If "dcAccountPassword" is longer than 20 characters, the machine account password must be changed. To reset the machine account password, refer to LDAP Error Code 49: Reset Machine Account Password of vCenter Server Appliance using Shell Script.

  4. Run the following command to remove DOS carriage returns from the reset_machine_pw.sh script:

    sed -i -e 's/\r$//' reset_machine_pw.sh

  5. Run the reset_machine_pw.sh script (if it is not executable, first run chmod +x reset_machine_pw.sh):

    ./reset_machine_pw.sh

  6. Once the machine account password is reset, restart all vCenter Server services with the following command:

    service-control --stop --all && service-control --start --all
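The password-length check in step 2 can be scripted so that only the length, not the password itself, is printed to the terminal. This is a sketch under assumptions: the function names dc_password_length and check_dc_password are hypothetical, and the parsing assumes each lwregshell entry appears on one line as `"name" REG_SZ value`, with or without double quotes around the value and with no escaped quote characters inside it.

```shell
# Hypothetical helper: print the length of the dcAccountPassword value found in
# lwregshell output read from stdin. Assumes one "name" REG_SZ value entry per line.
dc_password_length() {
  local line value
  line=$(grep -F '"dcAccountPassword"' | head -n 1)
  if [ -z "$line" ]; then
    echo "dcAccountPassword not found" >&2
    return 1
  fi
  # Drop everything up to and including the REG_SZ type marker.
  value=${line#*REG_SZ}
  # Trim leading and trailing whitespace around the value.
  value=$(printf '%s' "$value" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
  # Remove one pair of surrounding double quotes, if present.
  case $value in
    \"*\") value=${value#\"}; value=${value%\"} ;;
  esac
  echo "${#value}"
}

# Hypothetical wrapper: exit status 1 if the password is longer than 20 characters,
# 2 if the entry could not be found, 0 otherwise.
check_dc_password() {
  local len
  len=$(dc_password_length) || return 2
  if [ "$len" -gt 20 ]; then
    echo "dcAccountPassword is $len characters (more than 20): reset the machine account password"
    return 1
  fi
  echo "dcAccountPassword is $len characters (20 or fewer)"
}
```

Usage on the appliance: /opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\services\vmdir]' | check_dc_password. The same pipeline can be rerun after reset_machine_pw.sh completes to confirm the length of the new password.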

Workaround

Note: The password policy uses the maximum length value only if the minimum length is greater than 20 characters. When the minimum length is greater than 20 characters, the behavior of the password policy is undefined regardless of the maximum length value, and services may fail. To avoid potential problems, leave the minimum length at the default value of 8 characters, or set it no higher than 20 characters.

Some security policies may require a minimum password length of 20 characters or more. To work around this issue, set the minimum length to any value of 20 or greater and set the maximum length to X, where X is the desired password length. This allows a machine account password of X characters to be set when you run the reset_machine_pw.sh script.
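The workaround's constraints (a minimum length of 20 or more, and a maximum length equal to the desired password length X) can be written as a quick pre-check before running reset_machine_pw.sh. This is a hypothetical helper, check_policy_values, that only compares the numbers passed to it; it does not read or change the SSO password policy.

```shell
# Hypothetical pre-check for the workaround values.
# Arguments: minimum length, maximum length, desired password length (X).
check_policy_values() {
  local min="$1" max="$2" desired="$3"
  if [ "$min" -lt 20 ]; then
    echo "minimum length $min is below 20; this workaround assumes a minimum of 20 or more"
    return 1
  fi
  if [ "$max" -ne "$desired" ]; then
    echo "maximum length $max should equal the desired password length $desired"
    return 1
  fi
  if [ "$max" -lt "$min" ]; then
    echo "maximum length $max is smaller than minimum length $min"
    return 1
  fi
  echo "OK: minimum=$min maximum=$max; enter $desired as the password length in reset_machine_pw.sh"
}
```

With the values from the example that follows, check_policy_values 20 25 25 prints the OK line and returns 0.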

Example: 
Set minimum = 20
Set maximum = 25

When running ./reset_machine_pw.sh, enter a password length of 25 characters, then restart all services on the vCenter Server. The WCP and vStats services will now start.
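To confirm that WCP and vStats are running after the restart, the output of service-control --status can be checked. This sketch assumes that the output lists running services on the line(s) after a "Running:" header, with other states under headers such as "Stopped:"; that layout and the function name services_running are assumptions, not a documented interface.

```shell
# Hypothetical helper: read `service-control --status` output on stdin and report
# whether each service named as an argument is listed under the "Running:" header.
services_running() {
  local running svc missing=0
  # Print one service name per line for every name in the "Running:" section only.
  running=$(awk '
    /^[A-Za-z]+:[ \t]*$/ { in_running = ($1 == "Running:"); next }
    in_running { for (i = 1; i <= NF; i++) print $i }
  ')
  for svc in "$@"; do
    if printf '%s\n' "$running" | grep -qxF -- "$svc"; then
      echo "$svc: running"
    else
      echo "$svc: NOT running"
      missing=1
    fi
  done
  # Exit status 0 only if every requested service is running.
  [ "$missing" -eq 0 ]
}
```

Usage on the appliance: service-control --status | services_running wcp vstats. A non-zero exit status means at least one of the two services is not listed as running, in which case the logs described in the Issue/Introduction section are the next place to look.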

Additional Information