vSphere Supervisor restore operation is stuck at 25%


Article ID: 429913



Products

Tanzu Kubernetes Runtime

VMware vCenter Server

VMware NSX

Issue/Introduction

  • When restoring vSphere Supervisor from a backup, the restore operation may remain stuck at 25% for up to 12 hours.
  • The issue only occurs for vSphere Supervisor with NSX networking.
  • On vCenter Server, you might see the following errors in /var/log/vmware/wcp/wcpsvc.log:

    ERROR wcp 421109 [vc@4413] [kubelifecycle/controller.go:####] [opID=########-########-####-####-####-############-####] Error configuring cluster NIC on master VM vm-#####: failed to create CPVM NIC: Server closed the connection while watching NSX resources.
    ERROR wcp 421109 [vc@4413] [kubelifecycle/controller.go:####] [opID=########-########-####-####-####-############-####] Error configuring API server on cluster ########-####-####-####-############ Error configuring cluster NIC on master VM. This operation is part of API server configuration and will be retried.
    WARNING wcp 421109 [vc@4413] [kubelifecycle/controller.go:####] [opID=########-########-####-####-####-############-####] Error configuring cluster NIC. Err
    DEBUG wcp 421109 [vc@4413] [kubelifecycle/controller.go:####] [opID=########-########-####-####-####-############-####] Supervisor configuration retry.

  • You might see errors like the following in the nsx-ncp pod logs on the Supervisor Cluster, indicating that the wcp-cluster-user-- service account is locked:

    ####-##-##T##:##:##.#########Z stderr F [ncp MainThread W] nsx_ujo.ncp.vc.session Failed to get JWT token: Failed SAML HoK request: Failed to get or renew SAML HoK from STS: SoapException:
    ####-##-##T##:##:##.#########Z stderr F faultcode: ns0:FailedAuthentication
    ####-##-##T##:##:##.#########Z stderr F faultstring: The account of the user trying to authenticate is locked. :: The account of the user trying to authenticate is locked. :: User account locked: {Name: wcp-cluster-user-########-####-####-####-############-########-####-####-####-############, Domain: vsphere.local}
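
The wcpsvc.log symptom can be checked quickly from the vCenter shell. The following is a minimal sketch rather than a supported tool: the function name count_nic_errors is hypothetical, and the two search strings are simply taken from the sample log lines in this article, so they may need adjusting if the wording differs in your build.

```shell
#!/bin/sh
# Sketch: count the NIC-configuration errors quoted in this article
# in a wcpsvc log. count_nic_errors is a hypothetical helper name.
count_nic_errors() {
    # $1: log file path; defaults to the location given in the article.
    log_file="${1:-/var/log/vmware/wcp/wcpsvc.log}"
    # grep -c prints the number of matching lines (0 when none match).
    # A line containing both phrases is counted once.
    grep -c -E 'failed to create CPVM NIC|Error configuring cluster NIC' "$log_file"
}

# Example (run on vCenter Server):
#   count_nic_errors
#   count_nic_errors /path/to/a/copied/wcpsvc.log
```

A non-zero count together with a restore stuck at 25% suggests this article applies. The NCP log check is still needed to confirm that the service account is locked.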

 

Environment

vSphere Supervisor 9.x

NSX-T

Cause

This issue occurs because the wcp-cluster-user-- service account is locked during the restore process. This can happen when automated password rotation runs at about the same time as the CPVM configurations are restored. If the restored configurations contain stale credentials, repeated login attempts by the NCP pod trigger a security lockout of the service account.

Resolution

Currently, the resolution is to wait for the automated wcp-cluster-user account password sync, which runs every 12 hours. If the stuck restore is blocking time-critical operations, apply the workaround below to speed up recovery.

Workaround:

  1. Get the affected service account name from the NCP log:
    stderr F faultstring: The account of the user trying to authenticate is locked. :: The account of the user trying to authenticate is locked. :: User account locked: {Name: wcp-cluster-user-########-####-####-####-############-########-####-####-####-############, Domain: vsphere.local} 
  2. On vCenter, check the service account status using dir-cli:

    /usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account wcp-cluster-user-domain-c#-#####-####-####-####-######### 

    The output looks similar to the following:

    Account: wcp-cluster-user-########-####-####-####-############-########-####-####-####-############ 
    UPN: wcp-cluster-user-########-####-####-####-############-########-####-####-####-############@VSPHERE.LOCAL 
    Account disabled: FALSE 
    Account locked: TRUE 
    Password never expires: FALSE 
    Password expired: FALSE 
    Password expiry: 89 day(s) 19 hour(s) 44 minute(s) 6 second(s) 
  3. If the account is shown as locked (Account locked: TRUE), run the following command to unlock it. The shell passes everything between <<EOF and the final EOF line to ldapmodify as input, and the -W option prompts for the administrator password.

    /opt/likewise/bin/ldapmodify -x -D cn=Administrator,cn=Users,dc=vsphere,dc=local -W <<EOF
    dn: CN=wcp-cluster-user-########-####-####-####-############-########-####-####-####-############,CN=ServicePrincipals,dc=vsphere,dc=local
    changetype: modify
    replace: userAccountControl
    userAccountControl: 0
    EOF
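
The workaround steps can be made less error-prone with a few small shell helpers, because the account name is long and easy to mistype. This is a minimal sketch under stated assumptions: the function names extract_locked_account, is_account_locked and unlock_ldif are hypothetical, the NCP log and dir-cli output are assumed to be supplied on standard input, and the LDIF is the same text used in step 3.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of any VMware tool.

# Step 1: print the locked account name(s) found in NCP log text read from stdin.
extract_locked_account() {
    sed -n 's/.*User account locked: {Name: \([^,]*\), Domain:.*/\1/p' | sort -u
}

# Step 2: succeed (exit 0) if dir-cli output read from stdin reports the
# account as locked, as in the "Account locked: TRUE" sample line.
is_account_locked() {
    grep -q '^[[:space:]]*Account locked:[[:space:]]*TRUE'
}

# Step 3: print the LDIF from the article for the account name given as $1.
unlock_ldif() {
    printf 'dn: CN=%s,CN=ServicePrincipals,dc=vsphere,dc=local\n' "$1"
    printf 'changetype: modify\n'
    printf 'replace: userAccountControl\n'
    printf 'userAccountControl: 0\n'
}
```

For example, with the NCP log saved to a file, `account=$(extract_locked_account < ncp.log)` captures the name, and `unlock_ldif "$account"` prints the LDIF so it can be reviewed before it is piped to the ldapmodify command from step 3 in place of the here-document. Piping is assumed to behave the same as the here-document, since both feed the LDIF to ldapmodify on standard input.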

 

If the issue persists after completing the steps above, raise a case with Broadcom Support.