Workload Cluster Upgrade Stuck on New Control Plane Machine Provisioning due to Locked ImageRegistryOperator User

Article ID: 413346

Updated On:

Products

VMware vCenter Server 8.0

Tanzu Kubernetes Runtime

VMware vSphere Kubernetes Service

Issue/Introduction

In vCenter, the following symptoms are observed:

  • A syslog server may occasionally be flooded with the error message:
    "User account locked: {Name: wcp-vmimageserviceop-user-<###-####-####-####>, Domain: <sso-domain.name>}"


  • On the VCSA, /var/log/vmware/vmdird/vmdird.log fills with error messages like the following (the SSO domain name is vsphere.local in this example):
    ERROR: Bind Request Failed (127.0.0.1) error 49: Protocol version: 3, Bind DN: "CN=wcp-vmimageserviceop-user-<###-####-####-####>,cn=ServicePrincipals,dc=vsphere,dc=local", Method: SASL
    ERROR: VdirPasswordFailEvent from user(cn=wcp-vmimageserviceop-user-<###-####-####-####>,cn=serviceprincipals,dc=vsphere,dc=local), error(0)()
    WARNING: Lockout policy check - account lockout. (cn=wcp-vmimageserviceop-user-<###-####-####-####>,cn=serviceprincipals,dc=vsphere,dc=local)

     

  • The dedicated content library for KR images shows that the desired KR image is Security Compliant.


  • A manual sync of the desired KR image in the content library does not resolve the issue.
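
To gauge how often the lockout is occurring, the lockout warnings in vmdird.log can be counted per account. The following is a minimal sketch, assuming the log lines have the same shape as the WARNING example above; count_lockouts is a hypothetical helper name, not a VMware tool.

```shell
# Count "account lockout" warnings per account CN in a vmdird log.
# $1: path to the log file (on the VCSA: /var/log/vmware/vmdird/vmdird.log)
count_lockouts() {
    # Keep only lockout warnings, extract the leading "cn=..." component
    # of the DN in parentheses, then count occurrences per account.
    grep 'Lockout policy check - account lockout' "$1" \
        | sed -n 's/.*account lockout\. (\(cn=[^,]*\),.*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Running count_lockouts /var/log/vmware/vmdird/vmdird.log on the VCSA prints one line per account with its lockout count, highest first.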

While connected to the Supervisor cluster context, one or more of the following symptoms are observed:

  • The first new control plane machine in a workload cluster upgrade is stuck Provisioning:
    kubectl get machines -n <workload cluster namespace>

     

  • Describing the machine shows that it is waiting for a ProviderID and that its virtual machine image is not security compliant:
    kubectl describe machine <stuck Provisioning machine> -n <workload cluster namespace>
    
    Waiting on ProviderID
    VirtualMachineImageProviderSecurityNotCompliant

     

  • Describing the corresponding clustercontentlibraryitem reports that the image is not security compliant:
    kubectl get clustercontentlibraryitem | grep <desired KR version>
    
    kubectl describe clustercontentlibraryitem <clustercontentlibraryitem ID>
    
    Security Compliance: false

     

  • VMOP controller pod logs show that the corresponding VirtualMachineImage is not ready for the new control plane VM:
    kubectl logs -n vmware-system-vmop <vmop controller pod name>
    
    "Reconcile error" err="VirtualMachineImage is not ready"

     

  • The imageregistryoperator pod logs show an error connecting to vCenter:
    kubectl logs -n vmware-system-imageregistry <imageregistryoperator pod name>
    
    Cannot complete login due to an incorrect user name or password
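
When a namespace contains many machines, the stuck ones can be picked out by phase rather than by reading the full table. The sketch below assumes the Machine objects expose their phase at .status.phase (as Cluster API machines do); filter_provisioning is a hypothetical helper name.

```shell
# Print the names of machines whose phase is "Provisioning".
# Reads "name<TAB>phase" lines on stdin, one machine per line.
filter_provisioning() {
    awk -F '\t' '$2 == "Provisioning" { print $1 }'
}

# Usage against a live Supervisor (requires kubectl and the Supervisor context;
# NAMESPACE is a placeholder for the workload cluster namespace):
#   kubectl get machines -n "$NAMESPACE" \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' \
#     | filter_provisioning
```

Each name printed can then be passed to kubectl describe machine as in the symptom above.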

Environment

vCenter 8.x

vSphere Supervisor

Cause

Any service account can become locked after multiple consecutive invalid logins.

When the service account password is rotated in WCP, invalid logins may occur if the operator attempts to log in to vCenter with its cached credentials before the new credentials are refreshed in the operator cache.

Because the image-registry operator does not implement a delay between failed logins, it makes multiple invalid login attempts in quick succession and the service account is locked out.
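
For illustration only, the kind of delay that prevents this pattern can be sketched as a retry loop with exponential backoff and a cap on attempts. This is a generic example, not the operator's code or the planned fix; retry_with_backoff and its parameters are hypothetical.

```shell
# Retry a command, doubling the wait after each failure and giving up
# after a fixed number of attempts.
# $1: maximum attempts, $2: initial delay in seconds, rest: command to run
retry_with_backoff() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

With a cap below the lockout threshold and a growing delay, a client holding stale credentials has time to pick up the rotated password before the lockout policy triggers.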

Resolution

This is a known issue; a fix is in progress.

 

Workaround:

If this issue is causing functional impact, the following workaround can be applied:

  1. SSH into the vCenter Appliance (VCSA)

  2. Retrieve the full service account name for the ImageRegistryOperator service account:
    grep vmimageservice /var/log/vmware/vmdird/vmdird.log
    
    cn=wcp-vmimageserviceop-user-<###-####-####-####>

     

  3. Check the wcp-vmimageserviceop account details:

    /usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account wcp-vmimageserviceop-user-<###-####-####-####> --level 2
    The above command prompts for the password of administrator@vsphere.local.

    The following output indicates that the service account is locked:

    Account: wcp-vmimageserviceop-user-<###-####-####-####>
    UPN: wcp-vmimageserviceop-user-<###-####-####-####>@vsphere.local
    Account disabled: FALSE
    Account locked: TRUE
    Password never expires: FALSE
    Password expired: FALSE

     

  4. Unlock the account:

    /opt/likewise/bin/ldapmodify -x -D cn=Administrator,cn=Users,dc=vsphere,dc=local -W <<EOF
    dn: cn=wcp-vmimageserviceop-user-<###-####-####-####>,cn=serviceprincipals,dc=vsphere,dc=local
    changetype: modify
    replace: userAccountControl
    userAccountControl: 0
    EOF


  5. Confirm that the account is unlocked:
    /usr/lib/vmware-vmafd/bin/dir-cli user find-by-name --account wcp-vmimageserviceop-user-<###-####-####-####> --level 2


  6. Connect to the Supervisor cluster context, or SSH directly into a Supervisor cluster control plane VM.


  7. Scale the imageregistry-operator pods down, then back up:
    kubectl scale deploy -n vmware-system-imageregistry vmware-system-imageregistry-controller-manager --replicas=0
    
    kubectl scale deploy -n vmware-system-imageregistry vmware-system-imageregistry-controller-manager --replicas=2

     

  8. If the image still shows as not security compliant, manually sync the image in the content library from the vCenter web UI.
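
The lock state checks in steps 3 and 5 can be scripted by testing saved dir-cli output for the "Account locked" field. The sketch below assumes the output format shown in step 3; account_is_locked is a hypothetical helper name, and how dir-cli presents its password prompt when its output is redirected has not been verified here.

```shell
# Succeeds (exit status 0) if dir-cli output on stdin reports the account
# as locked, and fails otherwise.
account_is_locked() {
    grep -q '^[[:space:]]*Account locked:[[:space:]]*TRUE[[:space:]]*$'
}

# Usage on the VCSA, with the dir-cli output from step 3 or step 5 saved
# to a file (acct.txt is a placeholder name):
#   if account_is_locked < acct.txt; then
#       echo "account is still locked"
#   else
#       echo "account is unlocked"
#   fi
```

After step 7, kubectl rollout status deployment/vmware-system-imageregistry-controller-manager -n vmware-system-imageregistry is one way to wait until the scaled-up pods are ready before rechecking the image.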