Namespace Stuck in Deleting Status or Guest Cluster Deployment stuck in Pending on vSphere Supervisor

Article ID: 426460

Products

VMware vSphere Kubernetes Service

Issue/Introduction

vSphere Supervisor operations hang or fail to complete:
- Namespace deletion fails to complete.
- VKS cluster deployments are stuck in provisioning.
- In the vCenter logs, the failing deployment may produce repeated retry messages in /var/log/vmware/wcpsvc.log:

Masked Output:
YYYY-MM-DDThh:mm:ss.xxxxZ debug wcp [kubelifecycle/kube_instance.go:4395] [opID=########-########-####-####-####-############] Cluster is not ready yet, would retry in 1m0s time.
YYYY-MM-DDThh:mm:ss.xxxxZ debug wcp [kubelifecycle/kube_instance.go:4395] [opID=########-########-####-####-####-############] Cluster is not ready yet, would retry in 1m0s time.
YYYY-MM-DDThh:mm:ss.xxxxZ debug wcp [kubelifecycle/kube_instance.go:4395] [opID=########-########-####-####-####-############] Cluster is not ready yet, would retry in 1m0s time.
YYYY-MM-DDThh:mm:ss.xxxxZ debug wcp [kubelifecycle/kube_instance.go:4395] [opID=########-########-####-####-####-############] Cluster is not ready yet, would retry in 1m0s time.
YYYY-MM-DDThh:mm:ss.xxxxZ debug wcp [kubelifecycle/kube_instance.go:4395] [opID=########-########-####-####-####-############] Cluster is not ready yet, would retry in 1m0s time.

- /var/log/vmware/vmdird/vmdird contains errors similar to the following:

YYYY-MM-DDTHH:MM:SS:t@################:WARNING: Lockout policy check - account lockout. (cn=wcp-storage-user-########-####-####-####-############-########-####-####-####-############,cn=serviceprincipals,dc=tanzu,dc=local)
YYYY-MM-DDTHH:MM:SS:t@################:ERROR: VdirPasswordFailEvent from user(cn=wcp-storage-user-########-####-####-####-############-########-####-####-####-############,cn=serviceprincipals,dc=tanzu,dc=local), error(0)()
YYYY-MM-DDTHH:MM:SS:t@################:ERROR: VmDirSendLdapResult: Request (Bind), Error (LDAP_INVALID_CREDENTIALS(49)), Message ((49)(SASL step failed.)), (0) socket (127.0.0.1)
YYYY-MM-DDTHH:MM:SS:t@################:ERROR: Bind Request Failed (127.0.0.1) error 49: Protocol version: 3, Bind DN: "CN=wcp-storage-user-########-####-####-####-############-########-####-####-####-############,cn=ServicePrincipals,dc=tanzu,dc=local", Method: SASL
YYYY-MM-DDTHH:MM:SS:t@################:ERROR: SASLSessionStep: sasl error (-13)(SASL(-13): authentication failure: client evidence does not match what we calculated. Probably a password error)
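
The two log signatures above can be checked quickly with grep. The following is a minimal sketch, not an official tool: the function names are hypothetical, and the search strings are taken only from the masked output shown in this article, so they may need adjusting if the log wording differs in your build.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative and not part of any VMware tooling.

# Count "Cluster is not ready yet" retry messages in a wcpsvc log file.
count_wcp_retries() {
    local log_file="$1"
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that status.
    grep -c 'Cluster is not ready yet' "$log_file" || true
}

# List the distinct wcp-storage-user accounts that vmdird reports as locked out.
list_locked_storage_users() {
    local log_file="$1"
    grep 'Lockout policy check - account lockout' "$log_file" \
        | grep -o 'cn=wcp-storage-user-[^,]*' \
        | sort -u
}
```

For example, after loading the functions on vCenter, pass the log paths named in this article: count_wcp_retries /var/log/vmware/wcpsvc.log and list_locked_storage_users /var/log/vmware/vmdird/vmdird. A steadily growing retry count together with a locked wcp-storage-user account matches the symptoms described here.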

Environment

vSphere with Tanzu Supervisor

Cause

The wcp service and the cns-driver of the Supervisor cluster use the wcp-storage-user password to perform volume management operations.
This issue occurs when the user cannot authenticate successfully, for example because the password has expired, is out of sync between the Supervisor and vCenter, or because another issue such as an account lockout blocks authentication.

Resolution

A resync operation can be triggered by restarting the wcp service using the following command:
service-control --restart wcp
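
To confirm that the service came back after the restart, the restart can be followed by a status query. This is a minimal sketch under stated assumptions: the function name and the SERVICE_CONTROL override variable are hypothetical and exist only so the sequence can be dry-run away from the appliance. On vCenter the function falls back to the real service-control command.

```shell
#!/usr/bin/env bash
# Sketch only: restart_wcp_and_check and SERVICE_CONTROL are illustrative names.

restart_wcp_and_check() {
    # Use the real service-control unless a stand-in command is supplied.
    local sc="${SERVICE_CONTROL:-service-control}"

    # Restart wcp, which is the step this article says can trigger the resync.
    "$sc" --restart wcp || return 1

    # Print the current state of the wcp service for manual review.
    "$sc" --status wcp
}
```

After the service reports as running, recheck the vmdird log for new LDAP_INVALID_CREDENTIALS entries to judge whether the resync took effect.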

If the errors persist after the restart, refer to the following KB for additional steps to unlock and/or resync the passwords:
CSI: Correct sync between CSI pod secret and workload_storage_management user password in vSphere with Tanzu

Should any issues or complications occur during remediation, please open a Broadcom support case.