CSI pod secret and workload_storage_management user password are not in sync in vSphere with Tanzu

Article ID: 381071

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • CSI controller pods are in the CrashLoopBackOff state.
  • Following the KB article "CSI: Correct sync between CSI pod secret and workload_storage_management user password in vSphere with Tanzu" shows that the password stored in vCenter for the wcp-storage-user account differs from the password shown when checking over SSH on the Supervisor.
  • The WCP service log /var/log/vmware/wcp/wcpsvc.log contains entries similar to:
    YYYY-MM-DDThh:mm:ss.xxxxZ debug wcp [kubelifecycle/kube_instance.go:4395] [opID=XXXXXXXX-56c23714-4bb3-4079-878b-YYYYYYYYYYYY] Cluster is not ready yet, would retry in 1m0s time.
    YYYY-MM-DDThh:mm:ss.xxxxZ debug wcp [kubelifecycle/kube_instance.go:4395] [opID=XXXXXXXX-56c23714-4bb3-4079-878b-YYYYYYYYYYYY] Cluster is not ready yet, would retry in 1m0s time.
    YYYY-MM-DDThh:mm:ss.xxxxZ debug wcp [kubelifecycle/kube_instance.go:4395] [opID=XXXXXXXX-56c23714-4bb3-4079-878b-YYYYYYYYYYYY] Cluster is not ready yet, would retry in 1m0s time.
    YYYY-MM-DDThh:mm:ss.xxxxZ debug wcp [kubelifecycle/kube_instance.go:4395] [opID=XXXXXXXX-56c23714-4bb3-4079-878b-YYYYYYYYYYYY] Cluster is not ready yet, would retry in 1m0s time.
    YYYY-MM-DDThh:mm:ss.xxxxZ debug wcp [kubelifecycle/kube_instance.go:4395] [opID=XXXXXXXX-56c23714-4bb3-4079-878b-YYYYYYYYYYYY] Cluster is not ready yet, would retry in 1m0s time.
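
To check how long a Supervisor has been stuck in this retry state, the log excerpt above can be matched programmatically. The following is a minimal Python sketch, assuming you have a local copy of wcpsvc.log to read. The function name `count_not_ready` is hypothetical, and the regular expression only targets the "Cluster is not ready yet" message format shown in the excerpt, so all other wcpsvc messages are ignored.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "Cluster is not ready yet" message from the wcpsvc.log excerpt
# and captures the opID so retries of the same operation can be grouped.
NOT_READY = re.compile(
    r"\[opID=(?P<opid>[^\]]+)\] Cluster is not ready yet, would retry in \S+ time"
)


def count_not_ready(lines: Iterable[str]) -> Counter:
    """Count 'Cluster is not ready yet' retry messages per opID (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        match = NOT_READY.search(line)
        if match:
            counts[match.group("opid")] += 1
    return counts
```

For example, `with open("wcpsvc.log", encoding="utf-8", errors="replace") as f: print(count_not_ready(f).most_common(5))` lists the opIDs with the most retries. Because each retry is logged roughly once a minute, a high count for one opID suggests that operation has been waiting on the cluster for a long time.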

Environment

vSphere with Tanzu

Cause

Service accounts such as wcp-cluster-user, wcp-vmop-user, and wcp-storage-user can get locked while wcpsvc is resetting their passwords. This has been observed on vCenter 7.x, 8.x, and 9.

Password rotation has two parts:

  1. wcpsvc calls the svcacctmgmt APIs to reset the password and updates its database with the new password.
  2. wcpsvc connects to the Supervisor cluster through a kube client and updates the secret with the new password.

In some cases, step 1 completes but step 2 is skipped because the Supervisor cluster is unhealthy or the kube client cannot be created. Because the secret update runs in a goroutine rather than in the reconcile loop, it is retried only after the scheduled 1-minute retry delay. While the Supervisor cluster still holds the old, now-invalid credentials, the consumers of the secret (mostly operators) keep requesting sessions with the invalid password, which eventually locks the service account.
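
The race described above can be illustrated with a toy Python model. This is a sketch under stated assumptions, not wcpsvc's actual code: step 1 is modeled as an instant password change in vCenter at t=0; step 2 as a secret update attempted at t=0 and then every 60 seconds (the 1-minute retry), which succeeds only once the cluster is ready; and the consumer as a client that logs in on a fixed interval with whatever password the secret holds. The lockout threshold (`max_failures`), the consumer interval, and all names (`Account`, `simulate`) are hypothetical and do not reflect vCenter's real lockout policy.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Toy vCenter service account with a hypothetical lockout threshold."""
    password: str
    max_failures: int  # assumed threshold, not vCenter's real policy
    failures: int = 0
    locked: bool = False

    def login(self, attempt: str) -> bool:
        # A locked account rejects every attempt, even with the right password.
        if self.locked:
            return False
        if attempt == self.password:
            return True
        self.failures += 1
        if self.failures >= self.max_failures:
            self.locked = True
        return False


def simulate(cluster_ready_at: int, consumer_interval: int = 10,
             secret_retry: int = 60, max_failures: int = 5,
             horizon: int = 600) -> dict:
    """Step through one rotation second by second and report the outcome."""
    account = Account(password="old", max_failures=max_failures)
    secret = "old"  # password currently held in the Supervisor secret

    # Step 1: the password is reset in vCenter (and the wcpsvc database) at t=0.
    account.password = "new"

    for t in range(horizon + 1):
        # Step 2: the secret update is attempted at t=0 and then every
        # secret_retry seconds; it succeeds only once the cluster is ready.
        if secret != "new" and t % secret_retry == 0 and t >= cluster_ready_at:
            secret = "new"
        # The consumer requests a session with whatever the secret holds.
        if t % consumer_interval == 0:
            account.login(secret)
        if account.locked:
            return {"locked": True, "at": t, "secret_synced": secret == "new"}
    return {"locked": False, "at": None, "secret_synced": secret == "new"}
```

With the defaults, `simulate(cluster_ready_at=30)` locks the account at t=40 even though the cluster became ready at t=30, because the next secret update is not attempted until t=60. With `consumer_interval=20`, the secret is updated at t=60 before the threshold is reached, so the account stays unlocked.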

Resolution

To fix this issue, follow the steps in the KB article "CSI: Correct sync between CSI pod secret and workload_storage_management user password in vSphere with Tanzu".