Supervisor deployment stuck in configuring state with error "Timed out waiting for LB service update"



Article ID: 429425


Products

Tanzu Kubernetes Runtime

Issue/Introduction

  • The following is seen in the vCenter UI during the Supervisor deployment:

    Supervisor Config status:
    "Configured Load Balancer fronting the kubernetes API Server
    Timed out waiting for LB service update. This operation is part of the cluster enablement and will be retried."

    Host Config status:
    All three hosts show a similar message:
    "Kubernetes Worker Node is schedulable
    A general system error occurred. Error message: waiting for node <host name> to move to ready state."

  • While connected to the Supervisor cluster context, the following symptoms are observed:
    All csi-controller pods are in a CrashLoopBackOff state with error messages similar to the following:

    kubectl get pods -n vmware-system-csi 
    kubectl logs -n vmware-system-csi <vsphere-csi-controller pod> -c vsphere-csi-controller
    "msg":"failed to get clusterComputeResourceMoIds. err: could not find any AvailabilityZone"

  • Since the csi-controller pods are failing, the vmop-controller-manager and psp-operator-mgr system pods will also be in CrashLoopBackOff due to their dependency on CSI objects.

    The vmop-controller-manager logs will show an error message similar to the following:
    problem creating controller manager" err="failed to add resources to the manager: failed to initialize Volume controller: no matches for kind \"CnsNodeVmAttachment\" in version \"cns.vmware.com/v1alpha1\"" logger="entrypoint"

    The psp-operator-mgr logs will show error messages similar to the following:
    "msg":"Error starting a watch for StoragePools","err":"the server could not find the requested resource"
    "msg":"Shutting down due to error","error":"the server could not find the requested resource"

  • However, the load balancer pods (ako/nsx-ncp) in the Supervisor cluster are in a Running state.
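    The load balancer pod state can be confirmed with a check similar to the following. Note that the namespace depends on which load balancer is in use, and the namespace names below (vmware-system-ako for AKO, vmware-system-nsx for NCP) are typical but should be verified in your environment:

    ```shell
    # Check load balancer pod status while connected to the Supervisor context.
    # Namespace names vary by load balancer type and release; adjust as needed.
    kubectl get pods -n vmware-system-ako   # NSX Advanced Load Balancer (AKO)
    kubectl get pods -n vmware-system-nsx   # NSX (NCP)
    ```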

Environment

vSphere Kubernetes Service

Resolution

Restart the wcp service on the vCenter Server.

SSH to the vCenter Server and run the following command: 
service-control --stop wcp && service-control --start wcp
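
After the restart, the following checks can be used to confirm recovery (a sketch; pod names and timing will vary per environment):

```shell
# Confirm the wcp service is running again on the vCenter Server
service-control --status wcp

# From the Supervisor cluster context, confirm the csi-controller pods
# leave CrashLoopBackOff and reach a Running state
kubectl get pods -n vmware-system-csi
```

Once the csi-controller pods are healthy, the dependent vmop-controller-manager and psp-operator-mgr pods should also recover, and the Supervisor configuration retries automatically.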