Workload cluster provisioning on vSphere remains stuck in "creating" state on Tanzu Mission Control.

Article ID: 326379

Products

VMware vSphere with Tanzu

Issue/Introduction

Symptoms:

While provisioning a new cluster on vSphere from the registered management cluster in TMC, the workload cluster remains stuck in the "creating" state indefinitely.

The lcm-tkg-extension-xxx pod in the vmware-system-tmc namespace logs the following error:
2022-07-26T22:08:02.381Z ERROR controller-runtime.controller Reconciler error {"controller": "cluster", "name": "mgmt-05", "namespace": "shared-services", "error": "failed to validate if Pinniped installed on management cluster with error [unable to list clusters to find management cluster: context deadline exceeded]"}
The sync-agent-xxx pod in the vmware-system-tmc namespace also logs errors such as:
{"component":"sync-agent","error":"rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout","level":"error","msg":"error from response","sub-component":"event-stream","time":"2022-07-22T20:21:20Z"}
{"component":"sync-agent","error":"could not connect to TMC endpoint: context deadline 
NOTE: The management cluster status remains "healthy" in TMC.
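
The log excerpts above can be collected with kubectl. The following is a minimal sketch, assuming kubectl is installed and its current context points at the management cluster. The helper name show_agent_errors is hypothetical, and the actual pod suffixes (shown as xxx in this article) are discovered by name prefix rather than hard-coded.

```shell
#!/bin/sh
# Namespace that hosts the TMC agents (taken from this article).
NS="vmware-system-tmc"

# Print log lines containing "error" for every pod whose name starts
# with the given prefix, e.g. "sync-agent" or "lcm-tkg-extension".
# Assumption: the kubectl context targets the management cluster.
show_agent_errors() {
  prefix="$1"
  for pod in $(kubectl get pods -n "$NS" -o name | grep "^pod/${prefix}-"); do
    echo "== ${pod}"
    kubectl logs -n "$NS" "$pod" | grep -i "error" || echo "(no error lines)"
  done
}
```

For example, running show_agent_errors lcm-tkg-extension and then show_agent_errors sync-agent would surface messages like the ones quoted above, if they are present in the current pod logs.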

Cause

The errors above indicate that the TMC agents are unable to connect to the management cluster. This issue may occur after the LDAP endpoint of a TKG cluster that is already registered with TMC is modified, which causes the Pinniped validation to fail. The "tanzu" command-line interface, however, reports the workload cluster as "running", so the cluster itself is provisioned and only its status in TMC is stale.
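
To confirm the mismatch between the TMC console and the actual cluster state, the workload cluster can be queried from the tanzu CLI. This is a rough sketch, assuming the tanzu CLI is installed and logged in to the management cluster. The helper name check_cluster_from_cli is hypothetical, and the cluster name and namespace are supplied by the caller.

```shell
#!/bin/sh
# Show the state of a workload cluster as reported by the tanzu CLI.
# $1: workload cluster name; $2: its namespace (defaults to "default").
# Assumption: the tanzu CLI is logged in to the management cluster.
check_cluster_from_cli() {
  name="$1"
  namespace="${2:-default}"
  tanzu cluster get "$name" --namespace "$namespace"
}
```

If the output reports the cluster as "running" while TMC still shows "creating", the symptoms match this article.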

Resolution

To fix the issue, restart the sync-agent-xxx and lcm-tkg-extension-xxx pods running in the vmware-system-tmc namespace on the management cluster.
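
One way to restart the pods is to delete them and let Kubernetes recreate them. The sketch below assumes that both pods are managed by a controller such as a Deployment (so that deleted pods are recreated automatically) and that kubectl's current context points at the management cluster. The helper name restart_agent_pods is hypothetical.

```shell
#!/bin/sh
# Namespace that hosts the TMC agents (taken from this article).
NS="vmware-system-tmc"

# Delete every pod whose name starts with the given prefix.
# Assumption: a controller (e.g. a Deployment) recreates the deleted pods.
restart_agent_pods() {
  prefix="$1"
  pods=$(kubectl get pods -n "$NS" -o name | grep "^pod/${prefix}-" || true)
  if [ -z "$pods" ]; then
    echo "no pods matching ${prefix}-* in ${NS}" >&2
    return 1
  fi
  # Word splitting on $pods is intentional: one argument per pod.
  kubectl delete -n "$NS" $pods
}
```

Calling restart_agent_pods sync-agent followed by restart_agent_pods lcm-tkg-extension covers both pods named above. Afterwards, kubectl get pods -n vmware-system-tmc can be used to check that the replacement pods reach the Running state.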


Additional Information

Impact/Risks:

The workload cluster remains in the "creating" state in TMC.