vSphere Kubernetes Service Supervisor Upgrade Stuck at Bootstrap Phase Due to svchost Deadlock

Article ID: 438026

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • During a vSphere Kubernetes Service (VKS) Supervisor upgrade, the process becomes stuck while deploying a new Workload Control Plane (WCP) node. The new node is deployed but hangs at the bootstrap phase and cannot proceed with configure-wcp or join the VKS cluster.
  • The /var/log/update-controller/sync.log file on the stuck node displays the following:

    Waiting for signed bootstrapper certificate at: /dev/shm/bootstrap/wcp_node_bootstrapper.crt

  • Additionally, the /var/log/vmware/svchost/stderr.log file on the healthy nodes contains the following error:

    POTENTIAL DEADLOCK: Duplicate locking, saw callers this locks in one goroutine.
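To confirm both symptoms from a shell, a minimal sketch such as the following can be run on the relevant nodes. The log paths and message strings are taken from this article; the function names are hypothetical, and each function accepts an optional log path so it can be pointed at a copied log.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; paths and strings come from the article.

# Run on the stuck (new) node.
# Returns 0 if sync.log shows the node waiting for its signed bootstrapper certificate.
stuck_node_waiting_for_cert() {
  local log="${1:-/var/log/update-controller/sync.log}"
  grep -q 'Waiting for signed bootstrapper certificate' "${log}" 2>/dev/null
}

# Run on each healthy node.
# Prints how many deadlock reports stderr.log contains (0 if none or the file is missing)
# and returns 0 only if at least one was found.
svchost_deadlock_reported() {
  local log="${1:-/var/log/vmware/svchost/stderr.log}"
  local count
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that non-fatal.
  count=$(grep -c 'POTENTIAL DEADLOCK' "${log}" 2>/dev/null || true)
  echo "${count:-0}"
  [ "${count:-0}" -gt 0 ]
}
```

For example, `svchost_deadlock_reported >/dev/null && echo "svchost deadlock reported on $(hostname)"` prints a message only on nodes whose stderr.log contains the error.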

Environment

vSphere Kubernetes Service 8.0 U3

Cause

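WCP repeatedly attempts to process the new node's certificate signing request (wcp_node_bootstrapper-<node>.csr), but the signed certificate is never produced because the svchost service on the existing healthy nodes has deadlocked. As a result, the new node waits indefinitely for its signed bootstrapper certificate and cannot complete bootstrap.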
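Because the stuck node is blocked until the signed certificate appears at the path named in sync.log, its progress (for example during step 4 of the Resolution) can be watched with a simple poll. This is a sketch under the assumption, suggested by the log message, that the certificate landing at /dev/shm/bootstrap/wcp_node_bootstrapper.crt is what lets bootstrap continue; the function name and the timeout and interval defaults are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: run on the stuck node. Function name and defaults are hypothetical.
# ASSUMPTION: bootstrap continues once the signed certificate exists at this path.
wait_for_bootstrap_cert() {
  local cert="${1:-/dev/shm/bootstrap/wcp_node_bootstrapper.crt}"
  local timeout="${2:-1800}"   # total seconds to wait
  local interval="${3:-30}"    # seconds between checks
  local waited=0
  # -s: the file exists and is not empty
  while [ ! -s "${cert}" ]; do
    if [ "${waited}" -ge "${timeout}" ]; then
      echo "timed out after ${waited}s; ${cert} still missing" >&2
      return 1
    fi
    sleep "${interval}"
    waited=$((waited + interval))
  done
  echo "signed certificate present: ${cert}"
}
```

A non-zero return after the timeout suggests the node has not recovered, which corresponds to step 5 of the Resolution.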

Resolution

  1. Log in to the three healthy Supervisor nodes via SSH.

  2. Restart the svchost service on all three healthy nodes by running the following command:

    systemctl restart svchost

  3. Verify the service is running correctly on each node:

    systemctl status svchost

  4. Monitor the deployment status of the stuck node. The upgrade process should automatically resume and complete successfully.

  5. If the node does not recover automatically after svchost has been restarted, remove the failed node by deleting its associated ESX Agency Manager (EAM) agency, which allows WCP to reconcile and deploy a fresh node. For the steps to delete the EAM agency of the failed node, refer to the KB article "Supervisor Cluster Stuck in Configuring State and the Control Plane marked as Orphaned in vSphere UI".
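Steps 2 and 3 can also be run from a single workstation with a loop such as the sketch below. It assumes root SSH access to each healthy node's address (passed as arguments, which are placeholders here) and uses `systemctl is-active --quiet` as a scriptable stand-in for reading `systemctl status` output. The function name and the `SSH_CMD` override, which exists only so the loop can be dry-run, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: restart svchost on each healthy Supervisor node and check that it is active.
# ASSUMPTIONS: root SSH access to every node address given as an argument;
# SSH_CMD is a hypothetical override (defaults to ssh) for dry runs.
restart_svchost_on_nodes() {
  local ssh_cmd="${SSH_CMD:-ssh}"
  local node failed=0
  for node in "$@"; do
    echo "== ${node} =="
    # Step 2: restart the service
    if ! ${ssh_cmd} "root@${node}" 'systemctl restart svchost'; then
      echo "${node}: restart failed" >&2
      failed=1
      continue
    fi
    # Step 3: is-active --quiet exits 0 only when the unit is active
    if ${ssh_cmd} "root@${node}" 'systemctl is-active --quiet svchost'; then
      echo "${node}: svchost active"
    else
      echo "${node}: svchost NOT active" >&2
      failed=1
    fi
  done
  return "${failed}"
}
```

For example, `restart_svchost_on_nodes 192.0.2.11 192.0.2.12 192.0.2.13` (placeholder addresses) handles the three nodes in turn and returns non-zero if any restart or check fails; SSH prompts for each node's password unless key-based access is configured.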

Additional Information

Refer to the known issue "Supervisor stuck in configuring state after restore due to potential svchost issues" in the Known Issues section of the vSphere Supervisor 8.0 Release Notes.