Supervisor Cluster's Host Config status stuck in "Configuring" at "Installed and Started Kubernetes Node Agent on the ESXi Host".
Article ID: 412096
Products
VMware vSphere Kubernetes Service
Issue/Introduction
The Supervisor Cluster's configuration status shows "Running"; however, the Host Config status is stuck in "Configuring" at "Installed and Started Kubernetes Node Agent on the ESXi Host". The screenshot below shows the problematic scenario.
Per /var/log/vmware/wcp/wcpsvc.log on the vCenter Server Appliance, the WCP service reports that it cannot find the desired Spherelet version in the catalog.
error wcp [content/catalog.go] [opID=<ID>-host-<ID>-vLCM:Enable:domain-c<ID>] catalog is not ready
error wcp [kubelifecycle/kube_instance.go] [opID=<ID>-host-<ID>-vLCM:Enable:domain-c<ID>] unable to find desired version <spherelet-version> image info. err: supervisor content is being processed
error wcp [kubelifecycle/kube_instance.go] [opID=<ID>-host-<ID>-vLCM:Enable:domain-c<ID>] Unable to find image info of desired version: <spherelet-version> error: supervisor content is being processed
error wcp [kubelifecycle/pman_client.go] [opID=<ID>-host-<ID>-vLCM:Enable:domain-c<ID>] unable to import depot and set solution err: supervisor content is being processed
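To check whether an environment shows the same signature, the log can be searched for the two error strings quoted above. The following is a minimal sketch, not an official diagnostic: the function name find_catalog_errors and the WCP_LOG variable are hypothetical, and the search patterns are taken only from the log lines in this article.

```shell
#!/bin/sh
# Sketch: list wcpsvc.log lines that match the catalog errors quoted in this article.
# WCP_LOG and find_catalog_errors are illustrative names, not part of any VMware tooling.
WCP_LOG="${WCP_LOG:-/var/log/vmware/wcp/wcpsvc.log}"

find_catalog_errors() {
    # $1: path to the log file to search
    grep -E 'catalog is not ready|supervisor content is being processed' "$1"
}

# Only search when the log is readable; show the 20 most recent matches.
if [ -r "$WCP_LOG" ]; then
    find_catalog_errors "$WCP_LOG" | tail -n 20 || true
fi
```

If the search returns no lines, the host configuration is likely stuck for a different reason than the one described here.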
Although the WCP service fails to install Spherelet on the ESXi Hosts, it still reports the overall Supervisor Cluster enablement task as successful.
info wcp [kubelifecycle/controller.go] [opID=<ID>] Successfully enabled Supervisor <ID>
debug wcp [kubelifecycle/controller.go] [opID=<ID>] Supervisor Apply Done.
[logger/trace.go] [opID=<ID>] [ END ] [kubelifecycle.(*Controller).syncKubeInstanceState] [<time in sec>] supervisor=<ID>
In addition, under "Supervisor Management > Content distribution" in the vSphere UI, the error "Error processing Content Library" is displayed.
Error processing Content Library <ID>, Error: malformed Supervisor OVF template name ob-<template ID and version>.
The screenshot below shows the error.
Environment
vSphere Kubernetes Service
VMware Cloud Foundation 9.x
Cause
An invalid Content Library is associated with the Supervisor.
The deployment of the Control Plane Virtual Machines succeeds because the Supervisor image comes from the embedded release. The Spherelet installation, however, depends on the catalog. Because the catalog is not ready, the WCP service on the vCenter Server cannot push the Spherelet VIB to the ESXi Hosts to admit them as Kubernetes Workers.
Resolution
Either of the workarounds below should let the Spherelet installation go through. Once the Spherelet service is installed and started, it confirms its readiness to the WCP service, which allows the Supervisor to complete its configuration.
Workaround-1
Log in to the vCenter Server command line. Use SSH to connect to the shell or log in to the shell directly as the root user.
Start the interactive DCLI shell using the command below.
>dcli +interactive
Use the command below to obtain the ID of the content library associated with the Supervisor Cluster.
dcli> com vmware vcenter namespacemanagement lifecycle content libraries list
Use the command below to unassign the content library. This should fix the content catalog issue, and the Spherelet installation should then go through.
dcli> com vmware vcenter namespacemanagement lifecycle content libraries unassign --library <library_id_obtained_from_above>
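The two DCLI steps can also be written out ahead of time for review. The sketch below only prints the commands and does not run them. It assumes that DCLI accepts the same command path when invoked non-interactively as "dcli com vmware ...", which should be verified in the environment; the function name unassign_plan and the LIBRARY_ID variable are hypothetical.

```shell
#!/bin/sh
# Sketch: print the Workaround-1 DCLI commands for a given library ID (dry run only).
# Assumption: DCLI accepts the same command path outside the interactive shell.
DCLI_NS="com vmware vcenter namespacemanagement lifecycle content libraries"

unassign_plan() {
    # $1: content library ID taken from the "list" output
    echo "dcli ${DCLI_NS} list"
    echo "dcli ${DCLI_NS} unassign --library $1"
}

# LIBRARY_ID is a placeholder; replace it with the ID returned by the list command.
unassign_plan "${LIBRARY_ID:-<library_id_obtained_from_above>}"
```

Printing the commands first makes it easier to confirm that the library ID belongs to the affected Supervisor before anything is unassigned.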
Workaround-2
Log in to the vCenter Server command line. Use SSH to connect to the shell or log in to the shell directly as the root user.
Navigate to /storage/updatemgr/patch-store/hostupdate/vmw/vib20/spherelet.
Copy the Spherelet VIB of the correct version to the ESXi Host manually. The most common way to do this is via SCP.
Once the VIB is copied to the ESXi Host, install it manually using the command below.
esxcli software vib install -v <location where the Spherelet VIB was copied>
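The copy, install, and verify sequence for one host might look like the following. This is a dry-run sketch that only prints the commands: the function name spherelet_install_plan, the ESXI_HOST and VIB_FILE variables, and the /tmp staging directory on the ESXi Host are all assumptions, and SSH must be enabled on the host for the copy to work. The final "esxcli software vib list" step is an added check, not part of the original procedure.

```shell
#!/bin/sh
# Sketch: print the Workaround-2 commands for one ESXi host (dry run only).
# ESXI_HOST, VIB_FILE, and the /tmp staging path are placeholders, not values from the article.
VIB_DIR="/storage/updatemgr/patch-store/hostupdate/vmw/vib20/spherelet"

spherelet_install_plan() {
    # $1: ESXi host name or IP, $2: Spherelet VIB file name
    echo "scp ${VIB_DIR}/$2 root@$1:/tmp/"
    # esxcli expects the full path to the VIB file
    echo "ssh root@$1 esxcli software vib install -v /tmp/$2"
    # List installed VIBs afterwards to confirm the Spherelet VIB is present
    echo "ssh root@$1 esxcli software vib list"
}

spherelet_install_plan "${ESXI_HOST:-<esxi-host>}" "${VIB_FILE:-<spherelet-vib-file>}"
```

Repeat the sequence for every ESXi Host in the cluster that is stuck in "Configuring".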