ESXi host appears as Not Ready in the Supervisor Cluster after an FQDN change of the host

Article ID: 382732

Updated On:

Products

VMware vSphere Kubernetes Service
vSphere with Tanzu

Issue/Introduction

  • An ESXi host is in the Not Ready state on the Supervisor Cluster after an FQDN change of the host. The Supervisor Cluster remains stuck in the Configuring state, and the following error message is shown in the UI: "A general system error occurred. Error message: context deadline exceeded"

  • The ESXi host appears in the "Not Ready" state in the Workload Platform Management UI.

  • On the vCenter Server, log entries similar to the following appear in /var/log/vmware/wcp/wcpsvc.log:

YYYY-MM-DDTHH:MM:SSZ error wcp [kubelifecycle/node_controller.go:1125] [opID=e1xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx456-host-<MOID>] Intent nodeReadyIntent, step configureKubeNode for supervisor e1xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx456 node host-<MOID> returned error context deadline exceeded
YYYY-MM-DDTHH:MM:SSZ error wcp [kubelifecycle/node_controller.go:474] [opID=e1xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx456-host-<MOID>] Failed to realize node {nodeID:host-<MOID> supervisorID:e1xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx456} state. Err context deadline exceeded. Will retry.
YYYY-MM-DDTHH:MM:SSZ debug wcp [kubelifecycle/kube_instance.go:5515] [opID=e1xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx456] Cluster is not ready yet, would retry in 1m0s time.

YYYY-MM-DDTHH:MM:SSZ debug wcp [kubelifecycle/node_controller.go:967] [opID=e1xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx456-host-<MOID>] For node host-<MOID>, setting step from configureSphereletService to startSphereletService
YYYY-MM-DDTHH:MM:SSZ debug wcp [kubelifecycle/node_controller.go:1182] [opID=e1xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx456-host-<MOID>] Supervisor e1xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx456 node host-<MOID> current step startSphereletService
YYYY-MM-DDTHH:MM:SSZ debug wcp [kubelifecycle/node_controller.go:1196] [-host-<MOID>] Updating node operation with intent [nodeReadyIntent] step [startSphereletService] progress 70%
YYYY-MM-DDTHH:MM:SSZ info wcp [kubelifecycle/spherelet.go:685] [-host-<MOID>] Invoking spherelet startSpherelet on host host-<MOID>
YYYY-MM-DDTHH:MM:SSZ info wcp [] W1018 17:48:58.895623  262578 reflector.go:539] pkg/mod/k8s.io/client-go@<version>/tools/cache/reflector.go:229: failed to list tanzukubernetesclusters: failed to list tanzukubernetesclusters: the server was unable to return a response in the time allotted, but may still be processing the request
YYYY-MM-DDTHH:MM:SSZ info wcp [] I1018 17:48:58.895744  262578 trace.go:236] Trace[1898918294]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@<version>/tools/cache/reflector.go:229 (18-Oct-2024 17:47:58.894) (total time: 60001ms):
YYYY-MM-DDTHH:MM:SSZ info wcp [] Trace[1898918294]: ---"Objects listed" error:failed to list tanzukubernetesclusters: the server was unable to return a response in the time allotted, but may still be processing the request 60001ms (17:48:58.895)
YYYY-MM-DDTHH:MM:SSZ info wcp [] Trace[1898918294]: [1m0.001346649s] [1m0.001346649s] END
YYYY-MM-DDTHH:MM:SSZ info wcp [kubelifecycle/spherelet.go:701] [opID=e1xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx456-host-<MOID>] spherelet startSpherelet successful on host host-<MOID>
YYYY-MM-DDTHH:MM:SSZ debug wcp [kubelifecycle/node_controller.go:967] [opID=e1xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx456-host-<MOID>] For node host-<MOID>, setting step from startSphereletService to configureKubeNode

 

  • On the ESXi host, log entries similar to the following appear in /var/run/log/spherelet.log:

YYYY-MM-DDTHH:MM:SSZ No(5) spherelet[2100219]: time="YYYY-MM-DDTHH:MM:SSZ" level=info msg="Using cert/key pair from (/etc/vmware/spherelet/client.crt, /etc/vmware/spherelet/client.key)."
YYYY-MM-DDTHH:MM:SSZ No(5) spherelet[2100219]: time="YYYY-MM-DDTHH:MM:SSZ" level=info msg="Starting client certificate rotation"
YYYY-MM-DDTHH:MM:SSZ No(5) spherelet[2100219]: time="YYYY-MM-DDTHH:MM:SSZ" level=info msg="Waiting for informer caches to sync..."
YYYY-MM-DDTHH:MM:SSZ No(5) spherelet[2100219]: time="YYYY-MM-DDTHH:MM:SSZ" level=info msg="Informer caches populated"
YYYY-MM-DDTHH:MM:SSZ No(5) spherelet[2100219]: time="YYYY-MM-DDTHH:MM:SSZ" level=error msg="Failed to create govmomi client POST \"/sdk\": 503 Service Unavailable"
YYYY-MM-DDTHH:MM:SSZ No(5) spherelet[2100219]: time="YYYY-MM-DDTHH:MM:SSZ" level=error msg="Failed to create govmomi client POST \"/sdk\": 503 Service Unavailable"
YYYY-MM-DDTHH:MM:SSZ No(5) spherelet[2100219]: time="YYYY-MM-DDTHH:MM:SSZ" level=error msg="Failed to create govmomi client POST \"/sdk\": 503 Service Unavailable"
YYYY-MM-DDTHH:MM:SSZ No(5) spherelet[2100219]: time="YYYY-MM-DDTHH:MM:SSZ" level=info msg="Setting up Net-Op netconf provider"

YYYY-MM-DDTHH:MM:SSZ No(5) spherelet[2309340]: time="YYYY-MM-DDTHH:MM:SSZ" level=info msg="Started vds proxy"
YYYY-MM-DDTHH:MM:SSZ No(5) spherelet[2309340]: time="YYYY-MM-DDTHH:MM:SSZ" level=info msg="Adding node condition 'Ready=False'"
YYYY-MM-DDTHH:MM:SSZ No(5) spherelet[2309340]: time="YYYY-MM-DDTHH:MM:SSZ" level=fatal msg="nodes \"<ESXi_Hostname>\" is forbidden: node \"<ESXi_Hostname>\" is not allowed to modify node \"<ESXi_Hostname>\""

 

  • On the ESXi host, the spherelet service is in the "Not Running" state and crashes each time a start is attempted.
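
To confirm quickly whether a host shows the fatal "is forbidden" signature quoted above, the spherelet log can be searched for it. The following is a minimal sketch, not an official tool: the function name has_node_forbidden_error is hypothetical, and the pattern assumes the log line has the same shape as the sample in this article.

```shell
#!/bin/sh
# Sketch only: has_node_forbidden_error is a hypothetical helper name.
# It succeeds (exit status 0) when the given log file contains a fatal
# "nodes ... is forbidden" entry shaped like the sample in this article.
has_node_forbidden_error() {
    # $1: path to a spherelet log file
    grep -q 'level=fatal msg="nodes .* is forbidden' "$1"
}

# Example usage on the ESXi host (log path taken from this article):
# has_node_forbidden_error /var/run/log/spherelet.log && echo "forbidden-node signature found"
```

A match only indicates that the symptom is present; the hostname and certificate checks in the Resolution section are still needed to confirm the cause.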


Environment

VMware vSphere with Tanzu 7.0 

VMware vSphere with Tanzu 8.0 

Cause

The hostname of the ESXi host does not match the CN (Common Name) in the spherelet client certificate. In the case above, the hostname of the ESXi host had an extra trailing '.', so it did not match the CN in the spherelet's client certificate.
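
The trailing '.' is easy to miss by eye. As a small illustration, assuming a POSIX shell, a check like the following flags a hostname that ends in a dot. The function name has_trailing_dot is hypothetical, and a trailing dot is only the specific mismatch seen in this case; other differences between the hostname and the CN would need a full comparison.

```shell
#!/bin/sh
# Sketch only: has_trailing_dot is a hypothetical helper name.
# It succeeds when the given hostname string ends with a '.' character.
has_trailing_dot() {
    case "$1" in
        *.) return 0 ;;
        *)  return 1 ;;
    esac
}

# Example usage on the ESXi host:
# has_trailing_dot "$(hostname -f)" && echo "hostname ends with a trailing dot"
```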

Resolution

  1. Open an SSH session to the ESXi host and verify the hostname:
      • hostname -f

    • In the case above, the hostname returned was "esxi01.domain.local." (note the trailing '.'):
      • [root@esxi01:~] hostname -f
      • esxi01.domain.local.

  2. Check the CN in the spherelet's client certificate:

    • [root@esxi01:~] cat /etc/vmware/spherelet/client.crt | openssl x509 -text -noout | grep CN
      Issuer: CN=kubernetes
      Subject: C=US, ST=CA, L=Palo Alto, O=system:nodes, CN=system:node:esxi01.domain.local

    • Note: Make sure the CN in the Subject (the part after "system:node:") is identical to the hostname of the ESXi host. In this case the hostname has an extra trailing '.', so the hostname needs to be changed.

  3. Follow the steps below to change the hostname of the ESXi host:

    • Put the ESXi host into Maintenance Mode.

    • Run the following command to change the hostname so that it matches the CN in the spherelet's client certificate:
      • esxcli system hostname set -f esxi01.domain.local
      • Note: Verify and change the hostname as per the environment.

    • Reboot the ESXi host.
      • [root@esxi01:~] reboot

  4. After the reboot, the hostname of the ESXi host should match the CN in the spherelet's client certificate.

  5. Try starting the 'spherelet' service on the ESXi host:
      • [root@esxi01:~] /etc/init.d/spherelet start

  6. Verify the status of the spherelet service:
      • [root@esxi01:~] /etc/init.d/spherelet status

  7. Take the ESXi host out of Maintenance Mode and validate that it is now in the Ready state.
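
The comparison in steps 1 and 2 can be sketched as a small script. This is an illustration under stated assumptions, not a supported utility: the function names cn_node_name and hostname_matches_cn are hypothetical, and the parsing assumes the Subject line has the "CN=system:node:<hostname>" shape shown in step 2.

```shell
#!/bin/sh
# Sketch only: both function names below are hypothetical.

# Print the node name that follows "CN=system:node:" in a certificate
# subject line, for example:
#   Subject: C=US, ST=CA, L=Palo Alto, O=system:nodes, CN=system:node:esxi01.domain.local
# Prints nothing if the line does not contain that pattern.
cn_node_name() {
    printf '%s\n' "$1" | sed -n 's/.*CN *= *system:node:\([^,/ ]*\).*/\1/p'
}

# Succeed only when the hostname ($1) is character-for-character equal
# to the node name found in the subject line ($2).
hostname_matches_cn() {
    [ "$1" = "$(cn_node_name "$2")" ]
}

# Example usage on the ESXi host (certificate path taken from this article):
# subject=$(openssl x509 -in /etc/vmware/spherelet/client.crt -noout -subject)
# hostname_matches_cn "$(hostname -f)" "$subject" || echo "hostname does not match certificate CN"
```

The comparison is deliberately exact, so a hostname of "esxi01.domain.local." does not match a CN of "esxi01.domain.local".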

 

 

 

 

Additional Information

  1. In some cases, the 'spherelet' service on the host might still fail to start. Verify whether the spherelet vib is present on the ESXi host.

    • Expected output:
      • [root@esxi01:~] esxcli software vib list | grep spherelet
        spherelet                      1.3.4-20538948                         VMware  VMwareCertified   2024-03-05

  2. If the vib is not present, copy it from the vCenter Server to the affected ESXi host and then install it.

    • Open an SSH session to the vCenter Server and go to the following directory, which contains the 'spherelet' vib:
      • cd /storage/updatemgr/patch-store/hostupdate/vmw/vib20/spherelet

    • List the vib files in the directory with the 'ls -l' command:
      • ls -l
        -rw-r--r-- 1 updatemgr updatemgr 64246194 Feb 14  2023 VMware_bootbank_spherelet_1.2.1-19406725.vib
        -rw-r--r-- 1 updatemgr updatemgr 62717504 Feb 14  2023 VMware_bootbank_spherelet_1.3.0-19406701.vib
        -rw-r--r-- 1 updatemgr updatemgr 64654080 Feb 14  2023 VMware_bootbank_spherelet_1.3.4-20538948.vib

  3. Copy the vib to the affected ESXi host:

    • scp VMware_bootbank_spherelet_1.3.4-20538948.vib root@<ESXi_Hostname>:/tmp/
      • Note: Replace <ESXi_Hostname> with the ESXi hostname for the environment. When prompted, enter the 'root' account password for the ESXi host.

  4. Open an SSH session to the ESXi host and install the spherelet vib:
    • esxcli software vib install -v /tmp/VMware_bootbank_spherelet_1.3.4-20538948.vib

  5. Start the 'spherelet' service again and check the ESXi host status on the Supervisor Cluster:
    • [root@esxi01:~] /etc/init.d/spherelet start
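
The presence check in step 1 could be scripted along these lines. This is a minimal sketch: the function name spherelet_vib_present is hypothetical, and it assumes the vib name appears in the first column of the listing, as in the expected output shown above.

```shell
#!/bin/sh
# Sketch only: spherelet_vib_present is a hypothetical helper name.
# It reads a vib listing on standard input and succeeds when a line
# begins (after optional whitespace) with the name "spherelet".
spherelet_vib_present() {
    grep -q '^[[:space:]]*spherelet[[:space:]]'
}

# Example usage on the ESXi host:
# esxcli software vib list | spherelet_vib_present || echo "spherelet vib is missing"
```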