Failed to auto-rotate the certificates for Spherelet - Alert on the ESXi hosts

Article ID: 430958

Updated On:

Products

VMware vSphere ESXi

VMware vSphere Kubernetes Service

VMware vCenter Server 8.0

Issue/Introduction

ESXi hosts continuously attempt to rotate the Spherelet certificates and report the alert "Failed to auto-rotate the certificates for Spherelet".

Spherelet repeatedly receives "Unauthorized" responses when trying to rotate its certificates.

/var/log/vmware/wcp/wcpsvc.log from vCenter 

YYYY-DD-MMTHH:MM:SS warning wcp [kubelifecycle/node_controller.go:###] Unhandled node condition CertRotationFailed encountered.
YYYY-DD-MMTHH:MM:SS warning wcp [kubelifecycle/node_controller.go:###] Unhandled node condition CertRotationFailed encountered.
YYYY-DD-MMTHH:MM:SS warning wcp [kubelifecycle/node_controller.go:###] Unhandled node condition CertRotationFailed encountered.
YYYY-DD-MMTHH:MM:SS warning wcp [kubelifecycle/node_controller.go:###] Unhandled node condition CertRotationFailed encountered.

/var/log/pods/cert-manager-controller from supervisor

YYYY-DD-MMTHH:MM:SS stderr F E0101 15:42:04.118425 1 leaderelection.go:330] error retrieving resource lock vmware-system-cert-manager/cert-manager-controller: Unauthorized
YYYY-DD-MMTHH:MM:SS stderr F E0101 15:42:38.116808 1 leaderelection.go:330] error retrieving resource lock vmware-system-cert-manager/cert-manager-controller: Unauthorized
YYYY-DD-MMTHH:MM:SS stderr F E0101 15:43:12.116062 1 leaderelection.go:330] error retrieving resource lock vmware-system-cert-manager/cert-manager-controller: Unauthorized

/var/log/spherelet.log from ESXi

YYYY-DD-MMTHH:MM:SS No(##) spherelet[#######]: E1104 07:30:50.040029 ####### certificate_manager.go:###] kubernetes.io/kube-apiserver-client-kubelet: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Unauthorized
YYYY-DD-MMTHH:MM:SS No(##) spherelet[#######]: E1104 07:30:50.040114 ####### certificate_manager.go:###] kubernetes.io/kube-apiserver-client-kubelet: Reached backoff limit, still unable to rotate certs: timed out waiting for the condition
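To check quickly whether a system shows these signatures, the relevant log can be searched for the error strings quoted above. The following is a minimal sketch, not an official diagnostic. The helper name count_signature is made up for illustration, and the log paths are the ones given in this article. Each log exists only on its own system (wcpsvc.log on vCenter, spherelet.log on the ESXi host), so a missing file is simply reported as 0.

```shell
#!/bin/sh
# count_signature is a hypothetical helper, not part of any VMware tooling.
# It prints how many lines of a log file contain a fixed string,
# or 0 when the file does not exist on this system.
count_signature() {
    pattern=$1
    logfile=$2
    if [ -r "$logfile" ]; then
        # -F: treat the pattern as a literal string, -c: print the match count
        grep -c -F -e "$pattern" "$logfile"
    else
        echo 0
    fi
}

# On the ESXi host: failed certificate signing requests from Spherelet
echo "spherelet.log: $(count_signature 'cannot create certificate signing request: Unauthorized' /var/log/spherelet.log)"

# On vCenter: the node condition reported by the WCP service
echo "wcpsvc.log: $(count_signature 'CertRotationFailed' /var/log/vmware/wcp/wcpsvc.log)"
```

A non-zero count only indicates that the signature is present. Compare the timestamps with the time the alert was raised before drawing conclusions.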

The bootstrap certificates (client.crt and server.crt) have expired, while kubelet-client-current.pem still shows as valid.

Example:

Bootstrap certificate:

# openssl x509 -text -in /etc/vmware/spherelet/client.crt | grep Not
        Not Before: Month date HH:MM:SS 2024 GMT
        Not After : Month date HH:MM:SS 2025 GMT
# openssl x509 -text -in /etc/vmware/spherelet/spherelet.crt | grep Not
        Not Before: Month date HH:MM:SS 2024 GMT
        Not After : Month date HH:MM:SS 2025 GMT

Current certificate:

# openssl x509 -text -in /etc/vmware/spherelet/kubelet-client-current.pem | grep Not
      Not Before: Month date HH:MM:SS 2025 GMT
      Not After : Month date HH:MM:SS 2026 GMT
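Instead of comparing the dates by eye, openssl can report directly whether each certificate is still valid. The sketch below assumes that the openssl build on the host supports the -checkend option of the x509 subcommand. The helper name cert_status is made up for illustration, and the file paths are the ones used in the example above.

```shell
#!/bin/sh
# cert_status is a hypothetical helper, not part of any VMware tooling.
# It prints MISSING, UNREADABLE, VALID or EXPIRED for one PEM certificate.
cert_status() {
    cert=$1
    if [ ! -r "$cert" ]; then
        echo "MISSING"
    elif ! openssl x509 -noout -in "$cert" >/dev/null 2>&1; then
        # openssl could not parse the file as a certificate
        echo "UNREADABLE"
    elif openssl x509 -checkend 0 -noout -in "$cert" >/dev/null 2>&1; then
        # -checkend 0 exits 0 when the certificate has not expired yet
        echo "VALID"
    else
        echo "EXPIRED"
    fi
}

for cert in /etc/vmware/spherelet/client.crt \
            /etc/vmware/spherelet/spherelet.crt \
            /etc/vmware/spherelet/kubelet-client-current.pem; do
    echo "$cert: $(cert_status "$cert")"
done
```

On an affected host, the expectation based on this article is EXPIRED for the bootstrap certificates and VALID for kubelet-client-current.pem.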

Note: this problem can only occur on a host that has not undergone any modification or interruption during the 1.5-year period.

Any of the following actions, each of which results in a supervisor restart of Spherelet, resets the timeline to Day 0 because the bootstrap certificates are refreshed:

  1. Removing a host from a cluster and adding it back
  2. An upgrade
  3. Entering maintenance mode

Environment

vSphere with Tanzu

VMware ESXi

VMware vCenter Server 8.0 U3

Cause

The "Unauthorized" errors occur because Spherelet incorrectly uses the expired bootstrap client certificate during the auto-rotation attempt.

Spherelet uses a client config that references the expired bootstrap certificates instead of the current certificate, which results in the "Unauthorized" errors:

# cat /etc/vmware/spherelet/spherelet.conf | grep client-certificate
client-certificate: /etc/vmware/spherelet/client.crt
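The same check can be scripted so that it states explicitly whether the configured client certificate is the bootstrap one. This is a minimal sketch under the assumption that the client-certificate entry sits on a single line in the form shown above. The helper name client_cert_path is made up for illustration.

```shell
#!/bin/sh
# client_cert_path is a hypothetical helper, not part of any VMware tooling.
# It prints the value of the first client-certificate entry in a config file.
client_cert_path() {
    conf_file=$1
    [ -r "$conf_file" ] || return 1
    # awk splits on whitespace, so leading indentation is ignored
    awk '$1 == "client-certificate:" { print $2; exit }' "$conf_file"
}

cert_path=$(client_cert_path /etc/vmware/spherelet/spherelet.conf)
case "$cert_path" in
    "")           echo "no client-certificate entry found" ;;
    */client.crt) echo "bootstrap certificate in use: $cert_path" ;;
    *)            echo "client certificate in use: $cert_path" ;;
esac
```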

The problem occurs once the bootstrap client certificate issued to Spherelet expires. After it has expired, Spherelet is no longer able to rotate its client certificate properly and eventually ends up using an expired current client certificate.

Resolution

VMware by Broadcom is aware of the issue, and the fix will be included in a future version.

Workaround:

Manually rotate the certificates using the certmgr tool from the KB "Replace vSphere with Tanzu / vSphere Kubernetes Service Supervisor Certificates":

certmgr certificates rotate --spherelet-only

In case of any issues during manual certificate rotation, please reach out to the Broadcom Support team for further assistance.

Additional Information

Key Considerations

Expired Certificates: If the current Spherelet certificate expires, the ESXi host will appear as "Not Ready" in vCenter, and the "Host Config" will be stuck in "Configuring".

Time Sensitivity: Spherelet certificates are highly sensitive to time differences. If the ESXi host's time is not synchronized (via NTP) with vCenter and the Supervisor Cluster, the certificates may be considered invalid.
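One simple way to spot clock drift is to print the UTC time in the same format on the ESXi host and on vCenter and compare the results. The sketch below is an illustration only. The helper name utc_now is made up, and the two esxcli calls assume an ESXi release that provides the system time and system ntp namespaces; they are skipped on systems without esxcli.

```shell
#!/bin/sh
# utc_now is a hypothetical helper, not part of any VMware tooling.
# It prints the current UTC time in a fixed format so that output from
# different systems can be compared side by side.
utc_now() {
    date -u +%Y-%m-%dT%H:%M:%SZ
}

echo "UTC time on $(uname -n): $(utc_now)"

# esxcli exists only on ESXi hosts, so guard the calls.
if command -v esxcli >/dev/null 2>&1; then
    esxcli system time get   # system time as reported by ESXi
    esxcli system ntp get    # NTP configuration, if this namespace is available
fi
```

Run the script on each system at roughly the same moment. A difference of more than a few seconds suggests that the NTP configuration should be reviewed.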