Error "Node is not healthy and is not accepting pods. Details Kubelet stopped posting node status" due to spherelet service failure on ESXi hosts in vSphere 7.0

Article ID: 332570

Products

VMware vSphere Kubernetes Service

Issue/Introduction

The Web GUI displays the following error indicating that the Spherelet and Kubelet services are not functioning correctly on the ESXi hosts:

"Node is not healthy and is not accepting pods. Details Kubelet stopped posting node status"

When the node status is queried from the Supervisor with the command "kubectl get nodes -A", the ESXi hosts on which spherelet has failed (role "agent") are reported as NotReady:
NAME              STATUS     ROLES    AGE     VERSION
[Node-UUID]       Ready      master   3d10h   v1.19.1+wcp.2
[Node-UUID]       Ready      master   3d10h   v1.19.1+wcp.2
[Node-UUID]       Ready      master   3d10h   v1.19.1+wcp.2
[ESXi-Hostname]   NotReady   agent    3d10h   v1.19.1-sph-496a80d
[ESXi-Hostname]   Ready      agent    3d10h   v1.19.1-sph-496a80d
[ESXi-Hostname]   NotReady   agent    3d10h   v1.19.1-sph-496a80d
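
In a larger cluster it can help to list only the affected hosts. The following is a minimal sketch, assuming the default column order shown above (NAME STATUS ROLES AGE VERSION). The function name not_ready_agents is illustrative and is not part of any VMware tooling.

```shell
# Hypothetical helper: print the names of agent nodes whose STATUS starts
# with NotReady. Reads "kubectl get nodes" output on stdin and assumes the
# default column order NAME STATUS ROLES AGE VERSION.
not_ready_agents() {
  awk 'NR > 1 && $2 ~ /^NotReady/ && $3 == "agent" { print $1 }'
}

# Example usage from the Supervisor:
#   kubectl get nodes | not_ready_agents
```

Each name printed is an ESXi host on which the spherelet service status should be checked.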
Log evidence from the API server (/var/log/vmware/fluentbit/consolidated.log) displays the following errors:

systemd.kubelet.service: {"hostname":"####-##-## ##:##:##(Node-UUID)####-##-## ##:##:##","unit":"kubelet","pid":"425","exe":"/opt/kubernetes/k8s-1.19/bin/kubelet","cmdline":"/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --pod-infra-container-image=vmware/pause:1.19.0 --node-ip=<node_ip>","log":"E0108 remote_runtime.go:389] ExecSync'/bin/sh -c extender_reply=$(curl -k -s -o /dev/null -w %{http_code} https://##.##.##.##:12345/healthz); if [[ \"$extender_reply\" -lt 200 || \"$extender_reply\" -ge 400 ]]; then exit 1; fi; scheduler_healthy=false; for (( i=0; i<8; i++ )); do scheduler_reply=$(curl -k -s -o /dev/null -w %{http_code} http://##.##.##.##:10251/healthz); if [[ \"$scheduler_reply\" -ge 200 && \"$scheduler_reply\" -lt 400 ]]; then scheduler_healthy=true; break; fi; sleep 10; done; if [[ \"$scheduler_healthy\" = false ]]; then exit 1; fi;' from runtime service failed: rpc error: code = Unknown desc = failed to exec in container: failed to start exec: OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused \"exec: \\\"/bin/sh\\\": stat /bin/sh: no such file or directory\": unknown"}]

Log evidence in /var/log/vmware/wcp/wcpsvc.log shows the kubenotready error:

vcenter.wcp.node.kubenotready","localized":{"OPTIONAL":"Node is not healthy and is not accepting pods. Details Kubelet stopped posting node status.."},"params":{"OPTIONAL":null}}}}},"severity":"ERROR"}}},
{"STRUCTURE":{"com.vmware.vcenter.namespace_management.clusters.message":{"details":{"OPTIONAL":{"STRUCTURE":{"com.vmware.vapi.std.localizable_message":{"args":["Kubelet stopped posting node status."],"default_message":"Node is not healthy and is not accepting pods. Details Kubelet stopped posting node status..","id":"vcenter.wcp.node.kubenotready","localized":{"OPTIONAL":"Node is not healthy and is not accepting pods. Details Kubelet stopped posting node status.."}
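
To confirm that a log contains the signatures quoted above, the entries can be counted with grep. This is a sketch only: the helper name count_log_matches is hypothetical, and the search strings and paths are the ones quoted in this article.

```shell
# Hypothetical helper: count the lines in a log file that contain a fixed string.
# $1 = string to search for, $2 = path to the log file.
# grep exits with status 1 when nothing matches; that case is treated as a
# successful count of 0, while real errors (such as a missing file) still fail.
count_log_matches() {
  grep -F -c -- "$1" "$2" || [ $? -eq 1 ]
}

# Example usage, on the system that holds each log:
#   count_log_matches 'vcenter.wcp.node.kubenotready' /var/log/vmware/wcp/wcpsvc.log
#   count_log_matches 'OCI runtime exec failed' /var/log/vmware/fluentbit/consolidated.log
```

A count greater than zero means the log contains at least one line with that signature.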

The status of the spherelet service on the affected ESXi host shows that it is not running:

[root@ESXi-Host:/var/log] /etc/init.d/spherelet status

####-##-## ##:##:##,303 init.d/spherelet Log fetcher support: True
####-##-## ##:##:##,330 init.d/spherelet spherelet is not running
####-##-## ##:##:##,330 init.d/spherelet spherelet is not running

Environment

VMware vSphere Kubernetes Service

Cause

The OCI runtime fails to execute a process inside the container, either because a required path is missing (in the log above, /bin/sh cannot be found) or because a service fails during process startup. As a result, the spherelet service does not run on the affected ESXi hosts, and those hosts stop posting node status.

Resolution

This issue is resolved in VMware vSphere 7.0 Update 2 and later.

Workaround:

1. Access the affected ESXi host via Secure Shell (SSH) or the console.
2. Verify the status of the spherelet service by running the command:

/etc/init.d/spherelet status

3. If the service is reported as not running, start the spherelet service with the command:

/etc/init.d/spherelet start
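
The status check and the conditional start can be combined into one step. The following is a minimal sketch, assuming the status command prints "is not running" when the service is down, as in the output shown earlier. The function name ensure_spherelet_running and its optional argument are illustrative only.

```shell
# Hypothetical helper: start spherelet only when its status output reports
# "is not running". $1 optionally overrides the init script (path or command
# name); it defaults to the path used in this article.
ensure_spherelet_running() {
  init_script="${1:-/etc/init.d/spherelet}"
  # The status command may exit non-zero when the service is down, so its
  # exit code is ignored and only its output is inspected.
  status_output=$("$init_script" status 2>&1) || true
  case "$status_output" in
    *"is not running"*)
      echo "spherelet is not running; starting it"
      "$init_script" start
      ;;
    *)
      echo "spherelet did not report 'is not running'; no action taken"
      ;;
  esac
}

# Example usage on the affected ESXi host:
#   ensure_spherelet_running
```

After the service starts, "/etc/init.d/spherelet status" can be run again to confirm the change.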
