Symptoms:
The following error is reported for an affected node: "Node is not healthy and is not accepting pods. Details Kubelet stopped posting node status"
When node status is queried from the Supervisor Cluster with the command "kubectl get nodes -A", the affected agent nodes (ESXi hosts) are reported as NotReady:
NAME            STATUS     ROLES    AGE     VERSION
[Node-UUID]     Ready      master   3d10h   v1.19.1+wcp.2
[Node-UUID]     Ready      master   3d10h   v1.19.1+wcp.2
[Node-UUID]     Ready      master   3d10h   v1.19.1+wcp.2
[ESXi-Hostname] NotReady   agent    3d10h   v1.19.1-sph-496a80d
[ESXi-Hostname] Ready      agent    3d10h   v1.19.1-sph-496a80d
[ESXi-Hostname] NotReady   agent    3d10h   v1.19.1-sph-496a80d
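In a larger cluster, the affected hosts can be picked out of the kubectl output with a short filter. The following is a minimal sketch. It assumes the default column order shown above (NAME STATUS ROLES AGE VERSION), and the function name list_notready_agents is hypothetical. It also matches a STATUS that begins with NotReady, because kubectl appends ",SchedulingDisabled" to the status of cordoned nodes.

```shell
#!/bin/sh
# Sketch: print the names of agent (ESXi host) nodes whose STATUS is NotReady.
# Reads "kubectl get nodes" table output on stdin.
# Assumes the default columns NAME STATUS ROLES AGE VERSION.
list_notready_agents() {
  # NR > 1 skips the header row; $2 is STATUS, $3 is ROLES.
  # ^NotReady also matches "NotReady,SchedulingDisabled" on cordoned nodes.
  awk 'NR > 1 && $2 ~ /^NotReady/ && $3 == "agent" { print $1 }'
}

# Hypothetical usage on the Supervisor Cluster:
#   kubectl get nodes -A | list_notready_agents
```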
The log /var/log/vmware/fluentbit/consolidated.log displays errors similar to the following:
systemd.kubelet.service: {"hostname":"####-##-## ##:##:##(Node-UUID)####-##-## ##:##:##","unit":"kubelet","pid":"425","exe":"/opt/kubernetes/k8s-1.19/bin/kubelet","cmdline":"/usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --pod-infra-container-image=vmware/pause:1.19.0 --node-ip=<node_ip>","log":"E0108 remote_runtime.go:389] ExecSync'/bin/sh -c extender_reply=$(curl -k -s -o /dev/null -w %{http_code} https://##.##.##.##:12345/healthz); if [[ \"$extender_reply\" -lt 200 || \"$extender_reply\" -ge 400 ]]; then exit 1; fi; scheduler_healthy=false; for (( i=0; i<8; i++ )); do scheduler_reply=$(curl -k -s -o /dev/null -w %{http_code} http://:10251/healthz); if [[ \"$scheduler_reply\" -ge 200 && \"$scheduler_reply\" -lt 400 ]]; then scheduler_healthy=true; break; fi; sleep 10; done; if [[ \"$scheduler_healthy\" = false ]]; then exit 1; fi;' from runtime service failed: rpc error: code = Unknown desc = failed to exec in container: failed to start exec: OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused \"exec: \\\"/bin/sh\\\": stat /bin/sh: no such file or directory\": unknown"}]##.##.##.##
On vCenter Server, /var/log/vmware/wcp/wcpsvc.log shows the kubenotready error:
vcenter.wcp.node.kubenotready","localized":{"OPTIONAL":"Node is not healthy and is not accepting pods. Details Kubelet stopped posting node status.."},"params":{"OPTIONAL":null}}}}},"severity":"ERROR"}}},{"STRUCTURE":{"com.vmware.vcenter.namespace_management.clusters.message":{"details":{"OPTIONAL":{"STRUCTURE":{"com.vmware.vapi.std.localizable_message":{"args":["Kubelet stopped posting node status."],"default_message":"Node is not healthy and is not accepting pods. Details Kubelet stopped posting node status..","id":"vcenter.wcp.node.kubenotready","localized":{"OPTIONAL":"Node is not healthy and is not accepting pods. Details Kubelet stopped posting node status.."}
On the affected ESXi host, the spherelet status check reports that the service is not running:
[root@ESXi-Host:/var/log] /etc/init.d/spherelet status
####-##-## ##:##:##,303 init.d/spherelet Log fetcher support: True
####-##-## ##:##:##,330 init.d/spherelet spherelet is not running
####-##-## ##:##:##,330 init.d/spherelet spherelet is not running
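The two log excerpts above contain distinctive strings. You can use them to check a log for this issue without reading the full JSON. The following is a minimal sketch. The search strings are copied from the excerpts in this article. The function name scan_log_signatures and its output labels are hypothetical, and the script does not confirm the cause on its own.

```shell
#!/bin/sh
# Sketch: report which of the two error signatures quoted in this article
# appear in a given log file. The function name and output labels are
# hypothetical; the search strings are copied from the log excerpts.
scan_log_signatures() {
  log_file=$1
  if [ ! -r "$log_file" ]; then
    echo "cannot read $log_file" >&2
    return 1
  fi
  # OCI runtime exec failure (fluent-bit consolidated.log).
  if grep -qF 'stat /bin/sh: no such file or directory' "$log_file"; then
    echo "oci-exec-sh-missing"
  fi
  # Node health error ID (wcpsvc.log on vCenter Server).
  if grep -qF 'vcenter.wcp.node.kubenotready' "$log_file"; then
    echo "kubenotready"
  fi
}

# Hypothetical usage:
#   scan_log_signatures /var/log/vmware/fluentbit/consolidated.log
#   scan_log_signatures /var/log/vmware/wcp/wcpsvc.log
```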
Environment:
VMware Kubernetes Service
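Cause:
This issue occurs when an OCI runtime exec fails during process startup. The cause is either a missing path (for example, "stat /bin/sh: no such file or directory" in the consolidated.log excerpt) or a service failure. As a result, the spherelet service does not run correctly on the affected ESXi hosts, and those nodes stop posting node status.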
Resolution:
This issue is resolved in VMware vSphere 7.0 Update 2 and later.
Workaround:
On each affected ESXi host:
1. Check the status of the spherelet service by running: /etc/init.d/spherelet status
2. If the output reports "spherelet is not running", start the service by running: /etc/init.d/spherelet start
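The two workaround steps can be combined into a small script that starts spherelet only when the status check reports that it is not running. The following is a minimal sketch, not a supported VMware tool. It decides from the "spherelet is not running" message shown in the status output above, not from the init script's exit code, which this article does not document. The variable SPHERELET_INIT and the function name ensure_spherelet_running are hypothetical. The variable defaults to the init script path used in this article.

```shell
#!/bin/sh
# Sketch of the workaround steps for an affected ESXi host.
# SPHERELET_INIT (hypothetical) defaults to the init script path from this
# article; it is a variable only so the logic can be exercised elsewhere.
SPHERELET_INIT=${SPHERELET_INIT:-/etc/init.d/spherelet}

ensure_spherelet_running() {
  # Step 1: check the service status and capture its output.
  status_output=$("$SPHERELET_INIT" status 2>&1)
  # Match on the message text rather than on the exit code, whose
  # meaning is not documented here.
  if printf '%s\n' "$status_output" | grep -qF 'spherelet is not running'; then
    echo "spherelet is not running; starting it"
    # Step 2: start the service; the function returns its exit status.
    "$SPHERELET_INIT" start
  else
    echo "spherelet status did not report 'not running'; no action taken"
  fi
}

# Hypothetical usage on the ESXi host:
#   ensure_spherelet_running
```

Running the status check again afterwards confirms whether the service stayed up.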