vSphere Kubernetes Supervisor ESXi Host with Kubernetes status showing "Node is not healthy and is not accepting pods. Details Kubelet stopped posting node status"

Article ID: 391456

Updated On: 03-26-2025

Products

VMware vSphere with Tanzu

vSphere with Tanzu

VMware vSphere 7.0 with Tanzu

Tanzu Kubernetes Runtime

Issue/Introduction

In the vSphere web client, when viewing the Summary of an ESXi host in the cluster, the Kubernetes status shows an error message similar to one of the following:

  • Node is not healthy and is not accepting pods. Details Kubelet stopped posting node status
  • Node is not healthy and is not accepting pods. Details Kubelet never posted node status

 

While connected to the Supervisor cluster context, one or more of the following symptoms are observed:

  • Newly created pods are stuck in Pending state:
    • kubectl get pods -A | grep -v Run

  • Newly created nodes are stuck in Provisioning state:
    • kubectl get machines -n <cluster namespace>

  • The node list shows Ready for the Supervisor control plane VMs, but NotReady for one or more ESXi hosts. The names of VMs and hosts will vary by environment:
    • kubectl get nodes

      NAME                      STATUS     ROLES                  AGE    VERSION
      <supervisor-dns-name-1>   Ready      control-plane,master   ###d   v1.##.#+vmware.wcp.#
      <supervisor-dns-name-2>   Ready      control-plane,master   ###d   v1.##.#+vmware.wcp.#
      <supervisor-dns-name-3>   Ready      control-plane,master   ###d   v1.##.#+vmware.wcp.#
      <esxi-hostname-a>         NotReady   agent                  ###d   v1.##.#-sph-a12b3c4
      <esxi-hostname-b>         NotReady   agent                  ###d   v1.##.#-sph-a12b3c4
      <esxi-hostname-c>         NotReady   agent                  ###d   v1.##.#-sph-a12b3c4

  • Describing a NotReady ESXi host shows Conditions with errors similar to the error messages from vSphere web client under Kubernetes status on the ESXi host:
    • NodeStatusUnknown Kubelet stopped posting node status.
    • NodeStatusNeverUpdated Kubelet never posted node status.
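
The node check above can be scripted when many hosts are involved. The following is a minimal sketch, assuming a POSIX shell with awk and a kubectl session already connected to the Supervisor cluster context. The function name list_notready_nodes is illustrative and not part of any VMware tooling.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get nodes` output on stdin and prints
# the name of every node whose STATUS column is not exactly "Ready".
# The header row (first line) is skipped.
list_notready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Example usage against a live Supervisor context (requires kubectl):
#   kubectl get nodes | list_notready_nodes
#
# To read the reason reported on the Ready condition of a single node:
#   kubectl get node <esxi-hostname-a> \
#     -o jsonpath='{.status.conditions[?(@.type=="Ready")].reason}'
```

Because the filter matches anything other than exactly Ready, a node that is Ready but cordoned (Ready,SchedulingDisabled) would also be listed.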

Environment

vSphere with Tanzu 7.0

vSphere with Tanzu 8.0

This issue can occur whether or not the environment is managed by Tanzu Mission Control (TMC).

Cause

The status of ESXi agents from kubectl get nodes in the Supervisor cluster context also tracks the status of the spherelet process on the ESXi host.

A NotReady status may therefore indicate an issue with spherelet on the affected ESXi host.

Spherelet requires bidirectional connectivity on port 10250 between the ESXi host and the Supervisor cluster, over both the eth0 and eth1 interfaces of the Supervisor control plane VMs.

Resolution

Check the status of spherelet on the affected host, including its certificates and port connectivity.

  1. SSH into the affected NotReady ESXi host

  2. Check if spherelet is running:
    • Note: Spherelet may be running, but not operating properly. It is best to check its logs accordingly.
    • /etc/init.d/spherelet status

      YYYY-MM-DD HH:MM:SS,sss init.d/spherelet spherelet is running
      YYYY-MM-DD HH:MM:SS,sss init.d/spherelet spherelet is running
  3. Check the spherelet log for error messages:
    • /var/log/spherelet.log
    • Error messages similar to the following may indicate that the spherelet certificates have expired:
      • failed to retrieve node: unauthorized
  4. Confirm that Spherelet certificates are not expired.
    • Spherelet certificates can be checked by querying the ESXi host with openssl, replacing <my-esxi-host.domain.com> with the fully qualified domain name of the ESXi host:
      • openssl s_client -connect <my-esxi-host.domain.com>:10250 | openssl x509 -noout -dates -fingerprint
    • Alternatively, the spherelet cert files can be checked directly through openssl:
      • openssl x509 -text -in /etc/vmware/spherelet/client.crt | grep Not
      • openssl x509 -text -in /etc/vmware/spherelet/spherelet.crt | grep Not
    • Spherelet certificates can be renewed through the certmgr script run from the vCenter Server Appliance (VCSA) as per this KB: Replace vSphere with Tanzu Supervisor Certificates
      • The certmgr script renews both Supervisor cluster certificates and the spherelet certificates for all ESXi hosts in the cluster.
      • If the certmgr script fails to renew the spherelet certificates, please reach out to VMware by Broadcom Support for assistance.


  5. Check the ESXi firewall rules for spherelet with the following commands:
    • esxcli network firewall ruleset list | egrep "Name|spherelet"

      Name Enabled
      spherelet True
    • esxcli network firewall ruleset rule list | egrep "Ruleset|spherelet"

      Ruleset                        Direction  Protocol  Port Type  Port Begin  Port End
      spherelet                      Inbound    TCP       Dst             10250     10250
      spherelet                      Outbound   TCP       Dst                 0     65535
  6. To check port connectivity, SSH into one of the Supervisor cluster control plane VMs.
  7. Port 10250 must be reachable bidirectionally between the Supervisor cluster control plane VMs and the ESXi host. From the control plane VM, test the connection to the ESXi host over each interface:
    • curl -v telnet://<ESXi_Management_IP>:10250 --interface eth0
    • curl -v telnet://<ESXi_Management_IP>:10250 --interface eth1
    • If either of the above curl commands fails, check the physical network and unblock port 10250
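
The connectivity test in step 7 can be wrapped in a small loop so that both interfaces are checked in one pass. This is a minimal sketch rather than a supported tool, assuming it is run from a Supervisor control plane VM where curl is available. The function names probe_port and check_spherelet_port and the PROBE override variable are hypothetical and were introduced here for illustration. The timeout values are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helper: attempts a TCP connection to port 10250 on host $2
# through interface $1, using the same curl telnet:// probe as step 7.
# stdin is redirected from /dev/null so curl does not wait for input.
probe_port() {
  curl -s -o /dev/null --connect-timeout 5 --max-time 10 \
       --interface "$1" "telnet://$2:10250" </dev/null
}

# Checks port 10250 on the ESXi management IP ($1) over eth0 and eth1.
# Prints one line per interface and returns non-zero if any probe failed.
# PROBE can be set to another function or command for dry runs.
check_spherelet_port() {
  rc=0
  for ifc in eth0 eth1; do
    if "${PROBE:-probe_port}" "$ifc" "$1"; then
      echo "$ifc: port 10250 reachable"
    else
      echo "$ifc: port 10250 NOT reachable"
      rc=1
    fi
  done
  return "$rc"
}

# Example usage (replace the placeholder with the real address):
#   check_spherelet_port <ESXi_Management_IP>
```

This only exercises the direction from the Supervisor control plane VM to the ESXi host. A failure on one interface but not the other usually narrows the problem to the network behind that interface.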