Supervisor Cluster Upgrade Fails to Upgrade an ESXi Node (Reboot Mid-Upgrade)

Article ID: 436682

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • After initiating a VKS Supervisor upgrade, one of the ESXi nodes is rebooted, which causes the upgrade to fail. In the output of kubectl get nodes on the Supervisor, all nodes report a "Ready" status, but the rebooted node remains on the previous version, whereas the other nodes show the new version. Additionally, the AGE value for the failed node is much larger than the AGE values for the successfully upgraded nodes:

[user@supervisor~]$ kubectl get nodes
NAME                               STATUS               ROLES                  AGE    VERSION
################################   Ready                control-plane,master   92m    v1.29.7+vmware.wcp.1
################################   Ready                control-plane,master   108m   v1.29.7+vmware.wcp.1
################################   Ready                control-plane,master   79m    v1.29.7+vmware.wcp.1
node1.internal                     Ready                agent                  226d   v1.28.2-sph-#######     <---- AGE and VERSION are behind
node2.internal                     Ready                agent                  10m    v1.29.3-sph-####### 
node3.internal                     Ready                agent                  59m    v1.29.3-sph-#######
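
The version comparison described above can also be done programmatically. The following is a minimal sketch, not a supported tool: it assumes the default five-column text output of kubectl get nodes has been saved as a string, and the function name lagging_agents and the control-plane node names in the sample are hypothetical.

```python
import re


def lagging_agents(kubectl_output):
    """Return agent (ESXi) nodes whose major.minor version is behind the newest agent."""
    versions = {}
    # Skip the header row (NAME STATUS ROLES AGE VERSION).
    for line in kubectl_output.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 5:
            continue
        name, _status, roles, _age, version = fields[:5]
        match = re.match(r"v(\d+)\.(\d+)", version)
        # Only compare ESXi agent nodes; control-plane VMs use a different version scheme.
        if match and "agent" in roles.split(","):
            versions[name] = (int(match.group(1)), int(match.group(2)))
    if not versions:
        return []
    newest = max(versions.values())
    return [name for name, ver in versions.items() if ver < newest]


# Sample data modeled on the output in this article (control-plane names are made up).
SAMPLE = """\
NAME             STATUS   ROLES                  AGE    VERSION
cp-1             Ready    control-plane,master   92m    v1.29.7+vmware.wcp.1
cp-2             Ready    control-plane,master   108m   v1.29.7+vmware.wcp.1
cp-3             Ready    control-plane,master   79m    v1.29.7+vmware.wcp.1
node1.internal   Ready    agent                  226d   v1.28.2-sph-abc1234
node2.internal   Ready    agent                  10m    v1.29.3-sph-abc1234
node3.internal   Ready    agent                  59m    v1.29.3-sph-abc1234
"""

print(lagging_agents(SAMPLE))
```

With the sample data, only node1.internal is reported, because it is the only agent still on 1.28 while the other agents are on 1.29. The sketch compares major.minor only, so a patch-level difference between agents is not flagged.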

 

  • The Spherelet service is not running on the affected host:

[[email protected] :~] /etc/init.d/spherelet status
YYYY-MM-DD HH:MM:SS.### init.d/spherelet spherelet init script invoked via the following hierarchy
YYYY-MM-DD HH:MM:SS.### init.d/spherelet 2880521: -sh
YYYY-MM-DD HH:MM:SS.### init.d/spherelet 2880518: /bin/sshd-session -i -R
YYYY-MM-DD HH:MM:SS.### init.d/spherelet 2880494: /bin/sshd-session -i -R
YYYY-MM-DD HH:MM:SS.### init.d/spherelet 2098773: /usr/lib/vmware/busybox/bin/busybox inetd /var/run/inetd.conf
YYYY-MM-DD HH:MM:SS.### init.d/spherelet 2097869: /bin/init
YYYY-MM-DD HH:MM:SS.### init.d/spherelet Log fetcher support: True
YYYY-MM-DD HH:MM:SS.### init.d/spherelet Log fetcher size: 460
YYYY-MM-DD HH:MM:SS.### init.d/spherelet spherelet is not running
YYYY-MM-DD HH:MM:SS.### init.d/spherelet spherelet is not running

 

  • The files in the Spherelet service's configuration directory on the host are all empty (0 bytes):

[[email protected]:/etc/vmware/spherelet] ls -ltr
-rw-r--r-T    1 root     root             0 MMM DD HH:MM spherelet.crt
-rw-r--r-T    1 root     root             0 MMM DD HH:MM spherelet.conf
-rw-r--r-T    1 root     root             0 MMM DD HH:MM server.key
-rw-r--r-T    1 root     root             0 MMM DD HH:MM kubelet-server-current.pem
-rw-r--r-T    1 root     root             0 MMM DD HH:MM kubelet-client-current.pem
-rw-r--r-T    1 root     root             0 MMM DD HH:MM client.key
-rw-r--r-T    1 root     root             0 MMM DD HH:MM client.crt
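
The zero-byte check shown in the listing above can be expressed as a short script. This is a minimal sketch under assumptions: it presumes a Python 3 interpreter is available wherever the directory can be read, and the function name empty_files is hypothetical. It only reads file sizes and changes nothing.

```python
from pathlib import Path


def empty_files(directory):
    """Return the sorted names of regular files in `directory` that are 0 bytes long."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.is_file() and entry.stat().st_size == 0
    )
```

Calling empty_files("/etc/vmware/spherelet") on the affected host would be expected to list every file shown above, since each one is 0 bytes. A host with an intact Spherelet configuration should return an empty list.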

Environment

vSphere Kubernetes Service - VKS v1.28

Cause

The node was rebooted while the upgrade was in progress, so spherelet on that node was not updated to match the rest of the cluster. During the upgrade, new spherelet configurations are pushed to the nodes. If a node is unavailable at that time, the new configuration cannot be applied to it.

Resolution

Contact Broadcom support for validation and resolution.