Supervisor Cluster Upgrade Fails to Upgrade an ESXi Node (Reboot Mid-Upgrade)

Article ID: 436682

Products

VMware vSphere Kubernetes Service

Issue/Introduction

After initiating a VKS Supervisor upgrade, one of the nodes is rebooted, which causes the upgrade to fail. When you check the Supervisor status with kubectl get nodes, all nodes report a "Ready" status, but the rebooted node remains on the previous version, whereas the other nodes run the new version. Additionally, the AGE value for the failed node is much greater than the AGE values for the successfully upgraded nodes:

[user@supervisor~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
################################ Ready control-plane,master 92m v1.29.7+vmware.wcp.1
################################ Ready control-plane,master 108m v1.29.7+vmware.wcp.1
################################ Ready control-plane,master 79m v1.29.7+vmware.wcp.1
node1.internal Ready agent 226d v1.28.2-sph-####### <---- AGE and VERSION are behind
node2.internal Ready agent 10m v1.29.3-sph-#######
node3.internal Ready agent 59m v1.29.3-sph-#######
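The version check above can be scripted. The following is a minimal sketch, not a supported tool: it assumes the default five-column layout of kubectl get nodes shown above, and the function name lagging_agents and its prefix argument are hypothetical.

```shell
# Print agent nodes whose VERSION does not start with the expected prefix.
# Reads `kubectl get nodes` output on stdin.
# $1 is the expected version prefix (hypothetical parameter), e.g. v1.29
lagging_agents() {
  # Columns: $1=NAME $2=STATUS $3=ROLES $4=AGE $5=VERSION; NR > 1 skips the header.
  awk -v want="$1" \
    'NR > 1 && $3 == "agent" && index($5, want) != 1 { print $1, $4, $5 }'
}
```

For example, kubectl get nodes | lagging_agents v1.29 would list only the agent nodes that are still on an older version, together with their AGE and VERSION values.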

The Spherelet service is not running on the affected host:

[[email protected] :~] /etc/init.d/spherelet status
YYYY-MM-DD HH:MM:SS.### init.d/spherelet spherelet init script invoked via the following hierarchy
YYYY-MM-DD HH:MM:SS.### init.d/spherelet 2880521: -sh
YYYY-MM-DD HH:MM:SS.### init.d/spherelet 2880518: /bin/sshd-session -i -R
YYYY-MM-DD HH:MM:SS.### init.d/spherelet 2880494: /bin/sshd-session -i -R
YYYY-MM-DD HH:MM:SS.### init.d/spherelet 2098773: /usr/lib/vmware/busybox/bin/busybox inetd /var/run/inetd.conf
YYYY-MM-DD HH:MM:SS.### init.d/spherelet 2097869: /bin/init
YYYY-MM-DD HH:MM:SS.### init.d/spherelet Log fetcher support: True
YYYY-MM-DD HH:MM:SS.### init.d/spherelet Log fetcher size: 460
YYYY-MM-DD HH:MM:SS.### init.d/spherelet spherelet is not running
YYYY-MM-DD HH:MM:SS.### init.d/spherelet spherelet is not running

The files in the Spherelet service's configuration directory on the host are all empty (0 bytes):

[[email protected]:/etc/vmware/spherelet] ls -ltr
-rw-r--r-T 1 root root 0 MMM DD HH:MM spherelet.crt
-rw-r--r-T 1 root root 0 MMM DD HH:MM spherelet.conf
-rw-r--r-T 1 root root 0 MMM DD HH:MM server.key
-rw-r--r-T 1 root root 0 MMM DD HH:MM kubelet-server-current.pem
-rw-r--r-T 1 root root 0 MMM DD HH:MM kubelet-client-current.pem
-rw-r--r-T 1 root root 0 MMM DD HH:MM client.key
-rw-r--r-T 1 root root 0 MMM DD HH:MM client.crt
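The empty-file check above can also be done without reading the listing by eye. This is a minimal sketch that uses only POSIX shell test operators; the function name empty_files is hypothetical, and the script is read-only (it does not repair anything).

```shell
# List regular files in a directory that are zero bytes long.
# $1 is the directory to inspect, e.g. /etc/vmware/spherelet
empty_files() {
  for f in "$1"/*; do
    # -f: regular file; ! -s: size is not greater than zero
    [ -f "$f" ] && [ ! -s "$f" ] && echo "$f"
  done
  return 0
}
```

For example, empty_files /etc/vmware/spherelet prints one path per empty file. On a host showing this issue, every file from the listing above would be expected to appear.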

Environment

vSphere Kubernetes Service - VKS v1.28

Cause

Because the node was rebooted during the upgrade, it failed to update Spherelet to match the rest of the cluster. During the upgrade, new Spherelet configurations are pushed to the nodes. If a node is unavailable at that time, the new configuration cannot be applied to it.

Resolution

Contact Broadcom support for validation and resolution.
