Workload Cluster Upgrade Stuck on vSphere Kubernetes Service (VKS) 3.3 and higher due to Pinniped-Concierge Pods

Article ID: 410900


Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

In a vSphere Supervisor environment, a workload cluster upgrade becomes stuck while draining a node on the previous version because the pinniped-concierge pods on that node do not drain.

This issue occurs on a vSphere Supervisor cluster with VKS service 3.3 or higher.

 

While connected to the Supervisor cluster context, the following symptoms are observed:

  • One of the workload cluster's machines is stuck in the Deleting state on the previous VKR (vSphere Kubernetes Release) version:
    kubectl get machines -n <workload cluster namespace>
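    
    Example output (names, ages, and versions are illustrative placeholders; columns may vary slightly by Cluster API version):
    
    NAME                          CLUSTER              NODENAME                   PHASE      AGE   VERSION
    <new control plane machine>   <workload cluster>   <new control plane node>   Running    ##m   <desired VKR version>
    <old control plane machine>   <workload cluster>   <old control plane node>   Deleting   ##h   <previous VKR version>
    <worker machine A>            <workload cluster>   <worker node A>            Running    ##h   <previous VKR version>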

     

While connected to the affected workload cluster's context, the following symptoms are observed:

  • The node being deleted shows a status of Ready,SchedulingDisabled and remains on the previous VKR version:
    In the example below, the new control plane node was created successfully, but the old control plane node is stuck draining.
    It is expected for the worker node to remain on the previous VKR version until all control plane nodes have been replaced with the desired VKR version:
    kubectl get nodes
    
    NAME                                 STATUS                     ROLES           AGE   VERSION
    <worker node A>                      Ready                      <none>          ##h   <previous VKR version>
    <old control plane node>             Ready,SchedulingDisabled   control-plane   ##h   <previous VKR version>
    <new control plane node>             Ready                      control-plane   ##h   <desired VKR version>

     

  • A pod with the name prefix 'pinniped-concierge-kube-cert-agent' is repeatedly recreated on the older, draining node, preventing the node from being drained:
    kubectl get pods -A -o wide | egrep -v "Run|Completed"
    
    NAMESPACE                      NAME                                        READY   STATUS             RESTARTS      AGE     IP    NODE
    pinniped-concierge             pinniped-concierge-<id>                     0/1     Pending            #             ##h    <IP>  <old node name>
    pinniped-concierge             pinniped-concierge-kube-cert-agent-<id>     0/1     ImagePullBackOff   #             ##h    <IP>  <old node name>
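    
    To see why the kube-cert-agent pod is failing, its events can be reviewed (the pod name below is a placeholder):
    kubectl describe pod <pinniped-concierge-kube-cert-agent pod name> -n pinniped-concierge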

Environment

vSphere Supervisor

VKS 3.3 and higher

Cause

Worker nodes do not begin upgrading until all control plane nodes are on the desired VKR version and the control plane nodes on the previous VKR version have been cleaned up by the system.

Manually deleting nodes will not help the upgrade proceed.

Starting in VKS 3.3, the behavior for draining nodes in a workload cluster has changed.

If a node does not drain within the workload cluster's configured node drain timeout, the upgrade will not continue.

In a workload cluster with an identity provider (IDP) enabled, the pinniped-concierge component can be prevented from draining properly.

In this scenario, the pinniped-concierge-kube-cert-agent pod cannot be drained and is continually recreated on the draining node, which causes the upgrade to become stuck.
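
To confirm that the machine is blocked on node drain, the machine's status and events can be reviewed from the Supervisor Cluster context (the machine name below is a placeholder):

kubectl describe machine <stuck machine name> -n <workload cluster namespace>

The drain-related conditions and events typically reference the pods that could not be evicted.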

Resolution

Workaround:

Create a MachineDrainRule in the affected workload cluster's namespace. This must be performed in the Supervisor Cluster context.

  1. Connect to the Supervisor Cluster context, as shown in the example below:
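    A typical login with the kubectl vsphere plugin looks like the following (server address, username, and context name are placeholders):
    kubectl vsphere login --server=<Supervisor Cluster IP or FQDN> --vsphere-username <username> --insecure-skip-tls-verify
    kubectl config use-context <Supervisor Cluster context name>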

  2. Create a file with the below MachineDrainRule contents:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineDrainRule
    metadata:
      name: vks-pod-drain-skip-pinniped
    spec:
      drain:
        # Skip evicting the pods matched below instead of waiting for them to drain
        behavior: Skip
      pods:
      # Match the kube-cert-agent pods in the pinniped-concierge namespace
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: pinniped-concierge
        selector:
          matchLabels:
            kube-cert-agent.pinniped.dev: v3

     

  3. Apply the above YAML file to the workload cluster's namespace:
    kubectl apply -f <MachineDrainRule.yaml> -n <workload cluster namespace>
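    
    The expected confirmation is similar to the following:
    machinedrainrule.cluster.x-k8s.io/vks-pod-drain-skip-pinniped created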

     

  4. Confirm that the MachineDrainRule was created in the desired workload cluster namespace:
    kubectl get machinedrainrule -n <workload cluster namespace>
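    
    Output similar to the following confirms the rule exists (the AGE value will vary, and additional columns may appear depending on version):
    NAME                          AGE
    vks-pod-drain-skip-pinniped   ##s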

     

  5. Once the rule is in place, it is expected that the older node will complete draining the stuck pinniped pods and the workload cluster upgrade will continue.
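    Progress can be monitored from the Supervisor Cluster context by watching the machines until the old machine is removed:
    kubectl get machines -n <workload cluster namespace> -w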