After a TKGm upgrade, old kube-vip pods can get stuck in the "Terminating" state. The new kube-vip pods are in the "Running" state, so kube-vip functionality in the cluster is not affected. These stuck kube-vip pods can appear in both Management and Workload clusters.
The issue has been observed in legacy TKGm clusters that have been upgraded to v2.3.x or v2.4.x.
The old kube-vip pods stuck in the "Terminating" state appear in the pod list alongside the new ones in the "Running" state.
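For example, a listing similar to the one below can be produced with the following command (pod names are environment-specific):

kubectl get pods -A | grep kube-vip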
NAMESPACE     NAME                                        READY   STATUS        RESTARTS        AGE
kube-system   kube-vip-wkld-cluster-control-plane-rf7qs   1/1     Terminating   119 (10h ago)   54d
kube-system   kube-vip-wkld-cluster-control-plane-xjwtl   1/1     Running       0               10h
The nodes on which these pods were scheduled no longer exist.
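This can be confirmed, for example, by reading the node name from a stuck pod and then checking whether that node still exists (pod and node names below are taken from the example above):

kubectl -n kube-system get pod kube-vip-wkld-cluster-control-plane-rf7qs -o jsonpath='{.spec.nodeName}'
kubectl get node wkld-cluster-control-plane-rf7qs   # expected to return a NotFound error, since the node has been removed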
The kube-controller-manager pod logs show that the Pod Garbage Collection (PodGC) process is trying to force delete the pod but fails with an error.
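The entries below are in the CRI log format, as read from the kube-controller-manager container log files under /var/log/pods/ on a control plane node; the same messages (without the runtime prefix) can also be retrieved with a command such as:

kubectl -n kube-system logs kube-controller-manager-<control-plane-node-name> | grep gc_controller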
2024-05-28T18:11:19.922060944Z stderr F E0528 18:11:19.921924 1 gc_controller.go:156] failed to get node wkld-cluster-control-plane-rf7qs : node "wkld-cluster-control-plane-rf7qs" not found
…
2024-05-28T18:11:19.923973138Z stderr F I0528 18:11:19.923883 1 gc_controller.go:337] "PodGC is force deleting Pod" pod="kube-system/kube-vip-wkld-cluster-control-plane-rf7qs"
2024-05-28T18:11:19.925897899Z stderr F E0528 18:11:19.925822 1 gc_controller.go:256] failed to create manager for existing fields: failed to convert new object (kube-system/kube-vip-wkld-cluster-control-plane-rf7qs; /v1, Kind=Pod) to smd typed: errors:
2024-05-28T18:11:19.925910871Z stderr F .spec.containers[name="kube-vip"].env: duplicate entries for key [name="cp_enable"]
2024-05-28T18:11:19.925914569Z stderr F .spec.containers[name="kube-vip"].env: duplicate entries for key [name="cp_enable"]
The logs point to duplicate entries in .spec.containers[name="kube-vip"].env. Inspecting the pod spec confirms that the duplicate entries are present.
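For example, the full spec of the stuck pod can be dumped as JSON and the env list of the kube-vip container inspected (pod name taken from the example above):

kubectl -n kube-system get pod kube-vip-wkld-cluster-control-plane-rf7qs -o json   # check .spec.containers[*].env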
"env": [ { "name": "cp_enable", "value": "true" }, ... { "name": "cp_enable", "value": "true" }, { "name": "cp_enable", "value": "true" }
The kube-vip static pod manifest (/etc/kubernetes/manifests/kube-vip.yaml) on the control plane VM contains the duplicate entries. These cause the kube-controller-manager to fail when force deleting the old pods, which is why they remain stuck in the "Terminating" state.
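As a quick check, for example, the number of cp_enable entries in the manifest can be counted from a control plane node; any value greater than 1 indicates the duplication:

grep -c "name: cp_enable" /etc/kubernetes/manifests/kube-vip.yaml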
There is a bug in the Tanzu CLI that keeps appending the "cp_enable" key to the environment list without checking whether it is already present.
To manually clean up the old pods stuck in the "Terminating" state, run the following command against each of them.
kubectl -n kube-system delete pod <kube-vip-pod-name> --grace-period=0 --force
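If several pods are stuck, a small loop such as the following sketch can remove all kube-vip pods currently reported as "Terminating" (it assumes the default kubectl column order of NAME, READY, STATUS, RESTARTS, AGE):

for pod in $(kubectl -n kube-system get pods --no-headers | awk '$3=="Terminating" && $1 ~ /^kube-vip-/ {print $1}'); do
  kubectl -n kube-system delete pod "$pod" --grace-period=0 --force
done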
To prevent this from happening on the next upgrade, SSH into each control plane node and remove the duplicate "cp_enable" entries from /etc/kubernetes/manifests/kube-vip.yaml, leaving a single entry in place. Once the file is saved, the kubelet automatically restarts the kube-vip pod on that node without the duplicate env entries, and the issue should not reoccur when you upgrade again.
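For reference, a cleaned-up env section could look similar to the illustrative fragment below; the surrounding variables (vip_arp, cp_namespace) are examples only, and the exact set depends on the kube-vip and TKGm version:

    env:
    - name: vip_arp
      value: "true"
    - name: cp_enable        # exactly one cp_enable entry
      value: "true"
    - name: cp_namespace
      value: kube-system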
The Tanzu CLI bug has been fixed in TKGm v2.5.0.