Inside the Supervisor or Guest Cluster, scaling down the deployment/daemonset doesn't delete the pods.



Article ID: 409566


Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • The user is unable to scale a Deployment or DaemonSet up or down inside the Supervisor or Guest Cluster.

  • The scale-up or scale-down task itself does not fail; however, the expected changes are not seen.
  • On scaling down, only the desired replica count changes while the current count stays the same. Ideally, both the current and desired counts should match the value the replicas were scaled to (for example, 0 when scaling down to zero).

        a. For example, the user scales down the capi-controller-manager deployment to 0 as below.

            root@<ID> [ ~ ]# kubectl get deployment capi-controller-manager -n svc-tkg-domain-c<ID>
            NAME                      READY   UP-TO-DATE   AVAILABLE
            capi-controller-manager   3/3     1            1
            root@<ID> [ ~ ]#

            root@<ID> [ ~ ]# kubectl get pods -A | grep -i capi-controller-manager
            NAMESPACE              NAME                            READY   STATUS
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID1>   2/2     Running
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID2>   2/2     Running
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID3>   2/2     Running
            root@<ID> [ ~ ]#

            root@<ID> [ ~ ]# kubectl scale deployment capi-controller-manager -n svc-tkg-domain-c<ID> --replicas=0
            deployment/capi-controller-manager scaled
            root@<ID> [ ~ ]#

        b. To confirm the changes, the user runs the command below and notices exactly what is described in the issue introduction: only the desired state of the ReplicaSet has changed to zero, and the pods are not deleted.

            root@<ID> [ ~ ]# kubectl get deployment capi-controller-manager -n svc-tkg-domain-c<ID>
            NAME                      READY   UP-TO-DATE   AVAILABLE
            capi-controller-manager   3/0     1            1
            root@<ID> [ ~ ]#

            root@<ID> [ ~ ]# kubectl get pods -A | grep -i capi
            NAMESPACE              NAME                            READY   STATUS
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID1>   2/2     Running
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID2>   2/2     Running
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID3>   2/2     Running
            root@<ID> [ ~ ]#

        c. Under ideal circumstances, this is how it should look after the capi-controller-manager deployment is scaled down to zero.

            root@<ID> [ ~ ]# kubectl get deployment capi-controller-manager -n svc-tkg-domain-c<ID>
            NAME                      READY   UP-TO-DATE   AVAILABLE
            capi-controller-manager   0/0     1            1
            root@<ID> [ ~ ]#

            root@<ID> [ ~ ]# kubectl get pods -n svc-tkg-domain-c<ID> | grep -i capi-controller-manager
            root@<ID> [ ~ ]#

  • In addition, when a pod is deleted manually, it does not come back or get recreated at all.

        a. For example, the user deletes the capi-controller-manager pod as below.

            root@<ID> [ ~ ]# kubectl delete pod capi-controller-manager-<ID1> -n svc-tkg-domain-c<ID>
            pod/capi-controller-manager-<ID1> deleted
            root@<ID> [ ~ ]#

        b. On running the command to check whether the pod was recreated, the user notices that it never was.

            root@<ID> [ ~ ]# kubectl get pods -A | grep -i capi
            NAMESPACE              NAME                            READY   STATUS
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID2>   2/2     Running
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID3>   2/2     Running
            root@<ID> [ ~ ]#
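
As an additional check, it can help to confirm that the kubelet service on the affected control plane node is running and to look for certificate-related errors in its logs. The commands below are a generic sketch and assume root SSH access to the node; the exact log messages vary by environment.

    # Confirm the kubelet service is active on the affected node
    systemctl status kubelet

    # Search the recent kubelet logs for certificate-related errors
    journalctl -u kubelet --since "1 hour ago" | grep -i certificate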

Environment

vSphere Kubernetes Service

Cause

The kubelet certificate inside the cluster has expired. This usually happens when the kubeadm utility, rather than the VMware/Broadcom certmgr utility, is used to regenerate the Supervisor or Guest Cluster certificates. The kubeadm utility only resets the core component certificates (kube-apiserver, etcd, etc.) and not the kubelet certificate. As a result, the kubelet service neither recreates deleted pods nor reports the current state of the ReplicaSet/DaemonSet correctly.
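
One way to confirm this cause, assuming root SSH access to the affected control plane node, is to compare the expiry of the kubelet client certificate with that of a core component certificate using openssl. The paths below are the default kubeadm locations and may differ in your environment.

    # Expiry of the kubelet client certificate (not reset by kubeadm)
    openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem

    # Expiry of the kube-apiserver certificate (reset by kubeadm), for comparison
    openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt

A notAfter date in the past for the kubelet client certificate, while the core component certificates are still valid, matches the scenario described here.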

Resolution

Regenerate/Renew the kubelet certificate of the affected cluster.

Use the certmgr utility to perform the certificate regeneration, as it takes care of the kubelet certificate and all other core component certificates at the same time, without having to regenerate each one separately.

Detailed instructions on how to replace the VKS Supervisor certificates can be found here: Replace vSphere with Tanzu Supervisor Certificates
For the Guest Cluster, detailed instructions can be found here: Replace vSphere with Tanzu Guest Cluster/vSphere Kubernetes Cluster Certificates
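
After the certificates have been regenerated with the certmgr utility, a quick way to verify the fix is to re-check the kubelet certificate expiry and repeat the scaling test from the issue description. The commands below are a sketch and assume the same deployment and namespace used in the examples above.

    # The kubelet client certificate should now show a future expiry date
    openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem

    # Re-test scaling; the pods should be terminated and READY should report 0/0
    kubectl scale deployment capi-controller-manager -n svc-tkg-domain-c<ID> --replicas=0
    kubectl get deployment capi-controller-manager -n svc-tkg-domain-c<ID>
    kubectl get pods -n svc-tkg-domain-c<ID> | grep -i capi-controller-manager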