Inside the Supervisor or Guest Cluster, scaling down the deployment/daemonset doesn't delete the pods.



Article ID: 409566


Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • The user is unable to scale a Deployment or DaemonSet up or down inside the Supervisor or Guest Cluster.

  • The scale-up or scale-down task itself does not fail; however, the expected changes are not seen.
  • On scaling down, only the desired replica count changes while the current count stays the same. Ideally, both the current and desired counts should match the value the replicas were scaled to (for example, 0 when scaling down to zero).

        a. For example, the user scales down the capi-controller-manager deployment to 0 as below.

            root@<ID> [ ~ ]# kubectl get deployment capi-controller-manager -n svc-tkg-domain-c<ID>
            NAME                      READY   UP-TO-DATE   AVAILABLE
            capi-controller-manager   3/3     1            1
            root@<ID> [ ~ ]#

            root@<ID> [ ~ ]# kubectl get pods -A | grep -i capi-controller-manager
            NAMESPACE              NAME                            READY   STATUS
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID1>   2/2     Running
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID2>   2/2     Running
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID3>   2/2     Running
            root@<ID> [ ~ ]#

            root@<ID> [ ~ ]# kubectl scale deployment capi-controller-manager -n svc-tkg-domain-c<ID> --replicas=0
            deployment/capi-controller-manager scaled
            root@<ID> [ ~ ]#

        b. To confirm the changes, the user runs the command below and notices exactly what is described in the issue introduction: only the desired state of the ReplicaSet has changed to zero, and the pods are not deleted.

            root@<ID> [ ~ ]# kubectl get deployment capi-controller-manager -n svc-tkg-domain-c<ID>
            NAME                      READY   UP-TO-DATE   AVAILABLE
            capi-controller-manager   3/0     1            1
            root@<ID> [ ~ ]#

            root@<ID> [ ~ ]# kubectl get pods -A | grep -i capi
            NAMESPACE              NAME                            READY   STATUS
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID1>   2/2     Running
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID2>   2/2     Running
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID3>   2/2     Running
            root@<ID> [ ~ ]#

        c. Under ideal circumstances, this is how it should look after the capi-controller-manager deployment is scaled down to zero.

            root@<ID> [ ~ ]# kubectl get deployment capi-controller-manager -n svc-tkg-domain-c<ID>
            NAME                      READY   UP-TO-DATE   AVAILABLE
            capi-controller-manager   0/0     1            1
            root@<ID> [ ~ ]#

            root@<ID> [ ~ ]# kubectl get pods -n svc-tkg-domain-c<ID> | grep -i capi-controller-manager
            root@<ID> [ ~ ]#

  • In addition, when a pod is deleted manually, it does not come back or get recreated at all.

        a. For example, the user deletes the capi-controller-manager pod as below.

            root@<ID> [ ~ ]# kubectl delete pod capi-controller-manager-<ID1> -n svc-tkg-domain-c<ID>
            pod/capi-controller-manager-<ID1> deleted
            root@<ID> [ ~ ]#

        b. On running the command to check whether the pod was recreated, the user notices that it never was.

            root@<ID> [ ~ ]# kubectl get pods -A | grep -i capi
            NAMESPACE              NAME                            READY   STATUS
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID2>   2/2     Running
            svc-tkg-domain-c<ID>   capi-controller-manager-<ID3>   2/2     Running
            root@<ID> [ ~ ]#
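
As an additional check, it can help to confirm that the kubelet service on the affected control plane node is running and to look for certificate-related errors in its logs. The commands below are a generic sketch and assume root SSH access to the node; the exact log messages vary by environment.

    # Confirm the kubelet service is active on the affected node
    systemctl status kubelet

    # Search the recent kubelet logs for certificate-related errors
    journalctl -u kubelet --since "1 hour ago" | grep -i certificate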

Environment

vSphere Kubernetes Service

Cause

The kubelet certificate inside the cluster has expired. This usually happens when the kubeadm utility, rather than the VMware/Broadcom certmgr utility, is used to regenerate the Supervisor or Guest Cluster certificates. The kubeadm utility only resets the core component certificates (kube-apiserver, etcd, etc.) and not the kubelet certificate. As a result, the kubelet service neither recreates deleted pods nor reports the current state of the ReplicaSet/DaemonSet correctly.
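
One way to confirm this cause, assuming root SSH access to the affected control plane node, is to compare the expiry of the kubelet client certificate with that of a core component certificate using openssl. The paths below are the default kubeadm locations and may differ in your environment.

    # Expiry of the kubelet client certificate (not reset by kubeadm)
    openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem

    # Expiry of the kube-apiserver certificate (reset by kubeadm), for comparison
    openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt

A notAfter date in the past for the kubelet client certificate, while the core component certificates are still valid, matches the scenario described here.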

Resolution

Regenerate/Renew the kubelet certificate of the affected cluster.

Use the certmgr utility to perform the certificate regeneration, as it takes care of the kubelet certificate and all other core component certificates at the same time, without having to regenerate each one separately.

Detailed instructions on how to replace the VKS Supervisor certificates can be found here: Replace vSphere with Tanzu Supervisor Certificates
For the Guest Cluster, detailed instructions can be found here: Replace vSphere with Tanzu Guest Cluster/vSphere Kubernetes Cluster Certificates
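
After the certificates have been regenerated with the certmgr utility, a quick way to verify the fix is to re-check the kubelet certificate expiry and repeat the scaling test from the issue description. The commands below are a sketch and assume the same deployment and namespace used in the examples above.

    # The kubelet client certificate should now show a future expiry date
    openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem

    # Re-test scaling; the pods should be terminated and READY should report 0/0
    kubectl scale deployment capi-controller-manager -n svc-tkg-domain-c<ID> --replicas=0
    kubectl get deployment capi-controller-manager -n svc-tkg-domain-c<ID>
    kubectl get pods -n svc-tkg-domain-c<ID> | grep -i capi-controller-manager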