K8S worker node hangs indefinitely during the PKS Upgrade

Article ID: 345551

Products

VMware Cloud PKS

Issue/Introduction

Symptoms:
  • During the "Running errand: Upgrade all clusters errand for Pivotal Container Service" step of the PKS upgrade process (all versions), the verbose output in Ops Manager shows the following for a long time (hours) with no progress:

    2019-03-01 14:41:50 UTC Running "/usr/local/bin/bosh --no-color --non-interactive --tty --environment=10.193.90.11 --deployment=pivotal-container-service-8bc65453a0a0c8a92afe run-errand upgrade-all-service-instances"
    Using environment '10.193.90.11' as client 'ops_manager'
    Using deployment 'pivotal-container-service-8bc65453a0a0c8a92afe'

    Task 12565

    Task 12565 | 14:41:51 | Preparing deployment: Preparing deployment (00:00:01)
    Task 12565 | 14:41:52 | Running errand: pivotal-container-service/d5bfedbe-ae18-4f39-98f0-4b4c94550979

  • The bosh tasks output shows two tasks running. The first is the parent upgrade-all-service-instances errand, and the second is a create deployment task for the service instance deployment of the PKS cluster.

    ubuntu@Ops-man-2-3-7:~$ bosh tasks
    Using environment '10.193.90.11' as client 'ops_manager'

    ID     State       Started At                    Last Activity At              User                                            Deployment                                             Description                                                                                              Result
    12566  processing  Fri Mar  1 14:41:54 UTC 2019  Fri Mar  1 14:41:54 UTC 2019  pivotal-container-service-8bc65453a0a0c8a92afe  service-instance_4b8ad40a-6c1a-4a22-9c3c-1330422ddb81  create deployment                                                                                        -
    12565  processing  Fri Mar  1 14:41:51 UTC 2019  Fri Mar  1 14:41:51 UTC 2019  ops_manager                                     pivotal-container-service-8bc65453a0a0c8a92afe         run errand upgrade-all-service-instances from deployment pivotal-container-service-8bc65453a0a0c8a92afe  -

    2 tasks

  • The BOSH task output shows the redeployment of the service instance of the PKS cluster. In the output, a worker node is hung at the Updating instance worker step.

    ubuntu@Ops-man-2-3-7:~$ bosh task 12566
    Using environment '10.193.90.11' as client 'ops_manager'

    Task 12566

    Task 12566 | 14:41:55 | Preparing deployment: Preparing deployment
    Task 12566 | 14:41:56 | Warning: DNS address not available for the link provider instance: pivotal-container-service/d5bfedbe-ae18-4f39-98f0-4b4c94550979
    Task 12566 | 14:41:57 | Warning: DNS address not available for the link provider instance: pivotal-container-service/d5bfedbe-ae18-4f39-98f0-4b4c94550979
    Task 12566 | 14:41:57 | Warning: DNS address not available for the link provider instance: pivotal-container-service/d5bfedbe-ae18-4f39-98f0-4b4c94550979
    Task 12566 | 14:42:08 | Preparing deployment: Preparing deployment (00:00:13)
    Task 12566 | 14:43:08 | Preparing package compilation: Finding packages to compile (00:00:00)
    Task 12566 | 14:43:08 | Updating instance master: master/71697842-061b-450a-ac0b-73f04012a22a (0) (canary) (00:01:20)
    Task 12566 | 14:44:28 | Updating instance master: master/4902c248-fd28-4129-8bca-5094c423fc73 (2) (00:01:09)
    Task 12566 | 14:45:37 | Updating instance master: master/de2fda96-f396-4f8d-8bcb-c40306d4d88e (1) (00:01:18)
    Task 12566 | 14:46:55 | Updating instance worker: worker/dfa10f94-e690-4249-8463-dc7d9fc3efe6 (0) (canary) (00:01:25)
    Task 12566 | 14:48:20 | Updating instance worker: worker/39449084-e393-4fc4-a7b5-1ba613227012 (3)

  • If you BOSH SSH to the worker node and check the kubelet drain logs, you can confirm that the kubelet drain is unable to evict a pod from the worker node.

    bosh -d service-instance_4b8ad40a-6c1a-4a22-9c3c-1330422ddb81 ssh worker/39449084-e393-4fc4-a7b5-1ba613227012

    worker/39449084-e393-4fc4-a7b5-1ba613227012:~$ sudo -i
    worker/39449084-e393-4fc4-a7b5-1ba613227012:~# ps -ef | grep drain
    root     18428   724  0 14:48 ?        00:00:00 bash /var/vcap/jobs/kubelet/bin/drain job_changed hash_changed docker
    root     18476 18428  0 14:48 ?        00:00:00 kubectl --kubeconfig /var/vcap/jobs/kubelet/config/kubeconfig-drain drain -l bosh.id=39449084-e393-4fc4-a7b5-1ba613227012 --grace-period 10 --force --delete-local-data --ignore-daemonsets


    worker/39449084-e393-4fc4-a7b5-1ba613227012:/var/vcap/sys/log/kubelet# ls -l drain.stderr.log
    -rw-r--r-- 1 root root 65019 Mar  1 15:24 drain.stderr.log
    worker/39449084-e393-4fc4-a7b5-1ba613227012:/var/vcap/sys/log/kubelet# tail -f drain.stderr.log
    error when evicting pod "nginx-9cbcd98fd-lb7hj" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
    error when evicting pod "nginx-9cbcd98fd-lb7hj" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
    error when evicting pod "nginx-9cbcd98fd-lb7hj" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget

  • You can also use --debug to get more information about the task that is running:
    bosh task <task number> --debug |grep INFO

If the details of the offending pod (workload) are not clear from drain.stderr.log, the following steps replicate the drain command and retrieve more information:

  1. The following command lists the IPs of the Kubernetes nodes:
    kubectl get nodes -o wide
    NAME                                   STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP    OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
    6d038635-17b5-419b-b94c-f0a72525c66b   Ready    <none>   4d    v1.12.4   10.193.90.98   10.193.90.98   Ubuntu 16.04.5 LTS   4.15.0-42-generic   docker://18.6.1
    77c0e5c7-546b-46a7-8e47-8908687980f5   Ready    <none>   4d    v1.12.4   10.193.90.95   10.193.90.95   Ubuntu 16.04.5 LTS   4.15.0-42-generic   docker://18.6.1
    7e3ecf57-f2ff-4f69-b503-346cc5c93cea   Ready    <none>   4d    v1.12.4   10.193.90.97   10.193.90.97   Ubuntu 16.04.5 LTS   4.15.0-42-generic   docker://18.6.1
    9a107784-4ed5-4557-b6e3-4b43515341b5   Ready    <none>   4d    v1.12.4   10.193.90.96   10.193.90.96   Ubuntu 16.04.5 LTS   4.15.0-42-generic   docker://18.6.1
    d5bfc2b0-223f-4cdd-a173-f1acba6fd07a   Ready    <none>   4d    v1.12.4   10.193.90.94   10.193.90.94   Ubuntu 16.04.5 LTS   4.15.0-42-generic   docker://18.6.1

  2. Find the BOSH instance whose IP matches the IP of the worker node that is hung:
    bosh -d service-instance_4b8ad40a-6c1a-4a22-9c3c-1330422ddb81 vms

    Deployment 'service-instance_4b8ad40a-6c1a-4a22-9c3c-1330422ddb81'

    Instance                                     Process State  AZ   IPs           VM CID                                   VM Type      Active
    master/4902c248-fd28-4129-8bca-5094c423fc73  running        az1  10.193.90.92  vm-07eb1792-ec2d-44c6-a3a1-ee2c1a98f514  medium.disk  true
    master/71697842-061b-450a-ac0b-73f04012a22a  running        az1  10.193.90.91  vm-cd2918ea-8701-4d2a-87d8-2170f31cf144  medium.disk  true
    master/de2fda96-f396-4f8d-8bcb-c40306d4d88e  running        az1  10.193.90.93  vm-2cdb8541-61c0-4c80-8ae4-97251a1a98fc  medium.disk  true
    worker/39449084-e393-4fc4-a7b5-1ba613227012  running        az1  10.193.90.95  vm-cb68be3e-cf1f-406e-b786-ad2f31f67937  medium.disk  true
    worker/622db2c3-3c01-4ddd-84a3-9e702dc34e54  running        az1  10.193.90.96  vm-fc6b2f3b-b9fc-42f8-be3d-306a0029aa55  medium.disk  true
    worker/b026c929-6054-477d-a049-de24ecca0d76  running        az1  10.193.90.97  vm-1ca53832-54bf-4a7e-ac27-20f64ebb3be1  medium.disk  true
    worker/cae662e4-6380-4126-a981-b1f0e5837952  running        az1  10.193.90.98  vm-1771a9ab-3104-4c4b-b04c-033a8b6ada42  medium.disk  true
    worker/dfa10f94-e690-4249-8463-dc7d9fc3efe6  running        az1  10.193.90.94  vm-31ef83bd-0ea7-476c-8911-659dd3c584ce  medium.disk  true

  3. Run the drain command directly from kubectl:
    kubectl drain 77c0e5c7-546b-46a7-8e47-8908687980f5  --grace-period 10 --force --delete-local-data --ignore-daemonsets

    node/77c0e5c7-546b-46a7-8e47-8908687980f5 already cordoned
    WARNING: Ignoring DaemonSet-managed pods: fluent-bit-mbvtg
    error when evicting pod "nginx-9cbcd98fd-lb7hj" (will retry after 5s): Cannot evict pod as it

  4. The output confirms that the pod nginx-9cbcd98fd-lb7hj cannot be evicted because doing so would violate the pod's disruption budget.
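
To identify which PodDisruptionBudget is protecting the offending pod, you can list and inspect the PDB objects in the pod's namespace. The commands below are a general sketch; the PDB name and namespace shown (nginx-pdb, default) are placeholders and will differ in your environment:

    # List all PodDisruptionBudgets across namespaces and note the allowed disruptions
    kubectl get pdb --all-namespaces

    # Inspect the PDB whose selector matches the offending pod (name and namespace are examples)
    kubectl describe pdb nginx-pdb -n default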



Environment

VMware PKS 1.x

Cause

During the PKS tile upgrade process, worker nodes are cordoned and drained. This drain depends on Kubernetes being able to evict all pods from the node. If Kubernetes is unable to evict a pod, the drain hangs indefinitely.
One reason Kubernetes may be unable to evict a pod is that a PodDisruptionBudget object has been configured to allow 0 disruptions and only a single instance of the pod has been scheduled.

An application owner can create a PodDisruptionBudget object (PDB) for each application. A PDB limits the number of pods of a replicated application that can be down simultaneously due to voluntary disruptions. This is a known issue: a Kubernetes PDB can conflict with a PKS upgrade and prevent the kubelet job from being drained.
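
As an illustration only (not taken from the affected environment), a PDB of the following shape, paired with a Deployment that runs a single replica, allows zero disruptions and therefore blocks kubectl drain. The name and label selector (nginx-pdb, app: nginx) are assumed for the example, and policy/v1beta1 is the PDB API version for the Kubernetes release shown in this article (v1.12.x):

    # Illustrative only: write out an example PDB manifest; do not apply this to a cluster
    # you are trying to drain, as it reproduces the blocking condition described above.
    cat <<'EOF' > nginx-pdb-example.yaml
    apiVersion: policy/v1beta1
    kind: PodDisruptionBudget
    metadata:
      name: nginx-pdb
    spec:
      minAvailable: 1
      selector:
        matchLabels:
          app: nginx
    EOF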

Resolution

First, see if the PDB can be changed or even deleted to allow the upgrade to continue (example commands follow the list below). If this does not resolve the issue, the following are possible workarounds:

  1. Configure .spec.replicas for the workload to be greater than the minimum required by the PodDisruptionBudget object. When the number of replicas configured in .spec.replicas is greater than the number of replicas the PDB requires to remain available (for example, minAvailable), at least one pod can be disrupted and the drain can proceed.
  2. For more information, see How Disruption Budgets Work in the Kubernetes documentation. For details about workload capacity and uptime requirements in PKS, see Prepare to Upgrade in Upgrading PKS.
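
The following commands are a sketch of the approaches above; the deployment name, PDB name, and namespace (nginx, nginx-pdb, default) carry over from the example in this article and will differ in your cluster:

    # Option A: raise the replica count above the PDB's minAvailable so that
    # at least one pod can be evicted during the drain
    kubectl scale deployment nginx --replicas=2 -n default

    # Option B: relax the PodDisruptionBudget for the duration of the upgrade,
    # for example by lowering minAvailable or setting maxUnavailable
    kubectl edit pdb nginx-pdb -n default

    # Or delete the PDB entirely and recreate it after the upgrade completes
    kubectl delete pdb nginx-pdb -n default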