How to shut down and start up a Multi Master PKS cluster

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

This article describes how to safely shut down and start up a Multi Master PKS cluster. The Multi Master feature was introduced in PKS v1.2.0, so this procedure applies to PKS v1.2.0 and later releases.

The BOSH deployment stop command does not follow the sequence that Kubernetes needs to shut down cleanly: the worker VMs must be shut down first, and only then the master VMs.

Cause

The Multi Master feature currently supports only 3 master nodes, because etcd is co-located with the kube-apiserver process on each master virtual machine (VM).

Furthermore, to bring the etcd cluster back up after a shutdown, at least 2 etcd members must be running to establish quorum.
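Quorum here is a strict majority of etcd members: for a cluster of N members, floor(N/2) + 1 of them must be running. The bash sketch below applies that rule to the 3-member etcd cluster on the master VMs. The helper names etcd_quorum and has_quorum are hypothetical and are not part of PKS or BOSH.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating etcd's majority (quorum) rule.

# Quorum for an etcd cluster of N members is floor(N/2) + 1.
etcd_quorum() {
  local members="$1"
  echo $(( members / 2 + 1 ))
}

# Succeeds (exit 0) when enough members are running to form quorum.
has_quorum() {
  local members="$1" running="$2"
  [ "$running" -ge "$(etcd_quorum "$members")" ]
}

# Walk through the 3-member case used by a Multi Master PKS cluster.
for running in 1 2 3; do
  if has_quorum 3 "$running"; then state="quorum"; else state="no quorum"; fi
  echo "3 members, ${running} running: ${state}"
done
```

With 3 members the quorum is 2, which is why etcd on a single master VM cannot form a working cluster on its own, while etcd on two master VMs can.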

The following procedure stops the worker VMs before the master VMs, and restores etcd quorum before bringing up the remaining services.

Resolution

Use the following procedure to stop and start a Multi Master PKS cluster safely:

Part 1: Shut Down the Multi Master PKS Cluster

1. Note the BOSH deployment name of your PKS cluster. It has the form "service-instance_<cluster uuid>".

2. List the instances and processes of the PKS cluster deployment, and confirm that all processes on every VM in the cluster are running.

bosh -d service-instance_xxxxxxxxxx is --ps

Note: If the cluster is not healthy before shutdown, you may see issues after startup that are unrelated to this shutdown and startup procedure.

3. Stop the workers by executing the following command:

bosh -d service-instance_xxxxxxxxxx stop worker

4. Stop the Masters by executing the following command:

bosh -d service-instance_xxxxxxxxxx stop master

5. Confirm that the processes on all VMs now show as unknown and that the VMs show as stopped.

bosh -d service-instance_xxxxxxxxxx is --ps
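The ordering in Part 1 can be wrapped in a small bash function so that the master VMs are never stopped before the workers. This is a sketch, not a VMware-provided tool: the function name shutdown_cluster and the BOSH override variable are assumptions made for illustration. Because the body runs under set -e, a failed worker stop ends the function before the masters are touched.

```shell
#!/usr/bin/env bash
# Sketch of Part 1 (shutdown). BOSH is a hypothetical override variable:
# leave it unset to run the real bosh CLI, or set BOSH=echo for a dry run.

shutdown_cluster() (
  # The body runs in a subshell, so these options do not leak into the caller.
  set -euo pipefail
  deployment="${1:?usage: shutdown_cluster service-instance_<cluster uuid>}"
  bosh_cmd="${BOSH:-bosh}"

  # Step 2: show every instance and its processes before anything is stopped.
  "$bosh_cmd" -d "$deployment" is --ps

  # Step 3: stop the worker instance group first...
  "$bosh_cmd" -d "$deployment" stop worker

  # Step 4: ...and only then the master instance group. Under set -e,
  # a failed worker stop ends the function before this line runs.
  "$bosh_cmd" -d "$deployment" stop master

  # Step 5: processes should now show unknown and the VMs stopped.
  "$bosh_cmd" -d "$deployment" is --ps
)
```

After sourcing the file, call it as shutdown_cluster service-instance_xxxxxxxxxx. Running BOSH=echo shutdown_cluster service-instance_xxxxxxxxxx first prints the four commands in order without contacting the BOSH Director. Without the override, each bosh stop still asks for confirmation, as it does when typed by hand.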

Part 2: Start Up the Multi Master PKS Cluster

1. BOSH SSH to the first master VM (master/0), switch to the root user with sudo -i, and start the etcd service. Run monit summary to confirm that etcd is running, then exit the master VM.

bosh -d service-instance_xxxxxxxxxx ssh master/0

sudo -i

monit start etcd

monit summary

exit

2. Start the second master VM (master/1) with the BOSH start command. This brings up all services on master/1.

bosh -d service-instance_xxxxxxxxxx start master/1

3. At this stage etcd is running on 2 master VMs, so etcd has quorum again. Run the following BOSH commands in the master instance group to bring up the remaining services.

bosh -d service-instance_xxxxxxxxxx start master/2

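Wait for master/2 to start. Then stop all monit processes on master/0 (only etcd, started by hand in step 1, is running there) and start master/0 through BOSH: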

bosh -d service-instance_xxxxxxxxxx ssh master/0 "sudo monit stop all"
bosh -d service-instance_xxxxxxxxxx start master/0

4. Confirm that all services for each master VM are now up and running.

bosh -d service-instance_xxxxxxxxxx is --ps

5. Next, start the worker VMs:

bosh -d service-instance_xxxxxxxxxx start worker

6. The cluster should now be back up and running. Run bosh is --ps to confirm that all processes on each VM are running.

bosh -d service-instance_xxxxxxxxxx is --ps

7. Confirm that kubectl get componentstatus reports all 3 etcd members as Healthy:

kubectl get componentstatus
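The startup order in Part 2 can be sketched the same way. The function names startup_cluster and count_healthy_etcd and the BOSH override variable are hypothetical, and the sketch reuses the remote-command form of bosh ssh that step 3 uses. Unlike the manual procedure, it does not wait for etcd on master/0 to report running before it moves on to master/1, so review the monit summary output, or run step 1 by hand, if you are unsure etcd came up.

```shell
#!/usr/bin/env bash
# Sketch of Part 2 (startup). BOSH is a hypothetical override variable:
# leave it unset to run the real bosh CLI, or set BOSH=echo for a dry run.

startup_cluster() (
  # Runs in a subshell so set -euo pipefail does not leak into the caller.
  set -euo pipefail
  deployment="${1:?usage: startup_cluster service-instance_<cluster uuid>}"
  bosh_cmd="${BOSH:-bosh}"

  # Step 1: start only etcd on master/0 and print monit's view of it.
  # This sketch does not poll until etcd is running; check the output.
  "$bosh_cmd" -d "$deployment" ssh master/0 "sudo monit start etcd"
  "$bosh_cmd" -d "$deployment" ssh master/0 "sudo monit summary"

  # Step 2: start master/1; etcd then has 2 of 3 members, which is quorum.
  "$bosh_cmd" -d "$deployment" start master/1

  # Step 3: start master/2, then stop the hand-started etcd on master/0
  # and start master/0 through BOSH.
  "$bosh_cmd" -d "$deployment" start master/2
  "$bosh_cmd" -d "$deployment" ssh master/0 "sudo monit stop all"
  "$bosh_cmd" -d "$deployment" start master/0

  # Step 4: all master processes should now be running.
  "$bosh_cmd" -d "$deployment" is --ps

  # Steps 5 and 6: start the workers only after the masters, then re-check.
  "$bosh_cmd" -d "$deployment" start worker
  "$bosh_cmd" -d "$deployment" is --ps
)

# Step 7 helper: count the etcd members that `kubectl get componentstatus`
# reports as Healthy (expects names such as etcd-0, etcd-1, etcd-2).
count_healthy_etcd() {
  # grep -c prints 0 and exits 1 when nothing matches; || true keeps that
  # from being treated as a failure.
  grep -cE '^etcd-[0-9]+[[:space:]]+Healthy' || true
}
```

Once the cluster is fully back, kubectl get componentstatus | count_healthy_etcd should print 3.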