This article describes how to safely shut down and start up a VMware PKS cluster.
The BOSH deployment stop command does not follow the sequence that Kubernetes needs to shut down cleanly. The Worker VMs should be shut down first, followed by the Master VMs.
This procedure lets you stop and start a Kubernetes cluster in a VMware PKS environment in the correct order.
Shut down the Kubernetes Cluster:
Note the BOSH deployment name of your PKS cluster. It takes the form "service-instance_<cluster uuid>".
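As a minimal sketch of how to find that name, the helper below filters text on standard input and prints anything that looks like "service-instance_<cluster uuid>". The function name list_pks_deployments is hypothetical, and the pattern assumes the UUID consists only of hexadecimal digits and hyphens.

```shell
# Hypothetical helper: reads text on stdin (for example the output of
# `bosh deployments`) and prints each PKS cluster deployment name once.
# Assumes the cluster UUID contains only hex digits and hyphens.
list_pks_deployments() {
  grep -o 'service-instance_[0-9a-fA-F-]*' | sort -u
}
```

A typical call would be: bosh deployments | list_pks_deployments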
List the BOSH instances and their processes for the PKS cluster deployment ("is" is the BOSH alias for "instances") and confirm that all services are running on each VM in the cluster.
bosh -d service-instance_xxxxxxxxxx is --ps
Note: If the cluster is not healthy before shutdown, it may have issues after startup that are unrelated to this shutdown and startup procedure.
Stop the workers by executing the following command:
bosh -d service-instance_xxxxxxxxx stop worker
Stop the Masters by executing the following command:
bosh -d service-instance_xxxxxxxxxx stop master
Confirm that the processes on all VMs now show as unknown and that the VMs show as stopped.
bosh -d service-instance_xxxxxxxxxx is --ps
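The shutdown steps above can be sketched as one shell function. This is an illustration rather than a supported tool: the function name stop_cluster is hypothetical, and the sketch adds the global -n (non-interactive) flag so that BOSH does not wait for confirmation. It stops as soon as one command fails.

```shell
# Hypothetical wrapper for the shutdown order described above:
# workers first, then masters, then a final status listing.
stop_cluster() {
  local deployment="$1"   # e.g. service-instance_<cluster uuid>

  # -n answers the BOSH confirmation prompt automatically (an addition
  # to the commands in this article).
  bosh -n -d "$deployment" stop worker || return 1
  bosh -n -d "$deployment" stop master || return 1

  # Show the resulting process and VM states for manual review.
  bosh -d "$deployment" is --ps
}
```

It would be called as: stop_cluster service-instance_xxxxxxxxxx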
Start up the Kubernetes Cluster:
Use BOSH SSH to connect to the first Master VM (master/0), switch to the root user with sudo -i, and start the etcd service. Run monit summary to confirm that the etcd service is running, then exit the root shell and the SSH session to leave the Master VM.
bosh -d service-instance_xxxxxxxxxx ssh master/0
sudo -i
monit start etcd
monit summary
exit
exit
Note: If the Kubernetes cluster has a single Master, start the Master node and then proceed directly to the worker nodes.
Start the next Master VM using the BOSH start command. This brings up all services on the Master VM at index 1.
bosh -d service-instance_xxxxxxxxxxxx start master/1
At this stage the etcd service is running on two Master VMs, which in a three-Master cluster means etcd has quorum again. Run BOSH start against the whole Master instance group to bring up the remaining services.
bosh -d service-instance_xxxxxxxxxx start master
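The quorum reasoning can be checked with a short calculation: etcd needs a majority of its members, that is floor(n / 2) + 1 for n members. The function name etcd_quorum below is only illustrative.

```shell
# Majority needed for an etcd cluster with the given number of members.
# Shell integer division rounds down, which gives floor(n / 2) + 1.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

etcd_quorum 3   # prints 2: two running Masters are enough
etcd_quorum 1   # prints 1: a single Master is its own quorum
```

This is why starting etcd on master/0 and then starting master/1 is enough for a three-Master cluster to accept the remaining services.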
Confirm that all services for each Master VM are now up and running.
bosh -d service-instance_xxxxxxxxxxx is --ps
Next, start the worker VMs:
bosh -d service-instance_xxxxxxxxxxx start worker
The cluster should now be back up and running. Use bosh is --ps to confirm that all services are running on each VM.
bosh -d service-instance_xxxxxxxxxx is --ps
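The startup steps so far can be sketched as one shell function for a multi-Master cluster. Several details are assumptions rather than part of this article: the function name start_cluster is hypothetical, the -c option of bosh ssh replaces the interactive session, monit is called by its usual full path on BOSH-managed VMs (/var/vcap/bosh/bin/monit), and -n suppresses confirmation prompts. Unlike the manual procedure, the sketch does not run monit summary to wait for etcd before continuing, so treat it as an outline of the order only.

```shell
# Hypothetical wrapper for the startup order described above.
start_cluster() {
  local deployment="$1"   # e.g. service-instance_<cluster uuid>

  # 1. Start etcd by hand on master/0 (monit path is an assumption).
  bosh -d "$deployment" ssh master/0 \
    -c 'sudo /var/vcap/bosh/bin/monit start etcd' || return 1

  # 2. Start master/1 so that etcd regains quorum.
  bosh -n -d "$deployment" start master/1 || return 1

  # 3. Bring up the remaining services on all Masters, then the workers.
  bosh -n -d "$deployment" start master || return 1
  bosh -n -d "$deployment" start worker || return 1

  # 4. List process states for manual review.
  bosh -d "$deployment" is --ps
}
```

It would be called as: start_cluster service-instance_xxxxxxxxxx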
Confirm that the Kubernetes componentstatus output shows all 3 etcd members as Healthy.
kubectl get componentstatus
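The final check can also be scripted. The sketch below reads the componentstatus table on standard input and succeeds only if at least one etcd member is listed and every one of them reports Healthy. The function name etcd_all_healthy is hypothetical, and the sketch assumes the usual column layout with the component name (etcd-0, etcd-1, ...) in the first column and the status in the second.

```shell
# Hypothetical check: reads `kubectl get componentstatus` output on stdin.
# Prints a summary and returns 0 only when all listed etcd members are Healthy.
etcd_all_healthy() {
  awk '
    $1 ~ /^etcd-/ { total++; if ($2 == "Healthy") healthy++ }
    END {
      printf "%d/%d etcd members healthy\n", healthy, total
      exit !(total > 0 && healthy == total)
    }'
}
```

A typical call would be: kubectl get componentstatus | etcd_all_healthy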