After regenerating expired bosh-dns leaf certificates, attempting to execute "bosh -d service_instance_xxx deploy manifest.yaml --skip-drain --fix
" to push the new certificates may result in the following error:
Error: Action Failed get_task: Task d4738ucb-50e8-4b8b-6f61-7838j93f36c result: Stopping Monitored Services: Stopping services '[kube-apiserver kube-controller-manager kube-scheduler bosh-dns bosh-dns-healthcheck ]' errored
The exact cause of this issue is unclear; however, it is possible that some services are in an error state due to expired certificates, preventing them from fully stopping, even though monit indicates that they have been stopped.
In this case, SSH into the node that failed. When you run the monit summary
command, you’ll notice that all services are in a 'not monitored' state. Try executing monit unmonitor all
, and once that’s completed, attempt to deploy the cluster again. This time, the upgrade will likely succeed. If necessary, repeat the steps until all VMs have been updated with the new certificates.