Bosh deployment fails with the error "Stopping Monitored Services: Stopping services '[bosh-dns bosh-dns-healthcheck etc]' errored"

Article ID: 385640

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

A BOSH deployment fails with the following error:

Error: Action Failed get_task: Task d4738ucb-50e8-4b8b-6f61-7838j93f36c result: Stopping Monitored Services: Stopping services '[kube-apiserver kube-controller-manager kube-scheduler bosh-dns bosh-dns-healthcheck ]' errored

The job names listed in the error vary; they can be any of the jobs running on any VM in the deployment.
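
Because the job list differs from one failure to the next, it can help to extract the names from the error text so they can be compared with "monit summary" later. The following is a minimal sketch and not part of any BOSH or TKGI tooling: the function name failed_jobs_from_error is hypothetical, and it assumes the error line has the same shape as the sample above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads an error message on stdin and prints the job
# names found between "Stopping services '[" and "]' errored", one per line.
# Assumes the message format matches the sample error in this article.
failed_jobs_from_error() {
  sed -n "s/.*Stopping services '\[\([^]]*\)\]' errored.*/\1/p" \
    | tr -s ' ' '\n' \
    | sed '/^$/d'   # drop the empty line left by the trailing space in the list
}
```

Usage: paste the error line into a file (the name error.txt is a placeholder) and run `failed_jobs_from_error < error.txt`.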

Environment

VMware Tanzu Kubernetes Grid Integrated Edition

Cause

The exact cause of this issue is unclear. The error indicates that the BOSH agent failed to fully stop the job(s) named in the error, even though the output of "monit summary" may show them as stopped.

Resolution

  1. SSH into the VM that failed.
  2. Run "monit summary" and review the output.
  3. If the jobs are in the "not monitored" state, run "monit unmonitor all". If the output lists no jobs at all, there are no jobs to stop, and redeploying should not hit the initial error. In that case, an attempt to recreate the VM probably failed, so no jobs were installed.
  4. Retry the deployment by rerunning the original command that failed. This could be "bosh deploy", "tkgi upgrade-cluster", "tkgi update-cluster", or Apply Changes in Ops Manager. The deployment should now succeed, or at least get past this error. If the same error then appears on other VMs, repeat these steps on each one until all VMs have been updated and the deployment completes successfully.
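
The steps above can be sketched as a shell session. This is an illustration under stated assumptions, not a definitive procedure: the angle-bracket values are placeholders, the helper name still_monitored_jobs is hypothetical, and the parsing assumes "monit summary" prints lines of the form Process 'name' followed by a status. It also assumes monit must be run as root on the VM, which is typical for BOSH-managed VMs.

```shell
#!/usr/bin/env bash

# Step 1 (from a machine with the BOSH CLI; placeholders are not real values):
#   bosh -d <deployment> ssh <instance-group>/<index-or-id>
#   sudo -i
#
# Steps 2-3 (on the failed VM, as root):
#   monit summary
#   monit unmonitor all
#
# Step 4: rerun whichever command originally failed, for example
#   tkgi upgrade-cluster <cluster-name>

# Hypothetical helper: reads `monit summary` output on stdin and prints the
# names of Process entries whose status is anything other than
# "not monitored". Empty output means every listed job is "not monitored",
# or that no jobs are listed at all.
still_monitored_jobs() {
  awk -F"'" '/^Process / && $3 !~ /not monitored/ { print $2 }'
}
```

On the VM, `monit summary | still_monitored_jobs` gives a quick view of which jobs, if any, are in a state other than "not monitored" before step 3 is applied.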