Healthwatch tile installation stuck executing the Push Monitoring Components (push-apps) errand

Article ID: 293390

Products

Operations Manager

Issue/Introduction

When you deploy the Healthwatch tile, the installation hangs while the Push Monitoring Components (push-apps) errand is running.

The output of Apply Changes may show something similar to the following:
===== 2020-05-30 17:16:11 UTC Running "/usr/local/bin/bosh --no-color --non-interactive --tty --environment=<Bosh-IP> --deployment=p-healthwatch-<guid> run-errand push-apps --instance healthwatch-forwarder/first"

...
...
 
############################################### Pushing Apps in Parallel ############################################### 
<No logs after this step>

Note: The same output appears if you cancel the BOSH task and manually execute the script that the push-apps errand runs on the healthwatch-forwarder VM, located at /var/vcap/jobs/push-apps/bin/run.

The run script calls a function named push_apps_in_parallel() from its main function to push the Healthwatch monitoring component apps, for example bosh-health-check and healthwatch-ingestor.
main() {
  ...
  ...
  push_apps_in_parallel
  run_after_deploy_migrations
}

The push_apps_in_parallel function looks like this:
push_apps_in_parallel() {
  print_centered "Pushing Apps in Parallel"
  apps=$(ruby "${scripts_dir}/app_helpers.rb" "${app_manifests}" get-apps)
  if ! parallel \
    --files \
    --halt soon,fail=3 \
    --joblog "${JOB_LOG}" \
    --jobs 5 \
    --no-notice \
    --results "${RESULTS_DIR}" \
    --retries 2 \
    deploy_app ::: "${apps}"
  then
    print_parallel_errors
    exit 1
  fi
  cf delete bosh-task-check -f -r
}
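
The --joblog option in the function above makes GNU parallel record one row per finished job. As a minimal sketch, assuming the standard tab-separated job log layout (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command), a helper like the following could list the jobs that exited with a non-zero status. The name failed_jobs is hypothetical and is not part of the run script. When the errand hangs, the log may be empty or incomplete.

```shell
# Hypothetical helper, not part of the Healthwatch run script.
# Prints the command of every job in a GNU parallel job log that exited non-zero.
# Assumes the standard tab-separated columns:
# Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
failed_jobs() {
  # NR > 1 skips the header row; column 7 is Exitval, column 9 is Command.
  awk -F '\t' 'NR > 1 && $7 != 0 { print $9 }' "$1"
}
```

It would be called as failed_jobs "${JOB_LOG}", where JOB_LOG is the variable the run script already sets.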

When this function hangs, its stdout and stderr are written to a temporary directory on ephemeral storage under /var/vcap/data/push-apps/tmp. Inside this tmp directory, a result-<NUM> sub-directory is created that contains a stderr file.

That file contains the following error:
mkdir /1 failed to create (or no such file or directory)
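
To locate the affected stderr file without opening each one by hand, something like the following may help. It is a minimal sketch: the function name find_mkdir_errors is hypothetical, and the directory to search is passed as an argument because the exact sub-directory name varies per run.

```shell
# Hypothetical helper for locating the error described above.
# Lists every file named "stderr" under the given directory that mentions mkdir.
find_mkdir_errors() {
  # grep -l prints only the names of the files that contain a match.
  find "$1" -type f -name stderr -exec grep -l 'mkdir' {} +
}
```

On the healthwatch-forwarder VM it would be run as root, for example find_mkdir_errors /var/vcap/data/push-apps/tmp.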


Resolution

The root cause of this issue is unknown. It is specific to environments on Azure, with or without the OMSAgent add-on deployed.


Workaround

The following steps push the monitoring component apps manually so that they appear in Apps Manager. To work around this issue, the Push Monitoring Components errand can then be disabled to complete the Healthwatch tile installation.

1. Use BOSH SSH to connect to the healthwatch-forwarder VM that was selected to run the push-apps errand:
bosh -e <env-name> -d <p-healthwatch-GUID> ssh healthwatch-forwarder/0

2. Change to root and navigate to the directory where the run script is present:
sudo -i
cd /var/vcap/jobs/push-apps/bin/

3. Edit the run script to reflect the following changes:
  • Add the following line between `RESULTS_DIR=$(mktemp -d -t results-XXX)`  and `JOB_LOG=${RESULTS_DIR}/job_log`
RESULTS_DIR=$(mktemp -d -t results-XXX)
mkdir -p "${RESULTS_DIR}/1" # Add this line <---
JOB_LOG=${RESULTS_DIR}/job_log
  • Comment out the existing push_apps_in_parallel() function and add the new definition of push_apps_in_parallel(). After this change, the run script must look like the following:
# push_apps_in_parallel() {
#   print_centered "Pushing Apps in Parallel"
#   apps=$(ruby "${scripts_dir}/app_helpers.rb" "${app_manifests}" get-apps)
#   if ! parallel \
#     --files \
#     --halt soon,fail=3 \
#     --joblog "${JOB_LOG}" \
#     --jobs 5 \
#     --no-notice \
#     --results "${RESULTS_DIR}" \
#     --retries 2 \
#     deploy_app ::: "${apps}"
#   then
#     print_parallel_errors
#     exit 1
#   fi
#   cf delete bosh-task-check -f -r
# }

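push_apps_in_parallel() {
  print_centered "Pushing Apps in Parallel"
  apps=$(ruby "${scripts_dir}/app_helpers.rb" "${app_manifests}" get-apps)
  # Deploy the apps one at a time instead of through GNU parallel.
  # $apps is intentionally unquoted so that it splits into one word per app.
  for app in $apps; do
    echo "begin to deploy $app"
    deploy_app "$app"
    echo "done deploy $app"
  done
}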
  • Save the changes and execute the script manually:
cd /var/vcap/jobs/push-apps/bin
./run
  • Wait for the script to complete. You can monitor the uptime of the Healthwatch monitoring apps in Apps Manager under the system org and the Healthwatch space. After the script completes successfully, run the next Apply Changes for the Healthwatch tile with the Push Monitoring Components errand disabled.
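
The sequential loop used in this workaround can be tried in isolation before editing the run script. The following is a minimal sketch and not the script shipped with the tile: deploy_app is replaced by a stub, the function name push_apps_sequentially is hypothetical, and the loop is a variant that stops at the first failed deployment, which the workaround code above does not do explicitly.

```shell
# Stub standing in for the real deploy_app defined in the run script (assumption).
deploy_app() {
  echo "deploying $1"
}

# Hypothetical variant of the workaround loop.
# Deploys the given apps one at a time and stops at the first failure.
push_apps_sequentially() {
  for app in "$@"; do
    echo "begin to deploy $app"
    if ! deploy_app "$app"; then
      echo "failed to deploy $app" >&2
      return 1
    fi
    echo "done deploy $app"
  done
}
```

Calling push_apps_sequentially bosh-health-check healthwatch-ingestor prints a begin and done line around each stubbed deployment. Stopping at the first failure is a design choice made for this sketch so that a failed push is not hidden by later output.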