Healthwatch Dashboard displays all VMs as unresponsive

Article ID: 403605

Products

VMware Tanzu Platform - Cloud Foundry
VMware Tanzu Application Service

Issue/Introduction

When looking at the Healthwatch Dashboard, the graph "Number of Unresponsive Agents" lists all BOSH VMs as being in an unresponsive state. However, when reviewing the platform through the BOSH CLI, VMs are shown as running.

Environment

TPCF/TKGI

Cause

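The BOSH Health Monitor's NATS connection has been flagged as a slow consumer, so heartbeat messages from the VMs are dropped before the Health Monitor receives them. To confirm this, perform the following steps: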

  1. SSH to the BOSH Director
  2. sudo -i
  3. cd /var/vcap/sys/log/health_monitor
  4. Review the health_monitor.log

You should see the following message:

NATS client error: nats: slow consumer, messages dropped

More details about slow consumers can be found in the NATS documentation:

https://docs.nats.io/running-a-nats-service/nats_admin/slow_consumers
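The log review above can be scripted. The following is a minimal sketch, not an official tool: count_slow_consumer is a hypothetical helper name, and the default log path is assumed from the directory and file name given in the steps above. It prints the number of log lines that contain the slow consumer message.

```shell
# Hypothetical helper: count slow-consumer errors in a Health Monitor log.
# $1 (optional): path to the log file. The default is an assumption built
# from the directory and file name in the steps above.
count_slow_consumer() {
  log_file="${1:-/var/vcap/sys/log/health_monitor/health_monitor.log}"
  # -c prints the number of matching lines, -F matches the text literally.
  # grep exits non-zero when nothing matches, so ignore its exit status.
  grep -cF 'nats: slow consumer, messages dropped' "$log_file" || true
}
```

Run count_slow_consumer as root on the BOSH Director. A count greater than zero means the message is present in the log.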

Resolution

To resolve this, restart the health_monitor process as root on the BOSH Director with monit restart health_monitor. This re-establishes the NATS connection that the Health Monitor uses and should allow the heartbeat messages from the VMs to reach the Director again.
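The restart can be combined with a check that the process comes back up. This is a sketch under stated assumptions: restart_health_monitor is a hypothetical helper name, the retry count and delay are arbitrary defaults, and the match on monit summary assumes the process line ends with the word "running", which may differ between monit versions.

```shell
# Hypothetical helper: restart the Health Monitor and wait for it to run.
# $1 (optional): maximum number of status checks (default 30, arbitrary).
# $2 (optional): seconds to wait between checks (default 2, arbitrary).
restart_health_monitor() {
  max_attempts="${1:-30}"
  delay="${2:-2}"
  monit restart health_monitor || return 1
  attempts=0
  while [ "$attempts" -lt "$max_attempts" ]; do
    # Assumption: the monit summary line for the process ends in "running".
    if monit summary | grep -Eq "health_monitor.*[[:space:]]running[[:space:]]*$"; then
      return 0
    fi
    attempts=$((attempts + 1))
    sleep "$delay"
  done
  # The process did not report "running" within the allowed checks.
  return 1
}
```

Run restart_health_monitor as root on the BOSH Director. A non-zero exit status means the process did not report as running in time and should be checked manually with monit summary.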

The Healthwatch "Number of Unresponsive Agents" graph should then display the correct number of unresponsive VMs.

If the issue persists after this, please contact Broadcom support.