On the Healthwatch dashboard, the "Number of Unresponsive Agents" graph reports all BOSH VMs as unresponsive. However, the BOSH CLI shows the same VMs as running.
TPCF/TKGI
To confirm this, check the health monitor logs on the BOSH Director VM. You should see the following message:
NATS client error: nats: slow consumer, messages dropped
More details about slow consumers can be found in the NATS documentation:
https://docs.nats.io/running-a-nats-service/nats_admin/slow_consumers
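One way to search for this message is sketched below. It assumes you are logged in to the BOSH Director VM with root privileges and that the health monitor writes its logs under /var/vcap/sys/log/health_monitor. That path is the conventional BOSH job log location and is an assumption here, not something this article confirms. The function name check_slow_consumer is hypothetical and used only for illustration.

```shell
# Assumption: conventional BOSH job log directory; override LOG_DIR if yours differs.
LOG_DIR="${LOG_DIR:-/var/vcap/sys/log/health_monitor}"
PATTERN='nats: slow consumer, messages dropped'

# Hypothetical helper: search a log directory recursively for the error.
# Prints matching lines (file:line:text) and returns 0 on a match, non-zero otherwise.
check_slow_consumer() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "Log directory not found: $dir" >&2
    return 2
  fi
  grep -rn -- "$PATTERN" "$dir"
}

check_slow_consumer "$LOG_DIR" || echo "No slow consumer errors found in $LOG_DIR"
```

If the search prints one or more matching lines, the health monitor has been dropping NATS messages, which is consistent with the symptom described above.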
To resolve this, restart the health_monitor process by running monit restart health_monitor. This re-establishes the NATS connection that the health monitor uses, and should allow the heartbeat messages from the VMs to reach the director again.
You should then see the Healthwatch graph display the correct number of unresponsive VMs.
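The restart can be followed by a quick status check, sketched below. It assumes a root shell on the BOSH Director VM with monit on the PATH. The function name restart_health_monitor and the WAIT_SECONDS pause are hypothetical and only illustrative; the monit restart and monit summary subcommands are standard monit commands.

```shell
# Assumption: monit is on the PATH for the root user; override MONIT if it is not.
MONIT="${MONIT:-monit}"

# Hypothetical helper: restart the health monitor, pause briefly, then list process states.
restart_health_monitor() {
  "$MONIT" restart health_monitor || return 1
  # Give monit a moment to bring the process back up before checking.
  sleep "${WAIT_SECONDS:-10}"
  "$MONIT" summary
}

# Usage (as root on the director VM):
#   restart_health_monitor
```

In the summary output, health_monitor should return to a running state. If it does not, run monit summary again after a short wait before investigating further.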
If the issue persists after this, please contact Broadcom support.