BOSH commands appear to hang indefinitely or time out. Running the following command reveals hundreds or thousands of scan and fix tasks:
bosh -e director tasks --no-filter
This scenario typically appears during a BOSH deployment that generates many tasks while, at the same time, a BOSH agent is intermittently skipping heartbeats.
When the BOSH agent misses a heartbeat, the health monitor (which runs on the BOSH Director) triggers the creation of a scan and fix task. When the scan and fix task executes, it finds that the affected agent has since sent a heartbeat successfully, so it skips resurrection of the instance.
This cycle can repeat hundreds of times, causing the Director task queue to build up. When many long-running deployment tasks are executing at the same time, this can cause a race condition in which the task queue grows too large.
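To gauge how far the queue has built up, the task listing can be piped through a small counter. The following is a minimal sketch, not an official tool: the helper name count_scan_and_fix is hypothetical, and it assumes that each scan and fix task appears on its own line of the listing with the text "scan and fix" in its description.

```shell
#!/bin/sh
# Hypothetical helper (not part of BOSH): count "scan and fix" entries
# in a task listing supplied on standard input.
count_scan_and_fix() {
  # grep -c prints the number of matching lines; -i ignores case.
  # grep exits non-zero when nothing matches, so "|| true" keeps the
  # helper usable under "set -e" while still printing 0.
  grep -ci 'scan and fix' || true
}

# Example usage on a machine with access to the Director:
#   bosh -e director tasks --no-filter | count_scan_and_fix
```

A count in the hundreds or thousands is consistent with the behavior described above; a small count suggests the hang has a different cause.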
Note: Running bosh stop, start, restart, and recreate may result in undesirable behavior if deployment changes are in progress. When troubleshooting these types of issues, avoid executing these commands. Instead, use the IaaS or BOSH CCK to perform these types of troubleshooting actions.
monit stop health_monitor
monit restart director
bosh -e director vms --details
WARN : (Resurrector) notifying director to recreate unresponsive VM: cf-300c3738aa8b3ad21fca router/07a82795-821f-4466-9509-f19ac2caf927
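Resurrector messages like the one above can be tallied per instance to help identify which agent is intermittently missing heartbeats. This is a rough sketch under stated assumptions: the helper name count_unresponsive_by_instance is hypothetical, the log lines are assumed to end with the instance name as in the sample message, and the log location shown in the usage comment is an assumption that may differ on your Director.

```shell
#!/bin/sh
# Hypothetical helper (not part of BOSH): read health monitor log lines
# on standard input and print "<count> <instance>" pairs, highest count first.
count_unresponsive_by_instance() {
  # Keep only the Resurrector notifications and count them by the last
  # whitespace-separated field, which is the instance (for example router/<guid>).
  awk '/notifying director to recreate unresponsive VM/ { count[$NF]++ }
       END { for (i in count) print count[i], i }' \
    | sort -rn
}

# Example usage on the Director VM (log path is an assumption):
#   count_unresponsive_by_instance < /var/vcap/sys/log/health_monitor/health_monitor.log
```

An instance that appears far more often than the others is a likely candidate for the agent that is skipping heartbeats.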