Pushing or restaging an application with the rolling strategy fails with a timeout while starting the app -
cf restage <app> --strategy rolling
Waiting for app to deploy...
Start app timeout
Use 'cf8 logs <app-name> --recent' for more information
FAILED
The app state shows running even though the restage reports FAILED.
Check the resource utilization on the clock global VM. Note the load average and the CPU wait time -
clock_global/######-####-###-#####-######### running ###-####-# ###.###.#.## clock-global_cf-######-####-###-#####-######### medium.disk true bosh-vsphere-esxi-ubuntu-jammy-go_agent/1.775 Sun May 25 01:47:08 UTC 2025 81d 18h 15m 4s 6.06, 6.11, 6.10 - 3.4% 1.0% 95.2% 29% (1.1 GB) 0% (2.3 MB) 57% (23i%) 24% (3i%
In the example above, the load averages are above 6 (ideally this number should not exceed the number of CPU cores on the VM) and the CPU wait is 95.2% (above 90% is the critical threshold).
The Clock Global VM runs a process that keeps Diego and Cloud Controller in sync. An overloaded VM can affect a rolling restart, because a rolling restart waits for the app instances to start and therefore relies on the freshness of the Diego sync.
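The two checks described above can be sketched as a small shell helper. This is only an illustration: the function name is_overloaded is hypothetical, the core count of 4 in the usage line is an assumption (the example output does not state how many cores the VM has), and the 90% figure is the critical threshold quoted above.

```shell
# Hypothetical helper, not part of bosh or cf.
# Arguments: <load average> <CPU cores on the VM> <CPU wait percent>
# Returns 0 (true) when the load average exceeds the core count
# or the CPU wait is above the 90% critical threshold.
is_overloaded() {
  load="$1"
  cores="$2"
  wait_pct="$3"
  # awk handles the floating-point comparison; "+ 0" forces numeric context.
  awk -v l="$load" -v c="$cores" -v w="$wait_pct" \
    'BEGIN { exit !((l + 0 > c + 0) || (w + 0 > 90)) }'
}

# Values from the example output; the core count (4) is assumed.
if is_overloaded 6.06 4 95.2; then
  echo "clock global looks overloaded"
fi
```

On the VM itself, nproc reports the core count and /proc/loadavg the current load averages, so the assumed values can be replaced with real ones.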
Reference KPI documentation: https://techdocs.broadcom.com/us/en/vmware-tanzu/platform/tanzu-platform-for-cloud-foundry/6-0/tpcf/monitoring-kpi.html
Solution -
Recreate the overloaded clock global VM -
bosh -d <CF deployment> recreate clock_global/######-####-###-#####-#########
It is recommended to capture the logs from the clock global VM before recreating it, so that the root cause can be investigated afterwards.
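A minimal sketch of that ordering, logs first and recreate second, is shown below. The function names run and recreate_with_logs and the DRY_RUN variable are hypothetical conveniences introduced here; only bosh logs and bosh recreate are actual bosh commands. By default the sketch only prints the commands it would run.

```shell
# Hypothetical wrapper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments: <CF deployment> <instance, e.g. clock_global/<instance-id>>
# Downloads the instance logs first; recreates only if that step succeeded.
recreate_with_logs() {
  deployment="$1"
  instance="$2"
  run bosh -d "$deployment" logs "$instance" &&
    run bosh -d "$deployment" recreate "$instance"
}
```

Calling recreate_with_logs with the deployment name and the clock_global instance prints both commands for review; setting DRY_RUN=0 in the environment executes them instead.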