Healthwatch Reports Intermittent CLI Health Failures (Push/Start/Logs)

Article ID: 434933

Products

VMware Tanzu Platform - Cloud Foundry

Issue/Introduction

Healthwatch intermittently reports failures specifically for CLI Health (Push/Start/Logs), while other platform operations (like API responsiveness or router checks) remain successful. You may see alerts triggered by the pas-sli-test failing to complete within the expected timeout.

Cause

The CLI health check is executed by the pas-sli-test job within the Healthwatch Exporter tile.

Unlike simple API pings, a "Push/Start/Logs" check simulates a full application lifecycle. Failure in these specific areas—particularly during the Staging (STG) phase—typically points to a bottleneck in the Diego subsystem or its downstream dependencies, most likely the blobstore.

When the Cloud Controller (CC) or other control plane components are healthy, but staging is slow or failing, the system is likely struggling to upload/download droplets or buildpacks.

Resolution

To diagnose the bottleneck, you must review the lifecycle logs of the SLI application.

  1. Locate the Logs: Review the logs for the pas-sli-test job within the Healthwatch Exporter tile, or, if the test app is still present, pull them directly with cf logs <app-name> --recent.

  2. Analyze Staging Timestamps: Measure the duration between the "Uploading droplet..." line and the "Uploading complete" line.

Example Log Analysis: In the snippet below, notice the nearly nine-minute gap (8m 49s) between the start of the upload and the "Uploading complete" message for a relatively small droplet (72.2MB).

2026-02-21T03:25:42.34+0000 [STG/0] OUT Uploading droplet...
2026-02-21T03:25:46.44+0000 [API/2] OUT Creating droplet for app with guid ####
...
2026-02-21T03:34:31.20+0000 [STG/0] OUT Uploaded droplet (72.2M)
2026-02-21T03:34:31.21+0000 [STG/0] OUT Uploading complete
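The gap can be computed directly from the two STG timestamps. A minimal sketch, assuming GNU date is available (the timestamps below are copied from the snippet, with fractional seconds dropped):

```shell
# Compute the droplet-upload duration from the two staging log
# timestamps above. Requires GNU date for ISO 8601 parsing.
start='2026-02-21T03:25:42+0000'
end='2026-02-21T03:34:31+0000'
gap=$(( $(date -d "$end" +%s) - $(date -d "$start" +%s) ))
echo "droplet upload took ${gap}s ($(( gap / 60 ))m $(( gap % 60 ))s)"
# → droplet upload took 529s (8m 49s)
```

An upload of roughly 72MB should normally complete in seconds; a multi-minute duration like this is the signature of a blobstore or network bottleneck rather than an application problem.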

Recommended Actions

  • Blobstore Health: Check the health and performance of your external storage (S3, GCS, Azure Storage) or internal WebDAV/NFS blobstore.

  • Network Latency: Investigate network throughput between the Diego Cells and the Blobstore endpoint.

  • Resource Contention: Check if the Blobstore is hitting rate limits or if the network is saturated by other concurrent staging jobs.
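To put numbers on the blobstore and network checks above, curl's built-in timing variables give a quick throughput measurement when run from a Diego Cell. A sketch; the endpoint URL shown in the comment is a placeholder, not a real Healthwatch or blobstore path:

```shell
# measure_download: fetch a URL, discard the body, and report curl's
# total transfer time and average download speed.
measure_download() {
  curl -s -o /dev/null \
    -w 'time_total=%{time_total}s speed_download=%{speed_download}B/s\n' \
    "$1"
}

# Hypothetical example -- substitute an object you know exists in your
# blobstore (e.g. a buildpack or droplet), and run from a Diego Cell:
# measure_download "https://blobstore.internal.example/droplets/some-guid"
```

Comparing the measured speed from a Diego Cell against the same measurement from another network location can help separate blobstore-side slowness from network saturation between the Cells and the blobstore endpoint.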