The TanzuSLOCFPushErrorBudget alert tracks how many minutes remain in the error budget before the selected uptime SLO target is breached over the selected time period.
The default alerting rule for TanzuSLOCFPushErrorBudget in our documentation fires only after the rule's condition has held continuously for 15 minutes:
- name: TanzuApplicationServiceSLOs
  rules:
  - alert: TanzuSLOCFPushErrorBudget
    expr: '( (1 - (rate(tas_sli_task_failures_total{task="push"}[28d]) / rate(tas_sli_task_runs_total{task="push"}[28d]) ) ) - 0.99) * (28 * 24 * 60) <= 0'
    for: 15m
    annotations:
      summary: "The `cf_push` command is unresponsive"
      description: |
        This alert fires when the error budget reaches zero.
        This commonly occurs when:
        - Diego is under-scaled
        - UAA is unresponsive
        - Cloud Controller is unresponsive
        Check the status of these components in order to diagnose the issue.
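
To make the arithmetic in the rule's expression easier to follow, here is a minimal Python sketch of the same calculation. It is not Healthwatch code. The function names and the sample rates are hypothetical, and the 0.99 target and 28-day window are copied from the rule above. With those values the total budget is (1 - 0.99) * 40320, or about 403.2 minutes.

```python
# Illustrative sketch of the arithmetic in the alert expression.
# Function names and sample inputs are hypothetical, not part of Healthwatch.

WINDOW_MINUTES = 28 * 24 * 60  # 28-day window expressed in minutes (40320)
SLO_TARGET = 0.99              # uptime target used in the rule above


def remaining_budget_minutes(failure_rate: float, run_rate: float) -> float:
    """Mirror the rule: ((1 - failures / runs) - target) * window minutes."""
    if run_rate == 0:
        # No push runs recorded: the ratio is undefined, so return NaN.
        # A NaN compared with `<= 0` is false, so the condition does not hold.
        return float("nan")
    availability = 1 - failure_rate / run_rate
    return (availability - SLO_TARGET) * WINDOW_MINUTES


def alert_condition(failure_rate: float, run_rate: float) -> bool:
    """True when the error budget is exhausted (zero or negative minutes left)."""
    return remaining_budget_minutes(failure_rate, run_rate) <= 0


if __name__ == "__main__":
    # 1 failed push per 200 runs: 99.5% success, roughly half the budget left.
    print(round(remaining_budget_minutes(1.0, 200.0), 1))  # about 201.6
    # 2 failed pushes per 100 runs: 98% success, budget exhausted.
    print(alert_condition(2.0, 100.0))  # True
```

Because of the `for: 15m` clause, a true condition alone is not enough: the alert fires only after the expression has stayed at or below zero for 15 minutes.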
Healthwatch
Healthwatch pushes an app to the Healthwatch2 space in the System org to monitor cf push health.
If the alert has been triggered, review the health of the cloud_controllers, diego_cells, and uaa instances, and make sure enough resources are available for the VMs.