TanzuSLOCFPushErrorBudget alert in Healthwatch

Article ID: 404604

Products

VMware Tanzu Platform

Issue/Introduction

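The TanzuSLOCFPushErrorBudget alert reports how many minutes remain in the error budget before cf push availability falls below the selected Uptime SLO Target for the selected time period. In the default rule below, the target is 99% (0.99) and the period is 28 days.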

The default alerting rule for TanzuSLOCFPushErrorBudget in our documentation fires only after its expression has been continuously true for 15 minutes (for: 15m).

  - name: TanzuApplicationServiceSLOs
    rules:
      - alert: TanzuSLOCFPushErrorBudget
        expr: '( (1 - (rate(tas_sli_task_failures_total{task="push"}[28d]) / rate(tas_sli_task_runs_total{task="push"}[28d]) ) ) - 0.99) * (28 * 24 * 60) <= 0'
        for: 15m
        annotations:
          summary: "The `cf_push` command is unresponsive"
          description: |
            This alert fires when the error budget reaches zero.

            This commonly occurs when:
            - Diego is under-scaled
            - UAA is unresponsive
            - Cloud Controller is unresponsive

            Check the status of these components in order to diagnose the issue.
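
The arithmetic in the rule's expression can be checked by hand. The following is a minimal Python sketch of that calculation, not part of Healthwatch: the function name `error_budget_minutes` and its parameters are hypothetical, and it assumes you pass push failure and run counts taken over the same window, so that their ratio matches the ratio of the two `rate()` terms in the expression.

```python
def error_budget_minutes(failures, runs, slo_target=0.99, window_days=28):
    """Return the remaining error budget in minutes.

    Mirrors the alert expression:
    ((1 - failures / runs) - slo_target) * (window_days * 24 * 60)
    The function and argument names are illustrative assumptions.
    """
    window_minutes = window_days * 24 * 60   # 28 days = 40320 minutes
    success_ratio = 1 - failures / runs      # fraction of successful pushes
    return (success_ratio - slo_target) * window_minutes


# 5 failed pushes out of 1000: about half of the 1% budget remains.
print(round(error_budget_minutes(5, 1000), 1))    # 201.6

# 20 failed pushes out of 1000: the budget is overspent, so the
# expression is <= 0 and the alert fires once this holds for 15 minutes.
print(round(error_budget_minutes(20, 1000), 1))   # -403.2
```

With a 99% target over 28 days, the full budget is 403.2 minutes, so any failure ratio of 1% or more drives the result to zero or below.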

Environment

Healthwatch

Cause

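Healthwatch pushes a test app to the Healthwatch2 space in the System org to monitor the health of cf push. The alert expression compares the failures of these pushes against the total number of push runs, so failed test pushes consume the error budget.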

Resolution

If the alert has been triggered, review the health of the cloud_controllers, diego_cells, and uaa VMs and make sure that enough resources are available for them.
