Healthwatch stops working and throws 503 errors

Article ID: 370722

Updated On: 08-29-2024

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

A stemcell upgrade of the Healthwatch deployment can fail at the smoke test with the following error: "server_error: server error: 503"

The corresponding error in grafana.log on the Grafana VM is:

logger=tsdb.prometheus t=2023-07-06T10:28:57.501798562Z level=error msg="Instant query failed" query=increase(tkgi_sli_failures_total[10m]) err="execution: server_error: server error: 503"

From the BOSH view, all VMs are up and running.

Cause

This happens because the Prometheus WAL (write-ahead log) files on the TSDB VMs are corrupted.

Resolution

To return Healthwatch to a healthy state, follow these steps:

1. SSH to one of the TSDB VMs as root.
2. Run "monit stop prometheus".
3. Delete all files (not folders) from these directories: /var/vcap/store/prometheus/chunks_head/ and /var/vcap/store/prometheus/wal.
4. Repeat steps 1 to 3 on all other TSDB VMs.
5. Run "monit start prometheus" on all TSDB VMs.
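The per-VM part of the steps above could be sketched as the following shell helpers. This is a minimal illustration, not an official script: the function names delete_files_only and clear_prometheus_wal and the PROM_STORE variable are hypothetical, and the sketch assumes a bash shell with a find that supports -maxdepth and -delete (as GNU find does). Only files directly inside each directory are removed; subfolders and their contents are left in place, following the "not folders" wording above.

```shell
#!/usr/bin/env bash
# Sketch only: intended to be run as root on each TSDB VM.
# PROM_STORE is a hypothetical variable; the default is the path from this article.
PROM_STORE="${PROM_STORE:-/var/vcap/store/prometheus}"

# Delete regular files directly inside a directory.
# Subdirectories, and anything inside them, are kept.
delete_files_only() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "directory not found: $dir" >&2
    return 1
  fi
  find "$dir" -maxdepth 1 -type f -delete
}

# Hypothetical helper: stop Prometheus, then clear both directories.
# Assumption: "monit stop" may return before the process has fully exited,
# so it can be worth checking "monit summary" before deleting anything.
clear_prometheus_wal() {
  monit stop prometheus
  delete_files_only "$PROM_STORE/chunks_head"
  delete_files_only "$PROM_STORE/wal"
}
```

After loading these functions in a root shell on a TSDB VM, calling clear_prometheus_wal covers steps 2 and 3. Once every TSDB VM has been cleaned this way, "monit start prometheus" is run on each of them as described in the last step.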
