Healthwatch stops working and throws 503 errors
Article ID: 370722
Updated On: 08-29-2024
Products
VMware Tanzu Application Service for VMs
Issue/Introduction
A stemcell upgrade of the Healthwatch deployment can fail at the smoke test with the following error: "server_error: server error: 503".
The corresponding error in grafana.log on the Grafana VM is:
logger=tsdb.prometheus t=2023-07-06T10:28:57.501798562Z level=error msg="Instant query failed" query=increase(tkgi_sli_failures_total[10m]) err="execution: server_error: server error: 503"
From the BOSH perspective, all VMs are up and running.
Cause
This happens when the Prometheus write-ahead log (WAL) files on the TSDB VMs become corrupted.
Resolution
To return the deployment to a healthy state, follow these steps:
- SSH to one of the TSDB VMs and become root.
- Run "monit stop prometheus".
- Delete all files (not folders) from these directories: /var/vcap/store/prometheus/chunks_head/ and /var/vcap/store/prometheus/wal.
- Repeat the steps above on every other TSDB VM.
- Once the files have been removed on all TSDB VMs, run "monit start prometheus" on each of them.
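The deletion step can be sketched as a small shell helper. This is a minimal sketch, not an official procedure: the function names and the PROM_STORE variable are hypothetical, and it assumes that "files (not folders)" means only the regular files directly inside each directory, so subdirectories and their contents are left untouched. It relies on the -maxdepth and -delete options of GNU find.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Healthwatch.
# PROM_STORE defaults to the path given in the article; override it for a dry run.
PROM_STORE="${PROM_STORE:-/var/vcap/store/prometheus}"

# Delete only the regular files directly inside the given directory.
# Subdirectories (and anything inside them) are left in place.
delete_files_only() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "skip: $dir not found" >&2
    return 0
  fi
  find "$dir" -maxdepth 1 -type f -delete
}

# Clear the two directories named in the resolution steps.
clear_prometheus_wal() {
  delete_files_only "$PROM_STORE/chunks_head"
  delete_files_only "$PROM_STORE/wal"
}
```

On each TSDB VM, as root, this would be used after "monit stop prometheus" by sourcing the snippet and calling clear_prometheus_wal. Start Prometheus again with "monit start prometheus" only after every TSDB VM has been cleared.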