Stemcell upgrade in Healthwatch deployment can fail at the smoke-test with following "server_error: server error: 503" error
The corresponding error in grafana.log in grafana VM is:
logger=tsdb.prometheus t=2023-07-06T10:28:57.501798562Z level=error msg="Instant query failed" query=increase(tkgi_sli_failures_total[10m]) err="execution: server_error: server error: 503"
From bosh view all VM's are up and running.
This happens because of corruption of wal files.
To get back to a healthy status follow these steps