Healthwatch TSDB is in a failing state

Article ID: 433908

Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

The Healthwatch TSDB VM is in a failing state because the Prometheus job does not start.

prometheus.stderr.log shows an entry similar to:

/var/vcap/data/packages/ruby-3.2/[GENERIC_ID]/lib/ruby/3.2.0/psych/parser.rb:62:in `_native_parse': 
(/var/vcap/store/pks-cluster-discovery/scrape_configs.yml): found unexpected end of stream while 
scanning a quoted scalar at line 5758 column 15 (Psych::SyntaxError)

The pks-cluster-discovery logs show entries similar to:

2026-03-18 12:06:54 INFO pks.ScrapeConfigGenerator [discover-clusters] Could not get scrape config for cluster [CLUSTER_ID_01]
2026-03-18 12:06:54 INFO pks.ScrapeConfigGenerator [discover-clusters] Could not get scrape config for cluster [CLUSTER_ID_02]
2026-03-18 12:06:54 INFO pks.PksClusterDiscovery [discover-clusters] Writing scrape configurations for 0 clusters
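To see which clusters are affected, the "Could not get scrape config" entries can be collapsed into a list of distinct cluster IDs. The following is a minimal sketch that assumes the log lines have the shape shown above. The function name failed_clusters is hypothetical, and the log file location is not given in this article, so the function reads the log text from standard input.

```shell
# Hypothetical helper: list each cluster that discovery could not fetch a
# scrape config for. Reads pks-cluster-discovery log text on stdin.
failed_clusters() {
  # Keep only the text after "for cluster ", then de-duplicate.
  sed -n 's/.*Could not get scrape config for cluster \(.*\)$/\1/p' | sort -u
}
```

For example, failed_clusters < LOGFILE, where LOGFILE is the path to the pks-cluster-discovery log on the TSDB VM.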

Environment

Healthwatch 2.x

Cause

The TSDB (Time Series Database) failure typically stems from corruption or a syntax error in the dynamically generated scrape_configs.yml file. When the pks-cluster-discovery process fails to properly fetch, format, or close a data string (such as a quoted scalar) while writing the configuration, the resulting YAML file is malformed.

The specific error, "unexpected end of stream", usually indicates an incomplete configuration, but the same state can be triggered by other synchronization issues or by an interrupted write. Because Prometheus requires valid YAML to initialize its scraping engine, any syntax error in this shared configuration file prevents the service from starting.
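The parser error reports the line and column where the YAML stopped being valid, so it can help to look at the surrounding lines before resetting the file. The following is a minimal sketch. The function name show_yaml_context and the default radius of 3 lines are assumptions made for illustration and are not part of Healthwatch.

```shell
# Hypothetical helper: print the lines around a reported YAML error position,
# prefixed with their line numbers.
# Usage: show_yaml_context FILE LINE [RADIUS]
show_yaml_context() {
  local file="$1" line="$2" radius="${3:-3}"
  local start=$(( line - radius ))
  local end=$(( line + radius ))
  # Clamp the window so it never starts before the first line.
  if [ "$start" -lt 1 ]; then
    start=1
  fi
  awk -v s="$start" -v e="$end" \
    'NR >= s && NR <= e { printf "%6d  %s\n", NR, $0 } NR > e { exit }' "$file"
}
```

With the error shown in this article, the call would be show_yaml_context /var/vcap/store/pks-cluster-discovery/scrape_configs.yml 5758. An unterminated quote at or just before the reported line is the likely culprit.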

Resolution

To resolve this, reset the corrupted configuration file to an empty list and restart the discovery service so that it regenerates a valid YAML file. Run the following commands as root on the TSDB VM:

monit stop prometheus

echo $'---\n[]\n' > /var/vcap/store/pks-cluster-discovery/scrape_configs.yml

monit restart pks-cluster-discovery
# Wait until "monit summary" reports pks-cluster-discovery as running

monit start prometheus
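The wait between restarting pks-cluster-discovery and starting prometheus can be scripted, and the malformed file can be copied aside first for later analysis. The following is a minimal sketch, not part of the documented procedure. The function names backup_file and wait_for_running, the MONIT_CMD variable, and the timeout defaults are all assumptions. The sketch also assumes that "monit summary" prints one line per job in the form Process 'NAME' followed by its status.

```shell
# Assumption: MONIT_CMD allows the monit binary to be overridden.
MONIT_CMD="${MONIT_CMD:-monit}"

# Hypothetical helper: copy FILE to FILE.bak.<timestamp> and print the copy's
# path, so the malformed YAML can be inspected after the reset.
backup_file() {
  local file="$1"
  local backup="${file}.bak.$(date +%Y%m%d%H%M%S)"
  cp -p "$file" "$backup" && printf '%s\n' "$backup"
}

# Hypothetical helper: poll "monit summary" until PROCESS is reported as
# running. Returns 0 on success and 1 if TIMEOUT seconds pass first.
# Usage: wait_for_running PROCESS [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_running() {
  local process="$1" timeout="${2:-120}" interval="${3:-5}"
  local waited=0
  while [ "$waited" -le "$timeout" ]; do
    if "$MONIT_CMD" summary 2>/dev/null \
        | grep -Eq "^Process '${process}'[[:space:]]+running[[:space:]]*$"; then
      return 0
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  return 1
}
```

A possible use is to call backup_file /var/vcap/store/pks-cluster-discovery/scrape_configs.yml before overwriting the file, and then run wait_for_running pks-cluster-discovery && monit start prometheus after the restart.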
