The Healthwatch TSDB VM is in a failing state because the Prometheus job does not start.
prometheus.stderr.log shows an entry similar to:
/var/vcap/data/packages/ruby-3.2/[GENERIC_ID]/lib/ruby/3.2.0/psych/parser.rb:62:in `_native_parse':
(/var/vcap/store/pks-cluster-discovery/scrape_configs.yml): found unexpected end of stream while
scanning a quoted scalar at line 5758 column 15 (Psych::SyntaxError)
The pks-cluster-discovery logs show entries similar to:
2026-03-18 12:06:54 INFO pks.ScrapeConfigGenerator [discover-clusters] Could not get scrape config for cluster [CLUSTER_ID_01]
2026-03-18 12:06:54 INFO pks.ScrapeConfigGenerator [discover-clusters] Could not get scrape config for cluster [CLUSTER_ID_02]
2026-03-18 12:06:54 INFO pks.PksClusterDiscovery [discover-clusters] Writing scrape configurations for 0 clusters
Healthwatch 2.x
The TSDB (Time Series Database) failure typically stems from corruption or a syntax error in the dynamically generated scrape_configs.yml file. When the pks-cluster-discovery process fails to properly fetch, format, or close a data string (such as a quoted scalar) during the configuration write cycle, the resulting YAML file is malformed.
The specific error, "found unexpected end of stream", is a common indicator of an incomplete configuration, but this state can be triggered by various underlying synchronization issues or interrupted write operations. Because Prometheus requires a valid YAML structure to initialize its scraping engine, any syntax error in this shared configuration file prevents the service from starting.
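To confirm where the file is cut off, it can help to print the lines around the position reported in the Psych error (line 5758 in the example above). The following is a minimal sketch; show_context is a hypothetical helper name and is not part of Healthwatch.

```shell
#!/usr/bin/env bash
# show_context FILE LINE [RADIUS]
# Hypothetical helper: prints the lines within RADIUS (default 5) of LINE.
show_context() {
  local file="$1" line="$2" radius="${3:-5}"
  # Clamp the first line to 1 so sed never receives a zero or negative address.
  local start=$(( line > radius ? line - radius : 1 ))
  local end=$(( line + radius ))
  # -n suppresses default output; "p" prints only the selected range.
  sed -n "${start},${end}p" "$file"
}

# Example, using the path and line number from the error above:
# show_context /var/vcap/store/pks-cluster-discovery/scrape_configs.yml 5758
```

An unterminated quoted value, or a file that simply ends partway through an entry, is consistent with the "unexpected end of stream" message.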
To resolve this, on the TSDB VM, manually replace the corrupted configuration file with an empty YAML list and restart the discovery service so that it regenerates a valid scrape_configs.yml:
monit stop prometheus
echo $'---\n[]\n' > /var/vcap/store/pks-cluster-discovery/scrape_configs.yml
monit restart pks-cluster-discovery
# wait until "monit summary" shows pks-cluster-discovery as running
monit start prometheus
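The "wait for running status" step can also be scripted rather than checked by hand. The sketch below assumes that monit summary prints one line per job in the form Process 'name' status, as the monit version on BOSH-deployed VMs typically does. job_is_running and wait_for_job are hypothetical helper names, and the 5-second poll interval and 300-second timeout are arbitrary choices.

```shell
#!/usr/bin/env bash
# job_is_running JOB
# Reads "monit summary" output on stdin and succeeds only if JOB is listed
# with the exact status "running" (not "not monitored", "initializing", etc.).
job_is_running() {
  local job="$1"
  awk -v job="'$job'" '
    $1 == "Process" && $2 == job && $3 == "running" && NF == 3 { found = 1 }
    END { exit !found }
  '
}

# wait_for_job JOB [TIMEOUT_SECONDS]
# Polls monit every 5 seconds until JOB is running or the timeout expires.
wait_for_job() {
  local job="$1" timeout="${2:-300}" waited=0
  until monit summary | job_is_running "$job"; do
    if (( waited >= timeout )); then
      echo "timed out waiting for $job" >&2
      return 1
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
}

# Example: start Prometheus only once discovery is running again.
# monit restart pks-cluster-discovery
# wait_for_job pks-cluster-discovery && monit start prometheus
```

If the wait times out, check the pks-cluster-discovery logs again before starting Prometheus.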