Postgres health warning triggers every 24 hours

Article ID: 377293

Updated On: 04-17-2025

Products

VMware vCenter Server

Issue/Introduction

  • A vmware-vpostgres health warning triggers every day at approximately the same time.
  • The PostgreSQL logs show no errors or other concerns, including around the time of the event.
  • Rebooting vCenter returns the service to a healthy state, but it becomes "degraded" again within 24 hours.
  • The vMon log shows a warning that the service health XML file is stale:

<timestamp> Wa(03) host-1961 <vmware-vpostgres> Service api-health command's stderr: Service health xml file is stale. Current time: 1710124, expiration time: 1706828. Treating service health state RED.
<timestamp> Wa(03)+ host-1961
<timestamp> Wa(03) host-1961 <vmware-vpostgres> Service api-health command's stderr: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><healthStatus schemaVersion="1.0" xmlns="http://www.vmware.com/cis/cm/common/jaxb/healthstatus"><status>GREEN</status><message messageKey="cis.vmware-vpostgres.health.healthy" defaultMessage="Service vmware-vpostgres is healthy."></message><expirationMonoSec>1706828</expirationMonoSec></healthStatus>
<timestamp> Wa(03) host-1961 <vmware-vpostgres> Health of service failed. Health data: {"localizable_msgs": [{"id": "com.vmware.vmon.svc_health_timeout", "default_message": "Service is in an unhealthy state.", "args": []}], "_service_name": "vmware-vpostgres", "_trigger_threaddump_on_failure": 0}
<timestamp> In(05) host-1961 <vmware-vpostgres> Recover from service api health check failure. Fail count 24913

  • If the health XML file is moved to a temporary directory, a new file is created, but its timestamp is still stale.
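
The vMon messages above suggest a simple rule: the health XML carries an expirationMonoSec value, and once the current monotonic time passes it, vMon treats the service as RED even though the last recorded status inside the XML is GREEN. In the excerpt, the file had expired 1710124 - 1706828 = 3296 seconds (about 55 minutes) earlier. The sketch below reproduces that comparison. The function names are hypothetical, the XML string is trimmed from the log above, and treating "current equals expiration" as not stale is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the expirationMonoSec value from a health XML string.
# ':' is used as the sed delimiter so the '/' in the closing tag needs no escaping.
expiration_from_xml() {
  sed -n 's:.*<expirationMonoSec>\([0-9]*\)</expirationMonoSec>.*:\1:p' <<<"$1"
}

# Hypothetical helper: apply the staleness rule implied by the vMon message.
# ASSUMPTION: the file is stale only when current time is strictly greater than expiration.
health_state() {
  local current=$1 expiration=$2
  if (( current > expiration )); then
    echo "RED (stale by $(( current - expiration ))s)"
  else
    echo "GREEN"
  fi
}

# Trimmed copy of the XML from the log; its <status> is GREEN, but it has expired.
xml='<healthStatus><status>GREEN</status><expirationMonoSec>1706828</expirationMonoSec></healthStatus>'
exp=$(expiration_from_xml "$xml")
# Current time taken from the same vMon log line
health_state 1710124 "$exp"
```

With the values from the log, this prints `RED (stale by 3296s)`, matching vMon's decision to treat the service as RED.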

Environment

vCenter Server Appliance

Cause

There are two health status worker configuration files:

/storage/db/vpostgres/health_status_worker.conf
/storage/db/vpostgres/health_status_worker.confe

The health_status_worker.confe file should not be present, and health_status_worker.conf has an incorrect value for the health_status_worker.naptime parameter.
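
To check whether a vCenter is affected, you can look for the stray file and print the current naptime setting. The following read-only sketch assumes the directory named in this article. The function name check_health_worker_conf and the PG_DIR override are hypothetical, and the script does not modify any files.

```bash
#!/usr/bin/env bash
# Directory from this article; set PG_DIR to inspect a copy elsewhere.
PG_DIR="${PG_DIR:-/storage/db/vpostgres}"

# Hypothetical helper: report the stray .confe file and show the naptime line(s).
check_health_worker_conf() {
  local dir=$1
  # The file that should not be present
  if [ -e "$dir/health_status_worker.confe" ]; then
    echo "stray file present: $dir/health_status_worker.confe"
  fi
  # Print matching lines with line numbers so the value can be compared with 30
  grep -n 'health_status_worker.naptime' "$dir/health_status_worker.conf"
}

# Only run when the directory exists (for example, on the appliance itself)
if [ -d "$PG_DIR" ]; then
  check_health_worker_conf "$PG_DIR"
fi
```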

Resolution

  • Create a temporary directory:
    mkdir /temp
  • Move the health_status_worker.confe file to the temporary directory:
    mv /storage/db/vpostgres/health_status_worker.confe /temp/
  • In /storage/db/vpostgres/health_status_worker.conf, change the health_status_worker.naptime value to 30.
  • Restart all vCenter services and monitor the vmware-vpostgres health status for at least 24 hours.
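
The resolution steps can be combined into a single script. This is a sketch under assumptions, not a supported procedure. It assumes the naptime line has the form `health_status_worker.naptime = <value>` (verify this in your file first). Instead of leaving a backup in the vpostgres directory, where an extra file could cause the same kind of problem as health_status_worker.confe, it saves a copy of the original configuration in the temporary directory. It also changes nothing unless invoked with the hypothetical `--apply` argument. `service-control` is the appliance's service management utility. The function name and the PG_DIR/TEMP_DIR overrides are illustrative.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Paths from this article; PG_DIR and TEMP_DIR are overridable for testing.
PG_DIR="${PG_DIR:-/storage/db/vpostgres}"
TEMP_DIR="${TEMP_DIR:-/temp}"

# Hypothetical helper implementing the first three resolution steps.
apply_naptime_fix() {
  local dir=$1 tmp=$2
  local conf="$dir/health_status_worker.conf"
  # Step 1: create the temporary directory (no error if it already exists)
  mkdir -p "$tmp"
  # Step 2: move the stray .confe file out of the vpostgres directory
  if [ -e "$dir/health_status_worker.confe" ]; then
    mv "$dir/health_status_worker.confe" "$tmp/"
  fi
  # Keep the original config in the temp directory, not next to the live file
  cp "$conf" "$tmp/health_status_worker.conf.orig"
  # Step 3: set naptime to 30.
  # ASSUMPTION: the line looks like "health_status_worker.naptime = <value>".
  sed -i -E 's/^([[:space:]]*health_status_worker\.naptime[[:space:]]*=).*/\1 30/' "$conf"
}

if [ "${1:-}" = "--apply" ]; then
  apply_naptime_fix "$PG_DIR" "$TEMP_DIR"
  # Step 4: restart all vCenter services
  service-control --stop --all
  service-control --start --all
fi
```

Run as root on the appliance, for example `bash fix_naptime.sh --apply` (the script name is hypothetical). Then check the vMon log over the following day to confirm that the stale-file warning does not return.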