Postgres health warning triggers every 24 hours

Article ID: 377293

Products

VMware vCenter Server

Issue/Introduction

A Postgres health warning triggers every day at around the same time.
The Postgres logs show no errors or other issues, including at the time of the event.
Rebooting vCenter brings the service back up in a healthy state, but it returns to "degraded" within 24 hours.
The vmon log shows a warning indicating that the service health XML file is stale:

<timestamp> Wa(03) host-1961 <vmware-vpostgres> Service api-health command's stderr: Service health xml file is stale. Current time: 1710124, expiration time: 1706828. Treating service health state RED.
<timestamp> Wa(03)+ host-1961
<timestamp> Wa(03) host-1961 <vmware-vpostgres> Service api-health command's stderr: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><healthStatus schemaVersion="1.0" xmlns="http://www.vmware.com/cis/cm/common/jaxb/healthstatus"><status>GREEN</status><message messageKey="cis.vmware-vpostgres.health.healthy" defaultMessage="Service vmware-vpostgres is healthy."></message><expirationMonoSec>1706828</expirationMonoSec></healthStatus>
<timestamp> Wa(03) host-1961 <vmware-vpostgres> Health of service failed. Health data: {"localizable_msgs": [{"id": "com.vmware.vmon.svc_health_timeout", "default_message": "Service is in an unhealthy state.", "args": []}], "_service_name": "vmware-vpostgres", "_trigger_threaddump_on_failure": 0}
<timestamp> In(05) host-1961 <vmware-vpostgres> Recover from service api health check failure. Fail count 24913
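
To make the first warning concrete: the health XML reports an expiration time (expirationMonoSec), and the warning appears to fire because the current time is later than that value. The following is a minimal sketch, assuming the message format shown above and that both numbers are in seconds; the function name stale_by_seconds is hypothetical and not part of vCenter.

```shell
#!/bin/bash
# Hypothetical helper: extract "Current time" and "expiration time" from a
# vmon stale-health warning and print how far past expiration the file is.
stale_by_seconds() {
    local line="$1" current expiration
    current=$(printf '%s\n' "$line" | sed -n 's/.*Current time: \([0-9][0-9]*\).*/\1/p')
    expiration=$(printf '%s\n' "$line" | sed -n 's/.*expiration time: \([0-9][0-9]*\).*/\1/p')
    # Fail without output if the line is not a stale-health warning.
    [ -n "$current" ] && [ -n "$expiration" ] || return 1
    echo $(( current - expiration ))
}
```

For the values in the log above (1710124 and 1706828), this prints 3296, meaning the health file had expired 3296 seconds before it was checked.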

After the service health XML file is moved to a temporary directory, a new one is created, but its timestamp is still stale.

Environment

vCenter Server Appliance

Cause

There are two health status worker configuration files:

/storage/db/vpostgres/health_status_worker.conf
/storage/db/vpostgres/health_status_worker.confe

One of those files, health_status_worker.confe, should not be present, and the other has an incorrect value set for the health_status_worker.naptime parameter.
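
One way to confirm this condition is to list the configuration files and print the current naptime setting. This is a sketch only: the function name check_health_worker_conf is hypothetical, and the directory defaults to the path given above.

```shell
#!/bin/bash
# Hypothetical helper: list health status worker config files in a directory
# and print any naptime setting found in health_status_worker.conf.
check_health_worker_conf() {
    local dir="${1:-/storage/db/vpostgres}"
    # The glob also matches stray copies such as health_status_worker.confe.
    ls -l "$dir"/health_status_worker.conf*
    # -n prefixes the matching line with its line number.
    grep -n 'health_status_worker\.naptime' "$dir/health_status_worker.conf"
}
```

Running check_health_worker_conf with no argument inspects /storage/db/vpostgres. If more than one file is listed, or the naptime value is not the expected one, the cause described here likely applies.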

Resolution

Create a temporary directory:
mkdir /temp

Move the health_status_worker.confe file to the temporary directory:
mv /storage/db/vpostgres/health_status_worker.confe /temp/

Change the health_status_worker.naptime value from its current value to 30 in the health_status_worker.conf file.

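Restart all vCenter services, then monitor the Postgres health status for at least 24 hours to confirm that the warning does not return.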
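
The last two steps above can be sketched as shell functions. This is not an official procedure: the function names are hypothetical, and the sed pattern assumes the setting is written as a "health_status_worker.naptime = <value>" line, which should be checked against the actual file first. The backup is copied to the /temp directory created earlier, not alongside the original, so that no extra file is left in /storage/db/vpostgres.

```shell
#!/bin/bash
# Hypothetical helper: back up the config file, then set naptime to 30.
# Assumes a "health_status_worker.naptime = <value>" line; adjust the
# pattern if the file uses a different layout.
set_naptime_30() {
    local conf="${1:-/storage/db/vpostgres/health_status_worker.conf}"
    local backup_dir="${2:-/temp}"
    # Keep the backup outside the vpostgres directory.
    cp -p "$conf" "$backup_dir/" || return 1
    sed -i 's/^[[:space:]]*health_status_worker\.naptime[[:space:]]*=.*/health_status_worker.naptime = 30/' "$conf"
}

# Hypothetical helper: stop, then start, all vCenter services.
restart_all_services() {
    service-control --stop --all && service-control --start --all
}
```

Calling set_naptime_30 followed by restart_all_services applies both steps; restarting all services interrupts vCenter availability until they are back up.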
