You may observe vCenter Server performance degradation caused by continuous high CPU usage and memory consumption by the vmware-vpostgres service. The following symptoms are commonly reported:
vCenter Server runs out of memory and may crash.
The vmware-vpostgres service consistently consumes excessive memory and CPU (near 100%).
In /var/log/vmware/vpostgres/postgresql.log, a large number of SQL statements are being executed, and individual queries take an excessive amount of time to complete, for example "duration: 11050849.640 ms":
YYYY-MM-DD hh:mm:ss.162 UTC 6835b87c.7842 0 VCDB vc [local] 30786 11 LOG: duration: 11050849.640 ms statement: BEGIN; declare "SQL_CUR0x7fd67c0ad150" cursor with hold for SELECT sc.stat_id, d.device_name FROM vpx_stat_counter sc, vpx_sample_time1 sm, vpx_device d, vpx_hist_stat1 st WHERE sc.entity_id = 1 and coalesce(sc.device_id,1) = coalesce(d.device_id,1) and sc.counter_id = st.counter_id and st.time_id = sm.time_id AND sm.sample_time > '1970-01-01 00:00:00' :: timestamp AND sm.sample_time <= '2025-05-27 13:05:00.312' :: timestamp GROUP BY sc.stat_id, d.device_name ORDER BY sc.stat_id, d.device_name ;fetch 1024 in "SQL_CUR0x7fd67c0ad150"
YYYY-MM-DD hh:mm:ss.177 UTC 6835b87c.7842 0 VCDB vc [local] 30786 12 LOG: duration: 8450342.711 ms plan:
Query Text: BEGIN; declare "SQL_CUR0x7fd67c0ad150" cursor with hold for SELECT sc.stat_id, d.device_name FROM vpx_stat_counter sc, vpx_sample_time1 sm, vpx_device d, vpx_hist_stat1 st WHERE sc.entity_id = 1 and coalesce(sc.device_id,1) = coalesce(d.device_id,1) and sc.counter_id = st.counter_id and st.time_id = sm.time_id AND sm.sample_time > '1970-01-01 00:00:00' :: timestamp AND sm.sample_time <= '2025-05-27 13:05:00.312' :: timestamp GROUP BY sc.stat_id, d.device_name ORDER BY sc.stat_id, d.device_name ;fetch 1024 in "SQL_CUR0x7fd67c0ad150"
Group (cost=62935302.98..3034711441.99 rows=3839808 width=164) (actual rows=342 loops=1)
YYYY-MM-DD hh:mm:ss. UTC 6834aa41.c577 0 50551 16685 LOG: Memory check: flag = rss, mem_used_bytes = 2124564 kB, mem_avail = 2413568 kB
From /var/log/vmware/vpxd/vpxd-profiler.log, we see a very large number of requests from session "52ce18fc-7357-XXXX-2e30-beXXXXXX8cf3" for property collection, and those property collections take the form of the DB query "SELECT sc.stat_id, d.device_name FROM vpx_stat_counter sc, vpx_sample_time1 sm, vpx_device d, vpx_hist_stat1 st WHERE sc.entity_id = ? and coalesce(sc.device_id,1) = coalesce(d.device_id,1) and sc.counter_id = st.counter_id and st.time_id = sm.time"
--> ThreadState/ThreadId/55745/State/Task::lro-6105574::PerfMgr::vim.PerformanceManager.queryAvailableMetric::52ce18fc-7357-XXXX-2e30-beXXXXXX8cf3(52927915-XXXX-1b0b-XXXX-05c776434074)/State/Vdb/State/ SELECT sc.stat_id, d.device_name FROM vpx_stat_counter sc, vpx_sample_time1 sm, vpx_device d, vpx_hist_stat1 st WHERE sc.entity_id = ? and coalesce(sc.device_id,1) = coalesce(d.device_id,1) and sc.counter_id = st.counter_id and st.time_id = sm.time
--> ThreadState/ThreadId/55746/State/Task::lro-3723516::PerfMgr::vim.PerformanceManager.queryAvailableMetric::52ce18fc-7357-XXXX-2e30-beXXXXXX8cf3(52927915-XXXX-1b0b-XXXX-05c776434074)/State/Vdb/State/ SELECT sc.stat_id, d.device_name FROM vpx_stat_counter sc, vpx_sample_time1 sm, vpx_device d, vpx_hist_stat1 st WHERE sc.entity_id = ? and coalesce(sc.device_id,1) = coalesce(d.device_id,1) and sc.counter_id = st.counter_id and st.time_id = sm.time
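To gauge how many of these calls a session issued, the queryAvailableMetric tasks recorded in the profiler logs can be counted; a rough example (one count per profiler file, using the log path referenced above):
grep -c "vim.PerformanceManager.queryAvailableMetric" /var/log/vmware/vpxd/vpxd-profiler-*.log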
grep "mem_used_bytes" postgresql.log on the /var/log/vmware/vpostgres/postgresql.log file when you will see entries similar to below : YYYY-MM-DD hh:mm:ss. UTC 66731273.2e19 0 VCDB vc [local] 11801 8 DETAIL: Process holding the lock: 11683. Wait queue: 13029, 42605, 7358, 11801.YYYY-MM-DD hh:mm:ss. UTC 66731280.32f7 0 VCDB vc [local] 13047 4 DETAIL: Process holding the lock: 11683. Wait queue: 13029, 42605, 7358, 11801, 13047.YYYY-MM-DD hh:mm:ss. UTC 66731271.2d47 0 VCDB vc [local] 11591 4 DETAIL: Process holding the lock: 11683. Wait queue: 13029, 42605, 7358, 11801, 13047, 11591.
When the session was checked, we found that a monitoring tool with IP XX.XX.XX.XX had issued a very large number of these queries.
grep -R "52ce18fc-7357-XXXX-2e30-beXXXXXX8cf3" vpxd-profiler* | grep "ClientIP" | less
vpxd-profiler-1406.log:--> /SessionStats/SessionPool/Session/Id='52ce18fc-7357-XXXX-2e30-beXXXXXX8cf3'/Username='VSPHERE.LOCAL\<User_name>'/ClientIP='XX.XX.XX.XX'/PropertyCollector/TriggeredFiltersCount/total 0
vpxd-profiler-1406.log:--> /SessionStats/SessionPool/Session/Id='52ce18fc-7357-XXXX-2e30-beXXXXXX8cf3'/Username='VSPHERE.LOCAL\<User_name>'/ClientIP='XX.XX.XX.XX'/PropertyCollector/NullCollectorCount/total 0
vpxd-profiler-1406.log:--> /SessionStats/SessionPool/Session/Id='52ce18fc-7357-XXXX-2e30-beXXXXXX8cf3'/Username='VSPHERE.LOCAL\<User_name>'/ClientIP='XX.XX.XX.XX'/PropertyCollector/TotalObjectCount/total 0
vpxd-profiler-1406.log:--> /SessionStats/SessionPool/Session/Id='52ce18fc-7357-XXXX-2e30-beXXXXXX8cf3'/Username='VSPHERE.LOCAL\<User_name>'/ClientIP='XX.XX.XX.XX'/PropertyCollector/QueuedOpsCount/total 0
vpxd-profiler-1406.log:--> /SessionStats/SessionPool/Session/Id='52ce18fc-7357-XXXX-2e30-beXXXXXX8cf3'/Username='VSPHERE.LOCAL\<User_name>'/ClientIP='XX.XX.XX.XX'/PropertyCollector/TriggeredProcessGUReqs/total 0
52ce18fc-7357-XXXX-2e30-beXXXXXX8cf3
awk '/52ce18fc-7357-XXXX-2e30-beXXXXXX8c/ && /PropertyCollector/ && /ClientIP/' vpxd-profiler-1406.log | head -1
--> /SessionStats/SessionPool/Session/Id='52ce18fc-7357-XXXX-2e30-beXXXXXX8c'/Username='VSPHERE.LOCAL\username'/ClientIP='10.x.x.x'/PropertyCollector/ReadLocked/total 0
vCenter 7.x/8.x
This issue occurs when a monitoring or automation tool sends a large number of performance-related requests (for example, PerformanceManager queryAvailableMetric calls) to vCenter Server, which result in expensive SQL queries against the vPostgres historical statistics tables.
To resolve the issue, perform the following:
Identify and Stop the Query Source
From the logs, determine the client IP or the user initiating the excessive queries.
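As an illustration only (the session ID and log path are taken from the example above and will differ in your environment), the client IPs recorded for a suspect session can be summarized with a pipeline such as:
grep -R "52ce18fc-7357-XXXX-2e30-beXXXXXX8cf3" /var/log/vmware/vpxd/vpxd-profiler* | grep "ClientIP" | sed "s/.*ClientIP='\([^']*\)'.*/\1/" | sort | uniq -c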
Disable or Limit the Data Collection
If the source is an external monitoring tool, either stop it temporarily or reconfigure it to reduce query load.
Avoid full table scans or unfiltered historical queries.
Apply Constraints or Filtering
Where possible, configure the tool to use shorter time ranges or fetch only required metrics.
Introduce query-level filters to avoid wide joins on large metric tables.
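To verify whether long-running statements from the tool are still active (for example, after reconfiguring it), you can query the standard PostgreSQL pg_stat_activity view; a minimal sketch, assuming psql is in its usual location on the vCenter Server Appliance:
/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -c "SELECT pid, usename, now() - query_start AS runtime, left(query, 80) AS query FROM pg_stat_activity WHERE state <> 'idle' ORDER BY runtime DESC;"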
If the connections are initiated by a third-party tool, refer to the tool's vendor for best practices and tuning parameters.
Ask the vendor how to tune the tool to reduce the number of queries issued to the PerformanceManager.
Another option is to prevent the tool from collecting certain metrics from vCenter Server. Consult the vendor's documentation for reference.
The following commands can help quantify the lock contention and the size of the historical statistics tables:
less postgresql.log | grep "Process holding the lock" | awk -F 'Wait' '{print $1}' | awk -F ':' '{print $5}' | awk -F '.' '{print $1}' | uniq -c
less postgresql.log | grep "Process holding the lock" | awk -F 'Wait' '{print $1}' | awk -F ':' '{print $5}' | awk -F '.' '{print $1}' | uniq > pidscausinglocks.log
VCDB# SELECT count(*) cnt, 'Stat Level 1' FROM vpx_hist_stat1
VCDB# UNION ALL
VCDB# SELECT count(*) cnt, 'Stat Level 2' FROM vpx_hist_stat2
VCDB# UNION ALL
VCDB# SELECT count(*) cnt, 'Stat Level 3' FROM vpx_hist_stat3
VCDB# UNION ALL
VCDB# SELECT count(*) cnt, 'Stat Level 4' FROM vpx_hist_stat4;
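The SQL statements above are run from the VCDB prompt; on the vCenter Server Appliance this prompt is typically reached from the shell with the following command (path may vary by build):
/opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres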