CPU and memory utilization show 100% on the vCenter Server Appliance (VCSA).
Rebooting vCenter clears the issue temporarily, but the CPU Exhaustion alarm then triggers again on the vCenter.
From the Postgres logs, we see multiple issues related to:
AccessShareLock waits and long-delayed exclusive locks on the vpx_hist_stat1_* tables, in postgresql.log:

xxxx-xx-xx xx:xx:xx.xxx xxx 68exxxxxxxxxxx 0 VCDB vpxd [local] 762641 4 LOG: process 762641 still waiting for AccessShareLock on relation 18xxx of database 164xx after 1000.xxx ms
xxxx-xx-xx xx:xx:xx.xxx xxx 68exxxxxxxxxxx 1058xxxxx VCDB vpxd [local] 840xxx 7 LOG: process 840xxx acquired AccessExclusiveLock on relation 18xxx of database 164xx after 4398xxx.xxx ms
xxxx-xx-xx xx:xx:xx.xxx xxx 68exxxxxxxxxxx 1058xxxxxx VCDB vpxd [local] 840xxx 8 STATEMENT: truncate table vpx_hist_stat1_192
xxxx-xx-xx xx:xx:xx.xxx xxx 68exxxxxxxxxxx 0 VCDB vpxd [local] 964xxx STATEMENT: BEGIN;declare "SQL_CUR0x7fe500xxxxxx" cursor with hold for SELECT sc.stat_id, d.device_name FROM vpx_stat_counter sc, vpx_sample_time1 sm, vpx_device d, vpx_hist_stat1 st WHERE sc.entity_id = 1 and coalesce(sc.device_id,1) = coalesce(d.device_id,1) and sc.counter_id = st.counter_id and st.time_id = sm.time_id AND sm.sample_time > '1970-xx-xx 00:00:00'::timestamp AND sm.sample_time <= 'xxxx-xx-xx xx:xx:xx.785'::timestamp GROUP BY sc.stat_id, d.device_name ORDER BY sc.stat_id, d.device_name ;fetch 1024 in "SQL_CUR0x7fe500xxxxx"
The third-party tool's username and client IP can be seen in vpxd/vpxd-profiler.log:
--> /SessionStats/SessionPool/Session/Id='52b56f78-xxxx-xxxx-xxxx-xxxxxxxxxxxx'/Username='xxxxxxx\xxxxxxxxxxxxxx'/ClientIP='xx.xx.xx.xx'/PropertyCollector/TriggeredFiltersCount/total 0
VMware vCenter Server 8.x
The issue occurs when the OpsRamp tool calls queryAvailableMetric without specifying a BEGIN_TIME for the query; without a start time, the query scans every partition in the database.
The vendor should modify the query so that queryAvailableMetric is called with a proper start time, which reduces the number of partitions scanned.
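As a sketch of the fix on the client side, the query window can be bounded to recent data instead of starting at the epoch. The helper below is hypothetical (not OpsRamp's actual code); the commented pyVmomi call shows where such a window would feed the vSphere PerformanceManager API, assuming a connected service instance `si` and a managed entity `vm`:

```python
from datetime import datetime, timedelta, timezone

def bounded_query_window(hours=1):
    """Return a (begin, end) pair so an available-metric query only
    touches recent stats partitions instead of scanning every one."""
    end = datetime.now(timezone.utc)
    begin = end - timedelta(hours=hours)
    return begin, end

# With pyVmomi (hypothetical connection objects si/vm), the call would be:
#   perf_mgr = si.content.perfManager
#   begin, end = bounded_query_window()
#   metrics = perf_mgr.QueryAvailablePerfMetric(
#       entity=vm, beginTime=begin, endTime=end, intervalId=20)
```

Passing a beginTime limits the server-side scan to the partitions that cover that window, avoiding the full-table cursor seen in the postgresql.log excerpt above.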
Workaround:
1. Stop all vCenter services except vpostgres.
2. TRUNCATE all the stats tables (vpx_hist_stat1, vpx_hist_stat2, vpx_hist_stat3, and vpx_hist_stat4).
3. Run a full vacuum with analyze. Command: /opt/vmware/vpostgres/current/bin/vacuumdb -U postgres -d VCDB --full --analyze
4. Restart vCenter or restart all services.
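Step 2 above corresponds to SQL along these lines, run from the appliance's psql shell (/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB). This is a sketch: the table names come from the steps above, and CASCADE is assumed here to cover dependent per-partition child tables.

```sql
-- Empties the historical stats tables; all performance history is lost.
TRUNCATE TABLE vpx_hist_stat1 CASCADE;
TRUNCATE TABLE vpx_hist_stat2 CASCADE;
TRUNCATE TABLE vpx_hist_stat3 CASCADE;
TRUNCATE TABLE vpx_hist_stat4 CASCADE;
```

Take a backup or snapshot before truncating, since this permanently removes the historical performance data.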
Contact the OpsRamp tool vendor to get the issue resolved.