The vCenter Server Appliance (VCSA) may display a persistent "CPU Exhaustion" alarm at the top of the vSphere Client.
When the appliance is checked with the top command, multiple Postgres processes (user vpostgr) show excessive CPU consumption (often 90% or higher).
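For a quick snapshot of which processes are consuming the CPU, something like the following can be run on the appliance (a minimal sketch using standard procps options; the "postgres" match pattern is an assumption about how the processes appear in the process list):
ps -eo pid,user,pcpu,pmem,args --sort=-pcpu | head -n 15
ps -eo pid,user,pcpu,args --sort=-pcpu | grep -i postgres | head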
In the /var/log/vmware/vpxd/vpxd.log file, frequent "Invoke done" and "OnStreamClose" messages may be seen for a specific entity ID that does not exist in the current inventory.
Example log entry, where ID 55555 represents an object that no longer exists in vCenter:
YYYY-MM-DDTHH:MM:SSZ verbose vpxd[######] ... vim.view.View.destroy, ... internal, 8.0.3.0, ... id: 55555, state(in/out): 3/1 ...
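To gauge how often the stale ID is being requested, the log can be filtered for it (a simple sketch; 55555 is the example ID above and should be replaced with the ID seen in the environment):
grep -c 'id: 55555' /var/log/vmware/vpxd/vpxd.log
grep 'id: 55555' /var/log/vmware/vpxd/vpxd.log | tail -n 5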
To confirm that the entity no longer exists in the inventory, query the vpx_entity table for the ID seen in the logs:
/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB
SELECT name, type_id FROM vpx_entity WHERE id = <ID_FROM_LOGS>;
In the database activity (the pg_stat_activity view), multiple active cursors may be seen performing SELECT operations on the vpx_hist_stat tables with a sample_time defaulting to 1970-01-01.
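A minimal sketch of how to list those cursors from psql, assuming the standard pg_stat_activity view (the column list is generic and can be adjusted):
SELECT pid, state, query_start, left(query, 120) AS query
FROM pg_stat_activity
WHERE query ILIKE '%vpx_hist_stat%'
ORDER BY query_start;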
This issue typically occurs when an external monitoring tool or service retains a reference to a deleted vCenter entity (a Managed Object such as a Virtual Machine) and repeatedly polls for its performance data. Contributing factors include:
The tool's inventory has not been refreshed since the entity was deleted, so the stale entity ID remains in its polling queue.
The performance queries (QueryPerf calls) do not specify a recent start time, so the lookup falls back to the 1970-01-01 default and scans the entire statistics history.
This exhaustive scan consumes nearly all available CPU resources, impacting the database and secondary services such as the Identity Manager, which may also show processes at 90%+ CPU usage.
To resolve this issue, the external source of the queries must be identified and corrected, and the database congestion must be cleared, in three steps:
Identify the Source IP: Review /var/log/vmware/envoy/envoy-access.log to identify the external IP address making frequent QueryPerf calls to the /sdk endpoint (see the log-filtering sketch after this list).
Update Monitoring Configuration: Engage the owner of the identified monitoring server to:
Refresh the tool's inventory discovery to purge deleted entity IDs (e.g., 55555) from its polling queue.
Ensure all QueryPerf API calls include a specific, recent BEGIN_TIME (e.g., last 20 minutes) to prevent global database scans.
Clear Database Congestion: To immediately recover CPU cycles, the statistics partitions may be cleared and optimized. See steps below.
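As a rough sketch for step 1 (the field positions in envoy-access.log depend on the configured log format, so extracting addresses with a regular expression is used here as an assumption rather than an exact recipe):
grep '/sdk' /var/log/vmware/envoy/envoy-access.log | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort | uniq -c | sort -rn | head
The address with the highest hit count that is not the vCenter appliance itself is usually the polling monitoring server.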
Steps to Clear Database Congestion (temporary recovery from the CPU exhaustion). Perform the following on the vCenter appliance:
***Note: This action will wipe all historical statistics from vCenter. The CPU exhaustion should subside over time without this part if steps 1 and 2 are completed, but these steps recover the VCSA CPU immediately if the loss of historical statistics is acceptable.
service-control --stop --all
service-control --start vpostgres
/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB
DO $$
DECLARE
r RECORD;
BEGIN
FOR r IN (SELECT tablename FROM pg_tables WHERE tablename LIKE 'vpx_hist_stat%') LOOP
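-- Truncate each vpx_hist_stat partition; CASCADE also truncates any table that references it via a foreign key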
EXECUTE 'TRUNCATE TABLE ' || quote_ident(r.tablename) || ' CASCADE';
END LOOP;
END $$;
***After running this, a "DO" message should be returned, indicating that the loop finished.
\q
/opt/vmware/vpostgres/current/bin/vacuumdb -U postgres -d VCDB --full --analyze
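To confirm that the partitions are now empty before restarting the remaining services, a quick check against the standard PostgreSQL pg_stat_user_tables view can be run (a sketch; the row counts it reports are estimates refreshed by the preceding vacuum/analyze):
/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "SELECT relname, n_live_tup FROM pg_stat_user_tables WHERE relname LIKE 'vpx_hist_stat%' ORDER BY n_live_tup DESC LIMIT 10;"
All listed tables should report zero (or near-zero) live rows.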
service-control --start --all
Refer to (KB 418224) Frequent CPU Exhaustion alarm on vCenter server for a similar issue that is specific to monitoring with the OpsRamp tool but does not necessarily involve querying data for an entity that no longer exists in vCenter.