vCenter Server high CPU and vpxd memory panic due to resource exhaustion and service registration corruption

Article ID: 426063

Products

VMware vCenter Server

Issue/Introduction

The vCenter Server Appliance (VCSA) experiences 100% CPU utilization, and the vmware-vpxd service fails to remain started.

Symptoms:

  • vpxd.log contains: 

    Panic: Memory exceeds hard limit. Panic.
  • vpxd.log shows database latency: 

    SQL execution took too long: insert into VPX_HIST_STAT1_<XXX>
  • vmware-sps (Storage Profile Service) fails to start or hangs during initialization.

  • ESXi hosts may show as "Not Responding" or "Disconnected" even if vCenter services are running.

  • High CPU cycles observed at the appliance level via top or vimtop.
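
The two vpxd.log signatures above can be counted quickly from the appliance shell. The following is a minimal sketch, not an official tool: the function name scan_vpxd_log is hypothetical, and the default path /var/log/vmware/vpxd/vpxd.log is assumed to be the standard VCSA log location.

```shell
#!/bin/bash
# Count the vpxd.log signatures described in the Symptoms list.
# Assumption: the log lives at /var/log/vmware/vpxd/vpxd.log unless a path is given.
scan_vpxd_log() {
    local log_file="${1:-/var/log/vmware/vpxd/vpxd.log}"
    local panics slow_sql
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so the exit status is deliberately ignored.
    panics=$(grep -c 'Memory exceeds hard limit' "$log_file" || true)
    slow_sql=$(grep -c 'SQL execution took too long: insert into VPX_HIST_STAT' "$log_file" || true)
    echo "memory_panics=${panics} slow_stat_inserts=${slow_sql}"
}
```

Calling scan_vpxd_log with no argument reads the assumed default log. Non-zero values for both counters are consistent with the symptoms described here, but do not confirm the cause on their own.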

Environment

VMware vCenter Server 6.7

Cause

The issue is caused by a combination of vCenter Server resource starvation and service identity corruption.

  1. Under-provisioning: The VCSA deployment size (e.g., Small or Medium) is insufficient to handle the inventory load, especially when management plane failures on ESXi hosts cause a backlog of processing threads.

  2. Database Fragmentation: Bloat in the VPX_HIST_STAT tables causes synchronous write delays, leading to memory exhaustion in the vpxd process.

  3. Lookup Service Inconsistency: Stale solution user registrations prevent the vmware-sps service from authenticating with the VMware Directory Service (vmdir).

Resolution

1. Vertical Scaling of VCSA

Increase the CPU and Memory resources of the vCenter Server Appliance to match the "Large" or "X-Large" deployment specifications as per the VMware Configuration Maximums.

  1. Shut down the VCSA.

  2. Open Edit Settings for the VCSA virtual machine from the ESXi host or cluster that manages it.

  3. Increase CPU and RAM.

  4. Power on the VCSA.
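
After the appliance is powered on again, the new allocation can be checked from its shell. This is a rough sketch under stated assumptions: the function names are hypothetical, the target values are supplied by the caller rather than taken from this article, and MemTotal in /proc/meminfo is slightly lower than the configured RAM, so the value is rounded to the nearest GB.

```shell
#!/bin/bash
# Compare current vCPU count and memory (GB) against caller-supplied targets.
# Arguments: current_cpus current_mem_gb wanted_cpus wanted_mem_gb
meets_target() {
    local cpus="$1" mem_gb="$2" want_cpus="$3" want_mem_gb="$4"
    if [ "$cpus" -ge "$want_cpus" ] && [ "$mem_gb" -ge "$want_mem_gb" ]; then
        echo "OK: ${cpus} vCPU / ${mem_gb} GB meets ${want_cpus} vCPU / ${want_mem_gb} GB"
    else
        echo "UNDERSIZED: ${cpus} vCPU / ${mem_gb} GB is below ${want_cpus} vCPU / ${want_mem_gb} GB"
        return 1
    fi
}

# Number of processors visible to the guest OS.
current_cpus() { nproc; }

# MemTotal is reported in kB; convert to GB and round to the nearest integer.
current_mem_gb() {
    awk '/^MemTotal:/ {printf "%d\n", ($2 / 1048576) + 0.5}' /proc/meminfo
}
```

A typical call would be meets_target "$(current_cpus)" "$(current_mem_gb)" 16 32, where 16 and 32 are placeholder targets; replace them with the figures VMware publishes for the chosen deployment size and vCenter version.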

2. Database Maintenance

Reclaim space and optimize indices on the affected statistics tables.

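  1. Stop the vpxd service: service-control --stop vmware-vpxd

  2. Connect to the DB: /opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB

  3. Run VACUUM against each statistics table named in the slow-insert messages in vpxd.log, for example: VACUUM (FULL, ANALYZE) VPX_HIST_STAT1_65;

  4. Exit the DB: \q

  5. Start the vpxd service: service-control --start vmware-vpxd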
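
When several statistics tables appear in the slow-insert messages, the VACUUM statements can be generated from the log rather than typed by hand. The sketch below is illustrative only: the function name vacuum_statements_from_log is hypothetical, the default log path is an assumption, and the output should be reviewed before anything is run against the database.

```shell
#!/bin/bash
# Print one VACUUM statement per distinct VPX_HIST_STAT table that appears
# in "SQL execution took too long" insert messages.
# Assumption: the log lives at /var/log/vmware/vpxd/vpxd.log unless a path is given.
vacuum_statements_from_log() {
    local log_file="${1:-/var/log/vmware/vpxd/vpxd.log}"
    grep 'SQL execution took too long: insert into VPX_HIST_STAT' "$log_file" \
        | grep -o 'VPX_HIST_STAT[0-9]*_[0-9]*' \
        | LC_ALL=C sort -u \
        | while read -r table; do
              echo "VACUUM (FULL, ANALYZE) ${table};"
          done
}
```

With vpxd stopped, the reviewed output could then be piped into psql, for example: vacuum_statements_from_log | /opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB. VACUUM FULL takes an exclusive lock on each table and needs free disk space for the rewritten copy, which is why it is run only while the service is down.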

3. Repair Service Registrations (SPS)

Use the lsdoctor tool to fix corrupted service identities.

  1. Download and transfer the lsdoctor tool to the VCSA.

  2. Run the tool to check for issues: python lsdoctor.py -l

  3. Apply fixes for the solution users (in use by Storage Profile Service): python lsdoctor.py -u

  4. Restart all services: service-control --stop --all && service-control --start --all
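
The repair sequence above can be wrapped so that it is previewed before it is executed. This is a hedged sketch, not part of lsdoctor: the run and repair_sps_registration helpers and the DRY_RUN and LSDOCTOR variables are hypothetical names, and lsdoctor.py is assumed to be in the current directory unless LSDOCTOR says otherwise.

```shell
#!/bin/bash
# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Check registrations, recreate solution users, then restart all services.
# Assumption: LSDOCTOR points at lsdoctor.py (default: ./lsdoctor.py).
repair_sps_registration() {
    local lsdoctor="${LSDOCTOR:-./lsdoctor.py}"
    run python "$lsdoctor" -l                 # report only; makes no changes
    run python "$lsdoctor" -u || return 1     # stop here if the fix fails
    run service-control --stop --all && run service-control --start --all
}
```

Running DRY_RUN=1 repair_sps_registration prints the four commands without touching the appliance; dropping DRY_RUN executes them in order. lsdoctor may prompt interactively, so the wrapper is meant for an attended session.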