In VMware HCX environments, the /common partition on the HCX Manager appliance may reach 99% or 100% utilization. This disk exhaustion prevents standard database maintenance tasks, such as VACUUM FULL, from executing due to insufficient working space.
Symptoms:
HCX Manager UI becomes unresponsive or extremely slow.
Running df -h over SSH shows /common at or near 100% capacity.
Postgres VACUUM FULL on the JOB table fails with errors indicating no space left on device. The errors are logged in /common/logs/postgres/postgresql-<date>_000000.log:
<timestamps> GMT [2453] STATEMENT: VACUUM FULL "Job";
<timestamps> GMT [2453] ERROR: could not extend file "base/16384/1000###81": No space left on device
<timestamps> GMT [2453] HINT: Check free disk space.
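The disk-usage symptom can be checked with a short script. The following is a minimal sketch rather than part of the documented procedure: the helper name usage_pct and the 95% threshold are illustrative assumptions, not HCX-defined values.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the Use% value
# (digits only) of the filesystem mounted at the given mount point.
usage_pct() {
    awk -v mp="$1" 'NR > 1 && $6 == mp { sub(/%/, "", $5); print $5 }'
}

# Illustrative threshold (an assumption, not a documented HCX limit).
THRESHOLD=95

# Only query the live system when /common exists on this host.
if [ -d /common ]; then
    pct=$(df -P /common | usage_pct /common)
    if [ -n "$pct" ] && [ "$pct" -ge "$THRESHOLD" ]; then
        echo "WARNING: /common is at ${pct}% utilization"
    fi
fi
```

The parser takes df output on stdin, so it can also be run against output captured from the appliance earlier.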
Product: VMware HCX
Versions: 4.10.x & 4.11.x
A regression in HCX 4.11.0 and 4.11.1 results in missing RPMs necessary for the Postgres vacuum functionality to operate correctly. This leads to excessive growth (bloat) of the JOB table within the Postgres database located on the /common partition.
To resolve this issue, the /common partition must be temporarily extended to provide enough overhead for the VACUUM FULL operation to complete.
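VACUUM FULL writes a new copy of the table and releases the old copy only when the operation completes, so in the worst case the free space needed approaches the current size of the table. The sketch below compares the two figures. It assumes the psql invocation used later in this article, takes the table name "Job" from the log excerpt in the Symptoms section, and uses a hypothetical helper name (has_room_for_rewrite). Treat the result as a rough estimate, not a sizing guarantee.

```shell
#!/bin/sh
# Hypothetical helper: succeed when free space covers a worst-case rewrite,
# i.e. a full second copy of the table. Both arguments are in bytes.
has_room_for_rewrite() {
    table_bytes=$1
    free_bytes=$2
    [ "$free_bytes" -ge "$table_bytes" ]
}

# Only run on a host where psql and /common are both present.
if command -v psql >/dev/null 2>&1 && [ -d /common ]; then
    # Total on-disk size of the table, including indexes and TOAST data.
    # The table name "Job" is assumed from the Postgres log excerpt.
    tbl=$(psql -U postgres -d hybridity -At \
        -c "SELECT pg_total_relation_size('\"Job\"');") || tbl=""
    # Available space on /common, converted from 1024-byte blocks to bytes.
    avail=$(df -P /common | awk 'NR == 2 { printf "%.0f\n", $4 * 1024 }')
    if [ -n "$tbl" ] && [ -n "$avail" ]; then
        if has_room_for_rewrite "$tbl" "$avail"; then
            echo "Free space covers a worst-case rewrite (${tbl} bytes)"
        else
            echo "Extend /common: table ${tbl} bytes, free ${avail} bytes"
        fi
    fi
fi
```

Comparing against the full table size is deliberately conservative. The rewritten copy holds only live rows, so a heavily bloated table usually needs less.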
Extend the /common Partition:
Follow the procedure in KB 373238, which provides instructions on extending partitions for HCX Manager.
Perform Database Maintenance:
Stop all HCX services except the Postgres database. For HCX 4.11.2 and earlier, stop the services as follows:
systemctl stop zookeeper
systemctl stop kafka
systemctl stop app-engine
systemctl stop web-engine
systemctl stop appliance-management
SSH to the HCX Manager as admin, then log in to the HCX Postgres database:
psql -U postgres hybridity
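Execute the vacuum command on the JOB table. Quoted identifiers in Postgres are case-sensitive, so the name must match the one shown in the log excerpt under Symptoms ("Job"):
VACUUM FULL "Job";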
Verify and Restart:
Once the vacuum completes successfully, verify disk space with df -h /common
Restart all HCX services in the following order:
systemctl start zookeeper
systemctl start kafka
systemctl start app-engine
systemctl start web-engine
systemctl start appliance-management
Verify UI responsiveness and manager health.
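Part of the final verification can be scripted. The sketch below reports the state of each service in the order listed in this article. The helper name report_services is a hypothetical illustration, and the script only reports systemd unit state. It does not replace checking the UI and manager health.

```shell
#!/bin/sh
# Service names and order are taken from the restart step in this article.
SERVICES="zookeeper kafka app-engine web-engine appliance-management"

# Hypothetical helper: print one "name: state" line per service.
# The state-query command is passed in so it can be stubbed on other hosts.
report_services() {
    query=$1
    for svc in $SERVICES; do
        state=$($query "$svc" 2>/dev/null)
        echo "$svc: ${state:-unknown}"
    done
}

# Query systemd only where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    report_services "systemctl is-active"
fi
```

Any service that does not report active after the restart should be investigated before the appliance is returned to use.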