HCX Manager /common partition full due to Postgres JOB table bloat

Article ID: 429452

Products

VMware HCX

Issue/Introduction

In VMware HCX environments, the /common partition on the HCX Manager appliance may reach 99% or 100% utilization. This disk exhaustion prevents standard database maintenance tasks, such as VACUUM FULL, from executing: VACUUM FULL writes a new copy of the table before releasing the old one, so it needs free working space that the full partition cannot provide.

Symptoms:

HCX Manager UI becomes unresponsive or extremely slow.
HCX Manager tasks fail or remain stuck at 0%.
Running df -h over SSH shows /common at or near 100% capacity.

Postgres VACUUM FULL on the JOB table fails with a "No space left on device" error, logged in /common/logs/postgres/postgresql-<date>_000000.log:

<timestamps> GMT [2453] STATEMENT:  VACUUM FULL "Job";
<timestamps> GMT [2453] ERROR:  could not extend file "base/16384/1000###81": No space left on device
<timestamps> GMT [2453] HINT:  Check free disk space.
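
The disk-usage symptom can be checked in a script rather than by reading df output by eye. The following is a minimal sketch, not part of the official procedure: the helper name usage_pct is hypothetical, and it assumes the mount point contains no spaces.

```shell
# Print the Use% value (without the % sign) for a mount point.
# Reads `df -P` output on stdin; $1 is the mount point to look for.
# usage_pct is an illustrative helper name, not an HCX tool.
usage_pct() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { sub(/%/, "", $5); print $5 }'
}
```

On the appliance this could be used as `df -P /common | usage_pct /common`, which prints a bare number such as 99 that a script can compare against a threshold of your choosing.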

Environment

Product: VMware HCX
Versions: 4.10.x & 4.11.x

Cause

A regression in HCX 4.11.0 and 4.11.1 results in missing RPMs necessary for the Postgres vacuum functionality to operate correctly. This leads to excessive growth (bloat) of the JOB table within the Postgres database located on the /common partition.
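
One way to confirm that a table is accumulating unvacuumed rows is to look at the dead-tuple counters in the standard Postgres view pg_stat_user_tables. The sketch below is an assumption about how one might inspect this, not a step from the official procedure. The names DEAD_TUPLE_SQL and dead_pct are hypothetical, and the counters are estimates maintained by Postgres, so treat the percentages as a rough indicator only.

```shell
# SQL listing the five tables with the most dead tuples.
# pg_stat_user_tables is a standard Postgres statistics view.
DEAD_TUPLE_SQL='SELECT relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 5;'

# Convert psql unaligned rows of the form "relname|live|dead"
# into "relname <dead share>%". dead_pct is an illustrative helper.
dead_pct() {
    awk -F'|' '{ total = $2 + $3; pct = total ? int(100 * $3 / total) : 0; print $1, pct "%" }'
}

# On the appliance (not executed by this snippet):
#   psql -U postgres hybridity -At -c "$DEAD_TUPLE_SQL" | dead_pct
```

A table whose dead share is very high relative to the others is a likely candidate for the bloat described above.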

Resolution

To resolve this issue, the /common partition must be temporarily extended to provide enough overhead for the VACUUM FULL operation to complete.

Extend the /common Partition:
- Follow the procedure provided here:
  - KB 373238: Provides instructions on extending partitions for HCX Manager.
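
Before running the vacuum, it can help to estimate whether the extended partition has enough free space. The sketch below uses a conservative assumption: free space on /common should be at least the current on-disk size of the table, since VACUUM FULL keeps the old copy until the new one is complete. A bloated table usually needs less than this upper bound. The helper name has_headroom is hypothetical.

```shell
# Succeed when free space (KiB, as reported by `df -P`) is at least
# the current table size in bytes. This is a worst-case check: the
# rewritten table is normally smaller than the bloated original.
has_headroom() {
    free_kib=$1
    table_bytes=$2
    [ $(( free_kib * 1024 )) -ge "$table_bytes" ]
}

# On the appliance (not executed by this snippet); "JOB" follows the
# table name used in this article:
#   table_bytes=$(psql -U postgres hybridity -At -c "SELECT pg_total_relation_size('\"JOB\"');")
#   free_kib=$(df -P /common | awk 'NR == 2 { print $4 }')
#   has_headroom "$free_kib" "$table_bytes" || echo "extend /common further"
```

pg_total_relation_size includes indexes and TOAST data, which VACUUM FULL also rewrites, so it is a reasonable figure to compare against.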

Perform Database Maintenance:

Stop all HCX services except the Postgres database. On HCX 4.11.2 and earlier, run:

systemctl stop zookeeper
systemctl stop kafka
systemctl stop app-engine
systemctl stop web-engine
systemctl stop appliance-management
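
To confirm that the services are actually down before touching the database, their state can be checked with systemctl is-active. This is a minimal sketch under the assumption that the five unit names above are the complete set for your version. The names HCX_SERVICES and still_active are hypothetical.

```shell
# Service names as listed in this article.
HCX_SERVICES="zookeeper kafka app-engine web-engine appliance-management"

# Print every listed service that systemd still reports as active.
# No output means all of them are stopped.
still_active() {
    for svc in $HCX_SERVICES; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc"
        fi
    done
}
```

Calling still_active after the stop commands should print nothing. Any name it prints is a service that still needs to be stopped.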

SSH to the HCX Manager as admin, then log in to the HCX Postgres database:
```
psql -U postgres hybridity
```
Execute the vacuum command on the JOB table:
```
VACUUM FULL "JOB";
```

Verify and Restart:
- Once the vacuum is successful, verify disk space: df -h /common.
- Start the HCX services again in the following order:
```
systemctl start zookeeper
systemctl start kafka
systemctl start app-engine
systemctl start web-engine
systemctl start appliance-management
```
- Verify UI responsiveness and manager health.

Additional Information

KB 408671: Details the missing RPM regression in 4.11.x.
KB 373238: Provides instructions on extending partitions for HCX Manager.
KB 321586: General maintenance for HCX Postgres databases.
