HCX Manager /common partition full due to Postgres JOB table bloat

Article ID: 429452

Products

VMware HCX

Issue/Introduction

In VMware HCX environments, the /common partition on the HCX Manager appliance may reach 99% or 100% utilization. This disk exhaustion prevents standard database maintenance tasks, such as VACUUM FULL, from executing due to insufficient working space.

Symptoms:

  • HCX Manager UI becomes unresponsive or extremely slow.

  • HCX Manager tasks fail or get stuck at 0%.
  • Running df -h over SSH shows /common at or near 100% capacity.

  • Postgres VACUUM FULL on the JOB table fails with "No space left on device" errors, recorded in the following log:
    /common/logs/postgres/postgresql-<date>_000000.log

    <timestamps> GMT [2453] STATEMENT:  VACUUM FULL "Job";
    <timestamps> GMT [2453] ERROR:  could not extend file "base/16384/1000###81": No space left on device
    <timestamps> GMT [2453] HINT:  Check free disk space.
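
The disk and log symptoms above can be checked together from an SSH session. The following is a minimal sketch, not an official HCX tool: the function names usage_pct and has_enospc are hypothetical, and the log file glob is an assumption based on the log path shown above.

```shell
#!/bin/sh
# Hypothetical helpers for confirming the symptoms; not part of HCX.

# Read `df -P <mount>` output on stdin and print the usage percentage
# (the Capacity column of the data line) without the % sign.
usage_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Read Postgres log text on stdin; succeed if it contains the
# out-of-space error shown in the excerpt above.
has_enospc() {
    grep -q 'No space left on device'
}

# Example use on the appliance (commented out so the sketch is safe to source):
# df -P /common | usage_pct
# cat /common/logs/postgres/postgresql-*.log | has_enospc && echo "vacuum ran out of space"
```

The helpers read from stdin so they can be tried against saved df output or a copied log file before being run on the appliance.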

Environment

  • Product: VMware HCX

  • Versions: 4.10.x & 4.11.x

Cause

A regression in HCX 4.11.0 and 4.11.1 omits RPMs that the Postgres vacuum functionality needs in order to operate correctly. As a result, the JOB table in the Postgres database, which resides on the /common partition, grows excessively (bloats).

Resolution

To resolve this issue, the /common partition must be temporarily extended to provide enough overhead for the VACUUM FULL operation to complete.
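
VACUUM FULL writes a new copy of the table and releases the old copy only when the operation completes, so the free space on /common should be at least roughly the current size of the table. The sketch below illustrates that comparison. The function name has_headroom is hypothetical, and the commented psql query is an assumption built from standard Postgres catalog functions rather than a command taken from HCX documentation. It matches the table name case-insensitively because the exact case of the name may differ.

```shell
#!/bin/sh
# Hypothetical helper; not part of HCX.

# has_headroom TABLE_BYTES FREE_BYTES
# Succeeds when free space is at least the table's current on-disk size,
# a rough lower bound for the new copy that VACUUM FULL writes.
has_headroom() {
    [ "$2" -ge "$1" ]
}

# Example use on the appliance (commented out; assumes psql access as shown
# in the Resolution steps, GNU df, and a single table named job):
# table_bytes=$(psql -U postgres hybridity -At -c \
#   "SELECT pg_total_relation_size(oid) FROM pg_class WHERE relkind = 'r' AND lower(relname) = 'job'")
# free_bytes=$(df -B1 --output=avail /common | tail -n 1 | tr -d ' ')
# has_headroom "$table_bytes" "$free_bytes" && echo "enough space" || echo "extend /common first"
```

This is only a lower bound. Leaving a margin above the table size is the safer choice when deciding how far to extend the partition.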

  1. Extend the /common Partition:

    • Follow the procedure in KB 373238, which provides instructions on extending partitions for HCX Manager.

  2. Perform Database Maintenance:

    • Stop all HCX services except for the Postgres database:

      For HCX 4.11.2 and earlier, run:
      
      systemctl stop zookeeper
      systemctl stop kafka
      systemctl stop app-engine
      systemctl stop web-engine
      systemctl stop appliance-management
    • From an SSH session on the HCX Manager (logged in as admin), log in to the HCX Postgres database:

      psql -U postgres hybridity
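    • Execute the vacuum command on the JOB table. Quoted identifiers in Postgres are case-sensitive, and the log excerpt under Symptoms shows the name as "Job", so confirm the exact table name (for example with \dt in psql) if the command reports that the relation does not exist: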

      VACUUM FULL "JOB";
      
  3. Verify and Restart:

    • Once the vacuum is successful, verify disk space: df -h /common.

    • Restart all HCX services in the following order:

      systemctl start zookeeper
      systemctl start kafka
      systemctl start app-engine
      systemctl start web-engine
      systemctl start appliance-management
    • Verify UI responsiveness and manager health.
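
The stop and start sequences above act on the same five services in the same order, so they can be expressed as one loop. This is a minimal sketch under that assumption. The function name hcx_services and its "run" argument are hypothetical, and the service names and order are copied from the steps above. By default it only prints the commands, so the sequence can be reviewed before anything is executed.

```shell
#!/bin/sh
# Hypothetical wrapper; not part of HCX.
# Service names and order are copied from the Resolution steps.
HCX_SERVICES="zookeeper kafka app-engine web-engine appliance-management"

# hcx_services ACTION [run]
# ACTION is "stop" or "start". Without the second argument the commands
# are only printed (dry run); with "run" they are executed in order and
# the loop aborts on the first failure.
hcx_services() {
    for svc in $HCX_SERVICES; do
        if [ "$2" = "run" ]; then
            systemctl "$1" "$svc" || return 1
        else
            echo "systemctl $1 $svc"
        fi
    done
}

# Example use (commented out so the sketch is safe to source):
# hcx_services stop          # preview the stop sequence
# hcx_services stop run      # execute it
# hcx_services start run     # restart after the vacuum completes
```

Postgres is deliberately absent from the list, so it keeps running while the other services are stopped.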

Additional Information

  • KB 408671: Details the missing RPM regression in 4.11.x.

  • KB 373238: Provides instructions on extending partitions for HCX Manager.

  • KB 321586: General maintenance for HCX Postgres databases.