Unable to run Postgres VACUUM due to "connection to server on socket failed"

Article ID: 435340

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

When attempting to resolve a 100% full root partition (/dev/mapper/system-system_0) in Aria Suite Lifecycle by following Broadcom KB 415752, administrators may encounter the following roadblocks that prevent clearing the database bloat:

  • When attempting to switch to the postgres user (su - postgres), the system throws a warning:
    • "Your account has expired; please contact your system administrator. su: Authentication token expired (Ignored)."
  • When attempting to connect to the database to run the VACUUM command (./psql -d vrlcm), it fails with the following error:
    • "psql.bin: error: connection to server on socket "<directory>" failed: No such file or directory"

Environment

VMware Aria Suite Lifecycle (LCM) 8.18.x

Cause

This issue is a "Catch-22" scenario caused by absolute disk exhaustion:

  • The Socket Error: The PostgreSQL service (vpostgres) crashes when the disk reaches 100% capacity because it needs a small amount of free space to create temporary lock files. With the service down, its socket file no longer exists, so psql cannot connect, and the VACUUM command needed to reclaim the space cannot be run.
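
To confirm this cause before changing anything, the fill level of the root filesystem can be read from df. The following is a minimal sketch, not part of the original procedure: parse_use_pct is a hypothetical helper name, and it assumes the POSIX df -P output format, in which the fifth column holds the usage percentage.

```shell
#!/bin/sh
# parse_use_pct (hypothetical helper): read `df -P` output on stdin and
# print the usage percentage of the first listed filesystem, without "%".
parse_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report the fill level of the root filesystem.
pct=$(df -P / | parse_use_pct)
echo "root filesystem usage: ${pct}%"
```

A value of 100 matches the condition described above. Running systemctl is-active vpostgres afterwards shows whether the database service is down as well.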

Resolution

To break the cycle, first clear temporary log and cache files as root. This does not require the database and frees enough space to start the PostgreSQL service. Then run the VACUUM command through sudo, which avoids the expired-account warning raised by su.

  1. Free up temporary space

    Log into the LCM appliance via SSH as root and run the following commands sequentially to clear package caches and system logs:

    tdnf clean all
    journalctl --vacuum-time=1d
    truncate -s 0 /var/log/messages

    Run df -h and verify that /dev/mapper/system-system_0 has dropped to 99% or lower, or shows at least a few megabytes of available space.

  2. Start the PostgreSQL Service

    Now that the disk has a minimal amount of free space, restart the database service:

    systemctl restart vpostgres

    Verify the service is running (press q to exit the status screen):

    systemctl status vpostgres

  3. Run the VACUUM command (Bypassing su)

    Execute the database vacuum command directly from the root prompt. Running psql through sudo -u postgres avoids su, and with it the expired authentication token warning:

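    sudo -u postgres /opt/vmware/vpostgres/current/bin/psql -d vrlcm -c "VACUUM FULL VERBOSE ANALYZE;"

    Note: This process may take several minutes to complete, depending on how much bloat the database holds. VACUUM FULL rewrites each table into a new file, so it temporarily needs free disk space for the copy of the table being processed.
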
  4. Verify Space and Restart LCM Services

    Once the vacuum completes, verify that a healthy amount of disk space has been reclaimed (the partition typically drops into the 70% range):

    df -h

    Finally, restart the main LCM server service to restore web UI access:

    systemctl restart vrlcm-server

    Note: It may take 3 to 5 minutes for the Aria Suite Lifecycle UI to fully initialize and become accessible in a web browser.
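
The four steps above can be sketched as a single script. This is an illustration under assumptions, not a supported tool: the function names, the run helper, and the DRY_RUN switch are hypothetical, while the commands themselves are the ones listed in the steps (with --no-pager added to systemctl status so it does not wait for a keypress). By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch only: the four resolution steps as shell functions.
# DRY_RUN and run() are illustrative helpers, not part of the appliance.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = execute them as root

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: clear package caches and system logs, then show disk usage.
free_space() {
    run tdnf clean all &&
    run journalctl --vacuum-time=1d &&
    run truncate -s 0 /var/log/messages &&
    run df -h
}

# Step 2: restart the database; status returns non-zero if it is not running.
start_db() {
    run systemctl restart vpostgres &&
    run systemctl status vpostgres --no-pager
}

# Step 3: run VACUUM as the postgres user without going through su.
vacuum_db() {
    run sudo -u postgres /opt/vmware/vpostgres/current/bin/psql \
        -d vrlcm -c "VACUUM FULL VERBOSE ANALYZE;"
}

# Step 4: show reclaimed space and restart the LCM server service.
restart_lcm() {
    run df -h &&
    run systemctl restart vrlcm-server
}

# Stop at the first failing step so later steps never run against a down database.
recover_lcm() {
    free_space && start_db && vacuum_db && restart_lcm
}

recover_lcm
```

Saved under a hypothetical name such as recover_lcm.sh, running sh recover_lcm.sh prints the nine commands for review, and DRY_RUN=0 sh recover_lcm.sh executes them. Chaining the steps with && is a design choice: if the database fails to start, the VACUUM step is never attempted.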

Additional Information

Switching to the postgres user (su - postgres) produces the following warning:

  • "Your account has expired; please contact your system administrator. su: Authentication token expired (Ignored)."

See KB 435345 to resolve this warning.
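
Before following that KB, the aging settings of the postgres account can be inspected with chage -l, which only reads them. This is a sketch under assumptions: expiry_fields is a hypothetical helper, and it assumes the usual chage -l output in which the relevant lines begin with "Password expires" and "Account expires".

```shell
#!/bin/sh
# expiry_fields (hypothetical helper): keep only the expiry lines
# from `chage -l` output supplied on stdin.
expiry_fields() {
    awk '/^(Password|Account) expires/'
}

# Read-only check; querying another account requires root.
if command -v chage >/dev/null 2>&1; then
    chage -l postgres 2>/dev/null | expiry_fields
fi
```

The sketch changes nothing on the account; it only shows which of the two expiry settings is in effect.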

Note: This KB is a supplement to KB 415752.