SDDC Manager upgrade failed after the REBOOT STAGE

Article ID: 426677

Products

VMware SDDC Manager / VCF Installer

Issue/Introduction

All SDDC Manager services fail to initialize following a system reboot as part of the patch update workflow.
Attempts to manually start the services using the /opt/vmware/vcf/operationsmanager/scripts/cli/sddcmanager_restart_services.sh script result in service start failures.
The postgres service is in a "stopped" state when checked via systemctl status postgres.
Manual attempts to start the postgres service with systemctl start postgres fail with the following error:

"psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: fe_sendauth: error sending password authentication"

The journalctl -xe logs report an expired password for the user postgres, as seen below:

MM DD HH:MM:SS sddc-manager.r1.rainpole.local sh[1378]: + su -s /bin/bash postgres -c 'sed -i -e '\''/^host/s/trust$/md5/'\'' /data/pgdata13/pg_hba.conf'
MM DD HH:MM:SS sddc-manager.r1.rainpole.local su[2067]: pam_unix(su:account): expired password for user postgres (password aged)
MM DD HH:MM:SS sddc-manager.r1.rainpole.local su[2067]: Successful su for postgres by root
MM DD HH:MM:SS sddc-manager.r1.rainpole.local sh[1378]: You are required to change your password immediately (password expired)
MM DD HH:MM:SS sddc-manager.r1.rainpole.local sh[1378]: su: Authentication token is no longer valid; new one required
MM DD HH:MM:SS sddc-manager.r1.rainpole.local su[2067]: pam_unix(su:session): session opened for user postgres by (uid=0)
MM DD HH:MM:SS sddc-manager.r1.rainpole.local sh[1378]: (Ignored)
MM DD HH:MM:SS sddc-manager.r1.rainpole.local su[2067]: + ??? root:postgres
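
The expired-password line in the excerpt above can be picked out of a longer journal with a small filter. The following is a minimal sketch, not a Broadcom-provided tool: the function name find_expired_users is hypothetical, and it assumes the pam_unix message format shown in the log above.

```shell
#!/bin/sh
# Read journal text on stdin and print each user that pam_unix reported
# as having an expired password, one name per line, de-duplicated.
# find_expired_users is a hypothetical helper name used for illustration.
find_expired_users() {
    # Match lines such as:
    #   ... pam_unix(su:account): expired password for user postgres (password aged)
    # and keep only the user name that follows "for user".
    sed -n 's/.*pam_unix([^)]*): expired password for user \([^ ]*\).*/\1/p' | sort -u
}

# Example use on the appliance (not executed here):
#   journalctl -b --no-pager | find_expired_users
#   chage -l postgres    # shows the password aging settings for the account
```

If the filter prints postgres, the account's aging details can then be reviewed with chage -l postgres. Whether changing those settings is appropriate on an SDDC Manager appliance is a question for Broadcom Support.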

The /data partition on the SDDC Manager reports 100% utilization.
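
The partition utilization can be confirmed from the SDDC Manager shell with df. The sketch below is illustrative only: the helper names usage_percent and classify_usage and the 90% warning threshold are assumptions, not values from this article.

```shell
#!/bin/sh
# Print the used-space percentage (integer, no % sign) of the filesystem holding $1.
usage_percent() {
    # df -P prints one POSIX-format line per filesystem; column 5 is capacity, e.g. "97%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Classify a percentage ($1) against a warning threshold ($2).
# The default of 90 is an illustrative assumption.
classify_usage() {
    pct=$1
    threshold=${2:-90}
    if [ "$pct" -ge 100 ]; then
        echo "FULL"
    elif [ "$pct" -ge "$threshold" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# /data is the partition named in this article; do nothing if it is absent.
if [ -d /data ]; then
    used=$(usage_percent /data)
    echo "/data is ${used}% used: $(classify_usage "$used")"
fi
```

A result of FULL corresponds to the 100% utilization symptom described above.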

Environment

VCF 5.x

Cause

The upgrade failed because the SDDC Manager's /data partition was nearly full, or its consumption exceeded 25 GB, before the upgrade. This was driven by the known issue "SDDC Manager /data partition filling up to 100% frequently", in which the sessions tables in the sddc_manager_ui database repeatedly consume disk space. That left insufficient free space for the upgrade process to move the necessary files. Consequently, the partition reached 100% capacity during the upgrade operation, and the postgres service failed to start.
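
The 25 GB figure above suggests a simple check to run before an upgrade. The following sketch makes several assumptions: the helper names used_kib and over_limit are hypothetical, the limit is treated as 25 GiB of used space, and it is not the check that the upgrade workflow itself performs.

```shell
#!/bin/sh
# Used space, in KiB, of the filesystem holding $1 (column 3 of POSIX df -k output).
used_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# Succeed (exit 0) when the used KiB in $1 is above a limit given in GiB in $2.
# The default of 25 mirrors the figure in this article, read here as GiB (an assumption).
over_limit() {
    limit_kib=$(( ${2:-25} * 1024 * 1024 ))
    [ "$1" -gt "$limit_kib" ]
}

# /data is the partition named in this article; do nothing if it is absent.
if [ -d /data ]; then
    if over_limit "$(used_kib /data)"; then
        echo "/data holds more than 25 GiB; resolve the consumption before upgrading"
    else
        echo "/data is under the 25 GiB mark"
    fi
fi
```

Running this before starting the upgrade gives an early warning that /data may not have enough free space for the upgrade to complete.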

Resolution

To resolve the issue:

Revert the SDDC Manager to a snapshot taken prior to the upgrade.
Contact Broadcom Support to fix the /data partition consumption caused by the known issue "SDDC Manager /data partition filling up to 100% frequently".
