SDDC manager upgrade failed after the REBOOT STAGE

Article ID: 426677

Products

VMware SDDC Manager

Issue/Introduction

  • All SDDC Manager services fail to initialize following a system reboot performed as part of the patch update workflow.
  • Attempts to manually start the services using the /opt/vmware/vcf/operationsmanager/scripts/cli/sddcmanager_restart_services.sh script result in service start failures.
  • The postgres service is in a "stopped" state when checked via systemctl status postgres.
  • Manual attempts to start the postgres service with systemctl start postgres fail with the following error:

"psql: error: connection to server at "localhost" (127.0.0.1), port 5432 failed: fe_sendauth: error sending password authentication"

  • The journalctl -xe logs report an expired password for the user postgres, as seen below:


MM DD HH:MM:SS sddc-manager.r1.rainpole.local sh[1378]: + su -s /bin/bash postgres -c 'sed -i -e '\''/^host/s/trust$/md5/'\'' /data/pgdata13/pg_hba.conf'
MM DD HH:MM:SS sddc-manager.r1.rainpole.local su[2067]: pam_unix(su:account): expired password for user postgres (password aged)
MM DD HH:MM:SS sddc-manager.r1.rainpole.local su[2067]: Successful su for postgres by root
MM DD HH:MM:SS sddc-manager.r1.rainpole.local sh[1378]: You are required to change your password immediately (password expired)
MM DD HH:MM:SS sddc-manager.r1.rainpole.local sh[1378]: su: Authentication token is no longer valid; new one required
MM DD HH:MM:SS sddc-manager.r1.rainpole.local su[2067]: pam_unix(su:session): session opened for user postgres by (uid=0)
MM DD HH:MM:SS sddc-manager.r1.rainpole.local sh[1378]: (Ignored)
MM DD HH:MM:SS sddc-manager.r1.rainpole.local su[2067]: + ??? root:postgres

  • The /data partition on the SDDC Manager reports 100% utilization.
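The symptoms above can be checked from a root shell on the appliance. The following is a minimal sketch, not an official procedure: the helper names usage_percent_from_df and has_expired_postgres_password are hypothetical, and only standard df, awk, grep, journalctl, and chage behavior is assumed.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical and not part of SDDC Manager.

# Read `df -P` output on stdin and print the usage percentage
# (digits only) from the first filesystem line.
usage_percent_from_df() {
    awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if the log text on stdin contains the PAM message that
# reports an expired password for the postgres user.
has_expired_postgres_password() {
    grep -q 'expired password for user postgres'
}

# Example usage on the appliance (run as root):
#   df -P /data | usage_percent_from_df
#   journalctl -b --no-pager | has_expired_postgres_password \
#       && echo "postgres password is reported as expired"
#   chage -l postgres    # shows the password aging settings for postgres
```

A reported value of 100 for /data together with the expired-password message matches the symptom pattern described in this article.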

Environment

VCF 5.x 

Cause

The upgrade failed because the SDDC Manager /data partition was nearly full, or had already exceeded 25 GB of consumption, before the upgrade started. This was driven by the known issue "SDDC Manager /data partition filling up to 100% frequently", in which the sessions tables in the sddc_manager_ui database consume disk space rapidly. The partition therefore had insufficient free space for the upgrade process to move the necessary files, and it reached 100% capacity during the upgrade operation, which caused the postgres service to fail to start.
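As an illustration of the threshold described above, a pre-upgrade check could compare the space used on /data against 25 GB. This is a minimal sketch under stated assumptions: the function name used_exceeds_threshold is hypothetical, and the threshold is interpreted here as 25 GiB measured in the 1 KiB blocks that df -Pk reports.

```shell
#!/bin/bash
# Sketch only: the function name and the GiB interpretation of the
# 25 GB figure are assumptions, not taken from the product.

# Succeed if the used space (in KiB) is strictly greater than the
# threshold (in GiB).
used_exceeds_threshold() {
    local used_kib=$1 threshold_gib=$2
    [ "$used_kib" -gt $(( threshold_gib * 1024 * 1024 )) ]
}

# Example usage on the appliance:
#   used_kib=$(df -Pk /data | awk 'NR==2 { print $3 }')
#   if used_exceeds_threshold "$used_kib" 25; then
#       echo "/data has more than 25 GiB in use; free space before upgrading"
#   fi
```

Keeping the comparison in a function that takes plain numbers lets the check be exercised without access to the appliance.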

Resolution

To resolve the issue