
Upgrade failing on step 9 of 14 Troubleshooting Guidance

Article ID: 412449


Products

VCF Operations/Automation (formerly VMware Aria Suite)
VMware vRealize Operations 8.x

Issue/Introduction

There are several causes for an Aria Operations cluster upgrade to fail on step 9 of 14, as seen in the Software Update tab of the Admin UI. This article is aimed at guiding a user facing such a failure to the knowledge article pertinent to their specific issue.

Environment

Aria Operations 8.x

Cause

Networking issues related to gateways or DNS.
Often seen in the "Preparing to apply product update" sub-status.

Database issues.
Often seen in the [run master postgres db upgrade] or [run sql db upgrade] sub-statuses.

Past installation methods.
Seen in the [run admin first boot scripts] sub-status.

Resolution

Pertinent logs to review:

/storage/vcops/log/casa/casa.log
/storage/vcops/log/centralsqldbupgrade.log
/storage/db/vcops/vpostgres/data/serverlog
/storage/log/vcops/logs/pakManager/vRealizeOperationsManagerEnterprise-############/apply_system_update_stderr.log

/var/log/firstboot/casa-unicorn-firstboot.py_6145_stdout.log

Pertinent Configuration files to review:

/opt/vmware/etc/ovfEnv.xml
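To triage quickly, the following is a minimal Python sketch (not taken from any of the linked KBs) that scans the logs listed above for error lines so the failure can be matched against the known issues below. It assumes it is run as root on the primary node and that the elided pakManager directory name can be matched with a wildcard.

#!/usr/bin/env python3
# Hedged sketch: summarize error lines from the pertinent upgrade logs.
# Assumption: run as root on the primary node; the elided pakManager
# directory name is matched with a wildcard.
import glob

LOG_GLOBS = [
    "/storage/vcops/log/casa/casa.log",
    "/storage/vcops/log/centralsqldbupgrade.log",
    "/storage/db/vcops/vpostgres/data/serverlog",
    "/storage/log/vcops/logs/pakManager/vRealizeOperationsManagerEnterprise-*/apply_system_update_stderr.log",
    "/var/log/firstboot/casa-unicorn-firstboot.py_*_stdout.log",
]
KEYWORDS = ("ERROR", "FATAL", "Exception", "Failed")

for pattern in LOG_GLOBS:
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, errors="replace") as fh:
                hits = [line.rstrip() for line in fh if any(k in line for k in KEYWORDS)]
        except OSError as exc:
            print(f"-- {path}: could not read ({exc})")
            continue
        print(f"== {path}: {len(hits)} matching line(s)")
        for line in hits[-5:]:  # show only the last few hits per log
            print("   " + line)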

List of known issues regarding this step failing:

Each entry below lists the KB Link, the General Cause, and the Identifying Symptoms to compare against your environment.
KB Link: Upgrading Aria Operations hangs on step 9 of 14 "Preparing to apply product update"
General Cause: Incorrect Gateway IP address configured for the inaccessible node(s).
Identifying Symptoms: Some node(s) in the cluster are showing as inaccessible in the admin UI.
KB Link: Upgrading Aria Operations hangs on step 9 of 14 "Applied Operations System Update" and "Preparing to apply product update"
General Cause: DNS was either not configured, corrupted, or misconfigured, and nslookup was not functioning during the upgrade. The DNS servers may not have been responding, which led to the upgrade being retried.
Identifying Symptoms:
/storage/vcops/log/casa/casa.log:

java.net.UnknownHostException: ***************-***.********.********.**.**: Temporary failure in name resolution
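For the two DNS-related entries above, the following is a minimal diagnostic sketch (not taken from the linked KBs) that confirms cluster node FQDNs resolve from the node running the upgrade; the node names are placeholders and must be replaced with your own. A resolution failure here mirrors the UnknownHostException shown in casa.log.

#!/usr/bin/env python3
# Hedged sketch: verify name resolution for every cluster node FQDN.
# Assumption: NODE_FQDNS is a placeholder list; substitute your own nodes.
import socket

NODE_FQDNS = [
    "vrops-primary.example.com",  # hypothetical names
    "vrops-data1.example.com",
]

for fqdn in NODE_FQDNS:
    try:
        addresses = {info[4][0] for info in socket.getaddrinfo(fqdn, None)}
        print(f"OK   {fqdn} -> {', '.join(sorted(addresses))}")
    except socket.gaierror as exc:
        # Corresponds to the java.net.UnknownHostException seen in casa.log
        print(f"FAIL {fqdn}: {exc}")
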
KB Link: Upgrade failing on step 9 of 14 [run master postgres db upgrade]
General Cause: Missing data in the Postgres database.
Identifying Symptoms:
/storage/vcops/log/centralsqldbupgrade.log:

ERROR [main] com.vmware.statsplatform.persistence.global.UserDataService.saveNotificationRules - Notification rule to plugin mapping relationship not found in DB.
ERROR [main] .processJAVA - error:
java.lang.RuntimeException: Notification rule to plugin mapping relationship not found in DB.
KB Link: Aria Operations upgrade fails at step 9 of 14 with "Failedresource key=pak_manager.action_failed, resource args=[run master postgres db upgrade]"
General Cause: A null value in one of the reports in the kv_data_shard table.
Identifying Symptoms:
/storage/vcops/log/centralsqldbupgrade.log:

2024-10-15 10:45:14,100 INFO [main] com.vmware.vcops.dbupgrade.postgres.centraldb.upgrade.v818.FixingDataShardName.upgrade - Processing key = REPORT:com.vmware.statsplatform.persistence.content.report.Report.########-####-####-####-############.CSV
2024-10-15 10:45:14,101 ERROR [main] .processJAVA - error:
java.lang.NullPointerException: null
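For the kv_data_shard entry above, the sketch below (an assumption, not part of the linked KB) walks centralsqldbupgrade.log and reports which key FixingDataShardName was processing immediately before the NullPointerException, which helps identify the offending report before following the linked KB.

#!/usr/bin/env python3
# Hedged sketch: find the report key processed just before the NullPointerException.
LOG = "/storage/vcops/log/centralsqldbupgrade.log"

last_key = None
with open(LOG, errors="replace") as fh:
    for line in fh:
        if "FixingDataShardName.upgrade - Processing key =" in line:
            last_key = line.split("Processing key =", 1)[1].strip()
        elif "NullPointerException" in line:
            print("NullPointerException found; last key processed was:")
            print("  " + (last_key or "<none seen>"))
            break
    else:
        print(f"No NullPointerException found in {LOG}")
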
KB Link: Upgrade failing on step 9 of 14 [run master postgres db upgrade] due to "index row size # exceeds btree version 4 maximum"
General Cause: Legacy data that is too large to fit within the internal Postgres database schema.
Identifying Symptoms:
/storage/vcops/log/centralsqldbupgrade.log:

Caused by: java.sql.BatchUpdateException:
ERROR: index row size 2760 exceeds btree version 4 maximum 2704 for index "dynamic_attributes_pkey"

"Aria Operations upgrade failure from 8.12.1 to 8.17.2 at step 9 with the error: "Failed resource key=pak_manager.action_failed resource args=[run sql db upgrade]" The upgrade is failing due to data db upgrade script running a monolithic delete on large alarm tables, that results in a transaction that is too large for the default memory settings used during upgrade, and that are causing postgres transaction wraparound errors. sqldbupgrade.log:

ERROR com.vmware.vcops.dbupgrade.postgres.sharded.UpgradeActionExecutorManager - Upgrade failed on action: UpgradeAction [dataChanged=false, cleanUpBadData=false, steps=[Step [action=818/remove_dt_events.sql, type=SQL]], version=181, transactional=true]

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: org.postgresql.util.PSQLException: ERROR: MultiXactId 1702194274 has not been created yet -- apparent wraparound
KB Link: VMware Aria Operations upgrade from 8.17 to 8.18.1 fails with the error "error: Failedresource key=pak_manager.action_failed, resource args=[run sql db upgrade]"
General Cause: The /storage/log directory does not have the required permissions on the primary node.
Identifying Symptoms: [run sql db upgrade] is shown at the end of the error status in the UI.

/storage/db/vcops/vpostgres/data/serverlog:

FATAL:  could not open log file "/storage/log/postgres/data/postgresql-25.log": Permission denied
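For the permissions entry above, the following sketch (an assumption, not the linked KB's procedure) prints the owner, group, and mode of /storage/log and /storage/log/postgres on the primary node; the KB does not state the expected values, so compare the output against a healthy node or a fresh deployment.

#!/usr/bin/env python3
# Hedged sketch: report ownership and mode of the log directories so they can
# be compared against a healthy node (expected values are not stated here).
import grp
import os
import pwd
import stat

for path in ("/storage/log", "/storage/log/postgres"):
    try:
        st = os.stat(path)
    except FileNotFoundError:
        print(f"{path}: does not exist")
        continue
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    print(f"{path}: owner={owner} group={group} mode={stat.filemode(st.st_mode)}")
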
KB Link: LEGACY: Upgrading VMware Aria Operations to 8.14 fails with error "resource key=pak_manager.action_failed, resource args=[run admin first boot scripts]"
General Cause: LEGACY: This is a known issue related to vCenter plugin integration with Aria Operations. It leads to a product upgrade failure if the cluster was deployed through the vCenter plugin window. Unicorn's 'casa-unicorn-firstboot.py' script should not be executed during the product update; the vCenter plugin webapp installation and configuration must only be performed at fresh deployment.
Identifying Symptoms:
/storage/log/vcops/logs/pakManager/vRealizeOperationsManagerEnterprise-############/apply_system_update_stderr.log:

ERROR [6145] - root - [Failed] /usr/lib/vmware-casa/firstboot/casa-unicorn-firstboot.py casa-unicorn-firstboot.py - Failed
ERROR [6145] - root - Upgrade first boot flow is a failure

/var/log/firstboot/casa-unicorn-firstboot.py_6145_stdout.log:

Source file war: "/usr/lib/vmware-casa/vrops-casa-unicorn.war" does not exist

/opt/vmware/etc/ovfEnv.xml:

...solutionInstall.postDeployData.endpoint...https://<vcenter.example.com>/api/ui/solutioninstall...
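For the legacy vCenter-plugin entry above, the sketch below (an assumption, not part of the linked KB) checks whether /opt/vmware/etc/ovfEnv.xml contains the solutionInstall.postDeployData.endpoint property, which indicates the cluster was originally deployed through the vCenter plugin window.

#!/usr/bin/env python3
# Hedged sketch: detect the vCenter-plugin deployment marker in ovfEnv.xml.
OVF_ENV = "/opt/vmware/etc/ovfEnv.xml"
MARKER = "solutionInstall.postDeployData.endpoint"

try:
    with open(OVF_ENV, errors="replace") as fh:
        matches = [line.strip() for line in fh if MARKER in line]
except OSError as exc:
    raise SystemExit(f"Could not read {OVF_ENV}: {exc}")

if matches:
    print(f"{MARKER} found; deployment likely came from the vCenter plugin:")
    for line in matches:
        print("  " + line)
else:
    print(f"{MARKER} not present in {OVF_ENV}")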




Additional Information

If you run into an issue that is not listed here and there is an existing KB for it, please use the feedback button to alert us that it needs to be added to this KB.

If you run into an issue that is not covered in any of the above scenarios or in a separate KB, please gather a full support bundle from Aria Operations (the collection will only "succeed partially") and open a support case with Broadcom (see Creating and managing Broadcom support cases). Reference this article in the issue description and attach the log bundle to the case.