VMware Aria Operations upgrade fails with "new_validate" error due to database corruption and sizing limits at 11 of 14 stage

Article ID: 438387

Products

VMware Aria Operations (formerly vRealize Operations) 8.x

Issue/Introduction

  • When upgrading VMware Aria Operations to version 8.18.6, the upgrade fails at stage 11 of 14 with the status "Failed Adapter installed failed".
  • The Admin UI displays the following error: Failed The PAK action "new_validate" script "source ./pak_python_wrapper.sh validate.py" failed
  • Log Evidence: The /storage/log/vcops/log/analytics-{uuid}.log on the Primary Node contains a Postgres exception: 
    YYYY-MM-DDTHH:MM:29,641+0000 ERROR [DistTaskDistributedTaskInstallUninstallAdapters] com.integrien.alive.controller.DistributedTaskInstallUninstallAdapters.describeInstalledAdapters - Describe failed: FunctionException: org.apache.geode.cache.execute.FunctionException: com.vmware.vcops.platform.gemfire.GemfireFunction$MethodInvocationException: ContentDescribeException: GlobalDataPersistenceException: Unable to perform batch action Msg: Update DB failed, errMsg=PreparedStatementCallback; uncategorized SQLException for SQL [INSERT INTO kv_symptomproblemdefinition (actionstatus, adapterkind, adaptersource, badge, col_kv_strvalue, col_kv_valuetype, col_kv_version, key, resourcekind, status) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ON CONFLICT (key) DO UPDATE SET actionstatus=?, adapterkind=?, adaptersource=?, badge=?, col_kv_strvalue=?, col_kv_valuetype=?, col_kv_version=?, resourcekind=?, status=?]; SQL state [XX001]; error code [0]; ERROR: invalid page in block ## of relation base/#####/#####; nested exception is org.postgresql.util.PSQLException: ERROR: invalid page in block ## of relation base/#####/##### Error code: DB_EXCEPTION Kv Exception msg: Update DB failed, errMsg=PreparedStatementCallback; uncategorized SQLException for SQL [INSERT INTO kv_symptomproblemdefinition (actionstatus, adapterkind, adaptersource, badge, col_kv_strvalue, col_kv_valuetype, col_kv_version, key, resourcekind, status) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ON CONFLICT (key) DO UPDATE SET actionstatus=?, adapterkind=?, adaptersource=?, badge=?, col_kv_strvalue=?, col_kv_valuetype=?, col_kv_version=?, resourcekind=?, status=?]; SQL state [XX001]; error code [0]; ERROR: invalid page in block ## of relation base/#####/#####; nested exception is org.postgresql.util.PSQLException: ERROR: invalid page in block ## of relation base/#####/#####
    org.apache.geode.cache.execute.FunctionException: org.apache.geode.cache.execute.FunctionException: com.vmware.vcops.platform.gemfire.GemfireFunction$MethodInvocationException: ContentDescribeException: GlobalDataPersistenceException: Unable to perform batch action Msg: Update DB failed, errMsg=PreparedStatem
    
    adaptersource=?, badge=?, col_kv_strvalue=?, col_kv_valuetype=?, col_kv_version=?, resourcekind=?, status=?]; SQL state [XX001]; error code [0]; ERROR: invalid page in block ## of relation base/#####/#####; nested exception is org.postgresql.util.PSQLException: ERROR: invalid page in block ## of relation base/#####/##### Error code: DB_EXCEPTION Kv Exception msg: Update DB failed, errMsg=PreparedStatementCallback; uncategorized SQLException for SQL [INSERT INTO kv_symptomproblemdefinition (actionstatus, adapterkind, adaptersource, badge, col_kv_strvalue, col_kv_valuetype, col_kv_version, key, resourcekind, status) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ON CONFLICT (key) DO UPDATE SET actionstatus=?, adapterkind=?, adaptersource=?, badge=?, col_kv_strvalue=?, col_kv_valuetype=?, col_kv_version=?, resourcekind=?, status=?]; SQL state [XX001]; error code [0]; ERROR: invalid page in block ## of relation base/#####/#####; nested exception is org.postgresql.util.PSQLException: ERROR: invalid page in block ## of relation base/#####/#####
    
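To confirm that a node shows the same signature, the analytics logs can be searched for the "invalid page in block" message quoted above. The following Python sketch is one possible way to do that, not a Broadcom-provided tool. The function names find_invalid_pages and scan_log_dir are hypothetical, and the sketch assumes a Python 3 interpreter is available wherever the logs (or a copy of them) are stored. It only reads log files and does not touch the database.

```python
import re
from pathlib import Path

# Signature taken from the log excerpt in this article: PostgreSQL reports
# "invalid page in block N of relation base/<database>/<relation file>".
INVALID_PAGE = re.compile(
    r"invalid page in block (?P<block>\d+) of relation (?P<relation>base/\d+/\d+)"
)


def find_invalid_pages(lines):
    """Return the sorted, de-duplicated (relation, block) pairs found in lines."""
    hits = set()
    for line in lines:
        for match in INVALID_PAGE.finditer(line):
            hits.add((match.group("relation"), int(match.group("block"))))
    return sorted(hits)


def scan_log_dir(log_dir="/storage/log/vcops/log"):
    """Scan every analytics-*.log file in log_dir (hypothetical helper).

    Returns a dict mapping each log path with matches to its
    (relation, block) pairs. Files without matches are left out.
    """
    results = {}
    for path in sorted(Path(log_dir).glob("analytics-*.log")):
        # errors="replace" keeps the scan going if a log holds odd bytes.
        with path.open(errors="replace") as handle:
            hits = find_invalid_pages(handle)
        if hits:
            results[str(path)] = hits
    return results
```

Calling scan_log_dir() on the Primary Node, or on an extracted support bundle by passing its log directory, lists the affected relation paths and block numbers. That list is useful context to attach to the Support Request.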

Environment

  • VMware Aria Operations 8.18.X

Cause

  • The upgrade failure is typically caused by index corruption within the internal Postgres database (vcopsdb), specifically affecting the kv_symptomproblemdefinition table.
  • This corruption is frequently triggered by extreme cluster over-sizing, meaning the cluster holds more objects than the supported objects-per-node limit. For example, third-party management packs can collect a very large number of objects, such as several hundred thousand FlashArray Volume Snapshot objects from the Pure Storage Adapter. Exceeding the limit prevents standard database maintenance tasks from completing, which leads to index fragmentation and page corruption.
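
The sizing check described above comes down to simple arithmetic, sketched here in Python. Every number in the example is made up for illustration, including the per-node limit. The real limit depends on the product version and node size and must be taken from the official sizing guidelines. The function names are hypothetical, and the average assumes objects are spread evenly across data nodes, which may not hold in practice.

```python
import math


def objects_per_node(object_counts, data_nodes):
    """Average number of collected objects per data node."""
    if data_nodes < 1:
        raise ValueError("data_nodes must be at least 1")
    return sum(object_counts.values()) / data_nodes


def nodes_needed(object_counts, per_node_limit):
    """Smallest node count that keeps the average at or below per_node_limit."""
    if per_node_limit < 1:
        raise ValueError("per_node_limit must be at least 1")
    return math.ceil(sum(object_counts.values()) / per_node_limit)


# Illustrative values only; none of these numbers come from this article
# or from the official sizing guidelines.
counts = {
    "FlashArray Volume Snapshot": 300_000,
    "All other objects": 60_000,
}
assumed_limit = 40_000  # ASSUMPTION: placeholder objects-per-node limit
current_nodes = 4       # ASSUMPTION: placeholder cluster size

average = objects_per_node(counts, current_nodes)
print(f"average objects per node: {average:.0f}")
print(f"over assumed limit: {average > assumed_limit}")
print(f"nodes needed at assumed limit: {nodes_needed(counts, assumed_limit)}")
```

With these placeholder values a single object type accounts for most of the load, which mirrors the snapshot-object example above. Reducing what the management pack collects can matter as much as adding nodes.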

Resolution

  • To resolve the error observed in the logs, the environment-specific database corruption must be corrected. This requires database modifications, so open a Support Request with Broadcom Technical Support and reference this article (438387).
  • For instructions on opening a case, see Creating and managing Broadcom support cases.