vCenter Server upgrade to 8.0 failing during firstboot with error "Database in-place upgrade failed"
search cancel

vCenter Server upgrade to 8.0 failing during firstboot with error "Database in-place upgrade failed"

book

Article ID: 371569

calendar_today

Updated On:

Products

VMware vCenter Server

Issue/Introduction

  • When attempting to upgrade a vCenter Server to 8.0, the upgrade fails during Stage 2 with error:
    Database in-place upgrade failed. Please see vcdb_inplace.err and vcdb_inplace.out for details.
    VMware VirtualCenter failed first boot.
  • In vpxd_firstboot.py_n_stdout.log file or in vcdb_inplace.err file, the following error(s) are seen:
    Failed to run inplace database upgrade - Error while executing ./Upgrade-v2017-to-v2018/postgresql/upgrade_PostgreSQL.sql, reason: Statement failure(rc=-1).
    ALTER TABLE IF EXISTS VPX_HOST_CRYPTO_KEYS DROP CONSTRAINT IF EXISTS PK_VPX_HOST_CRYPTO_KEYS CASCADE;
    
    1 [23505](1) ERROR: could not create unique index "short_crypto_tag_key"
    DETAIL: Key (crypto_tag)=(hms) is duplicated.;
    Error while executing the query
    Error while executing ./Upgrade-v2017-to-v2018/postgresql/VCDB_views_PostgreSQL.sql:1345, reason: Statement failure(rc=-1).
    
    1 [42703](1) ERROR: column "vmknic_name" does not exist;
    Error while executing the query
    ERROR: Statement failure(rc=-1).

Environment

VMware vCenter Server 7.0

VMware vCenter Server 8.0

Cause

This issue is caused by duplicate data existing in the table.  When the upgrade attempts to alter the table to create a unique primary key, it fails as the database expects only one unique row to exist.

Resolution

NOTE:  Before proceeding with the resolution steps, ensure you have a backup or offline snapshot of the vCenter Server Appliance. If the affected vCenter Server is a member of an Enhanced Linked Mode replication group, please make sure to have offline snapshots (in powered off state) of all ELM member nodes.

 

Identifying The Issue:

  • Open an SSH connection to the vCenter Server Appliance and login with the root user
  • Connect to the vPostgres database:
    # /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres
  • In order to determine if this KB is the correct resolution, you will need to run the following query and verify if there are duplicate records in the source vCenter Server:
    select crypto_tag, count(crypto_tag) from vpx_crypto_tag group by crypto_tag having count(crypto_tag) > 1 order by crypto_tag;
  • This query should return 0 rows if there is no issue.  If this KB is a match it will return one or more rows and may look like the following example:
     crypto_tag | count
    ------------+-------
     hms        |     2
    (1 row)
  • If the query returns 1 or more rows, you will need to identify the duplicates and remove them.  You will need to run three queries to identify the problematic rows and hosts:
    select crypto_tag, crypto_tag_id  from vpx_crypto_tag where crypto_tag='hms';
    
    select vct.crypto_tag, vct.crypto_tag_id,  vhctk.key_id from vpx_crypto_tag vct left join vpx_host_crypto_tag_keys vhctk on vhctk.crypto_tag_id=vct.crypto_tag_id where vct.crypto_tag='hms';
    
    select vct.crypto_tag, vct.crypto_tag_id,  vhctk.key_id, vhck.host_id, (select name from vpx_entity where id=vhck.host_id) as hostname from vpx_crypto_tag vct left join vpx_host_crypto_tag_keys vhctk on vhctk.crypto_tag_id=vct.crypto_tag_id left join vpx_host_crypto_keys vhck on vhck.key_id=vhctk.key_id where vct.crypto_tag='hms';
  • The output of the first query will produce results similar to this example:
    crypto_tag | crypto_tag_id
    -----------+---------------
    hms        |             1
    hms        |             2
    (2 rows)

The problem here is that we are trying to update the crypto_tag column to be a unique primary key.  This is not possible as there are two rows with the same value, so the database update fails.

  • The output of the second query will produce results similar to this example:
    crypto_tag | crypto_tag_id | key_id
    -----------+---------------+--------
    hms        |             1 |      2
    hms        |             1 |      6
    hms        |             1 |      9
    hms        |             1 |     12
    hms        |             1 |     13
    hms        |             1 |      1
    hms        |             1 |      8
    hms        |             1 |     11
    hms        |             1 |      5
    hms        |             1 |     10
    hms        |             1 |     14
    hms        |             1 |      7
    hms        |             1 |      3
    hms        |             2 |      4
    (14 rows)

The main problem here is that we have one row with values of "hms" and "2", which would not be possible after the update to create a unique identifier in the first table.

  • The final query will show which hosts each record is associated with similar to this example:
    crypto_tag | crypto_tag_id | key_id | host_id |     hostname
    -----------+---------------+--------+---------+--------------------
    hms        |             1 |      2 |     354 | esxi027.example.com
    hms        |             1 |      6 |     379 | esxi035.example.com
    hms        |             1 |      9 |     387 | esxi030.example.com
    hms        |             1 |     12 |     408 | esxi040.example.com
    hms        |             1 |     13 |     412 | esxi038.example.com
    hms        |             1 |      1 |     362 | esxi028.example.com
    hms        |             1 |      8 |     386 | esxi031.example.com
    hms        |             1 |     11 |     407 | esxi039.example.com
    hms        |             1 |      5 |     378 | esxi033.example.com
    hms        |             1 |     10 |     406 | esxi036.example.com
    hms        |             1 |     14 |     415 | esxi037.example.com
    hms        |             1 |      7 |     383 | esxi032.example.com
    hms        |             1 |      3 |     364 | esxi029.example.com
    hms        |             2 |      4 |     377 | esxi034.example.com
    (14 rows)

This query allows us to identify the problematic host, in this example esxi034.example.com.

You will need to analyze the results of the above queries for your specific environment in order to determine how to resolve this issue.

 

Resolving The Issue:

To resolve this issue, you must manually modify the VCDB. This is why it is critical to have taken a backup of the system before proceeding.

IMPORTANT:  The following resolution is specific to the above data output.  You will need to determine what updates to make to the specific environment you are working on as it may not be the same!  If you are unsure about what to do, please reach out to the Broadcom support team by opening a new support case.

  1. Stop the vCenter Server service:
    # service-control --stop vpxd
  2. Run the following command to update the vpx_host_crypto_tag_keys table.  This will correct references which would otherwise become invalid after the second query is run.  This example is based upon the example table data shown above.  For your environment, you will need to determine the correct values to enter into the query.  If you are unsure about this operation, please involve the support team.
    # /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -c "update vpx_host_crypto_tag_keys set crypto_tag_id=1 where crypto_tag_id=2;"
  3. Run the following command to update the vpx_host_crypto_tag table.  This will remove the duplicate data.  This example is based upon the example table data shown above.  For your environment, you will need to determine the correct values to enter into the query.  If you are unsure about this operation, please involve the support team.
    # /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -c "delete from vpx_crypto_tag where crypto_tag_id=2 and crypto_tag='hms';"
  4. Start the vCenter Server service:
    # service-control --start vpxd
  5. Attempt the upgrade again.