NSX-T MP Upgrade failed with large CRL entries.

Article ID: 330437

Updated On:

Products

VMware NSX

Issue/Introduction

  • The MP upgrade is stuck after failing on step 7 (run_migration_tool).

    NSX-T> get upgrade progress-status
    ****************************************************************************
    Node Upgrade has been started. Please do not make any changes, until
    the upgrade operation is complete. Run "get upgrade progress-status"
    to show the progress of last upgrade step.
    ****************************************************************************
    Tue Aug 12 2022 UTC 22:37:46.753
    Upgrade info:
    From-version: 3.1.2.0.0.17883600
    To-version: 3.2.1.1.0.20115694
    
    Upgrade steps:
    download_os [2022-08-12 17:13:42 - 2022-08-12 17:14:45] SUCCESS
    shutdown_manager [2022-08-12 17:14:52 - 2022-08-12 17:17:02] SUCCESS
    install_os [2022-08-12 17:17:02 - 2022-08-12 17:18:30] SUCCESS
    migrate_manager_config [2022-08-12 17:18:30 - 2022-08-12 17:18:35] SUCCESS
    switch_os [2022-08-12 17:18:35 - 2022-08-12 17:18:44] SUCCESS
    reboot [2022-08-12 17:18:44 - 2022-08-12 17:19:17] SUCCESS
    run_migration_tool [2022-08-12 17:21:00 - 2022-08-12 17:22:03] FAILED
    ------ Output of last step start ------
    Status:
    2022-08-12 17:21:01.610404 Deleting datastore files
    2022-08-12 17:21:01.681056 Copying old datastore files
    2022-08-12 17:21:14.754974 Done copying old datastore files
    2022-08-12 17:21:17.779722 Start Corfu server
    2022-08-12 17:21:22.601038 Process corfu-server started
    2022-08-12 17:21:59.675320 Error running data migration tool. return value 1, log file /var/log/proton/data-migration.log
    
    Overall Progress: (1/6)
    ---- (1) CCP: Completed [1 object(s)] (2022-08-12 17:21:22 - 2022-08-12 17:21:28 ) ----
    -------------------------------------------------------------------------------- ------------
    ---- (2) Proton: 18% [617 of 3360 object(s)] (2022-08-12 05:21:49 - ) ----
    Currently Migrating: TruststoreCrlMigrationTask 0% [0 of 2 objects] (2022-08-12 05:21:57 - )
    -------------------------------------------------------------------------------- ------------
    ---- (3) Policy: Pending
    -------------------------------------------------------------------------------- ------------
    ---- (4) Logical: Pending
    -------------------------------------------------------------------------------- ------------
    ---- (5) CBM: Pending
    -------------------------------------------------------------------------------- ------------
    ---- (6) UFO Checkpointing: Pending
    -------------------------------------------------------------------------------- ------------
  • In the /var/log/proton/logical-migration.log file, you see entries similar to:

    2022-08-12T17:21:59.606Z  WARN main UfoCorfuTableMigrator 3758 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="manager"] ERROR while running custom migration MappingDetails{modelName='com.vmware.nsx.management.model.truststore.CrlEntity', migrationType=CUSTOM_MIGRATION, reason='null', customMigratorClassName='com.vmware.nsx.management.migration.task.truststore.TruststoreCrlMigrationTask', fieldMappings=null, targetProtoName='null', requiresCustomCode='false', owner='null', apiToTest='null'}
    org.corfudb.runtime.exceptions.TransactionAbortedException: TX ABORT  | Snapshot Time = Token(epoch=414, sequence=1397430604) | Failed Transaction ID = a41dcdbb-####-####-####-e3b873df9ec2 | Offending Address = -1 | Conflict Key = 00 | Conflict Stream = 00000000-0000-0000-0000-000000000000 | Cause = SIZE_EXCEEDED | Time = 2190 ms | Message = Trying to write 39286957 bytes but max write limit is 26214400 bytes
            at org.corfudb.runtime.view.ObjectsView.TXEnd(ObjectsView.java:188) ~[common-data-migration.jar:?]
            at org.corfudb.runtime.collections.TxnContext.commit(TxnContext.java:765) ~[common-data-migration.jar:?]
            at com.vmware.nsx.persistence.UfoTxn.commit(UfoTxn.java:937) ~[common-data-migration.jar:?]
            at com.vmware.nsx.management.migration.task.truststore.CommonMigrationTask.migrateType(CommonMigrationTask.java:67) ~[data-migration-hl.jar:?]
            at com.vmware.nsx.management.migration.task.truststore.TruststoreCrlMigrationTask.migrate(TruststoreCrlMigrationTask.java:28) ~[data-migration-hl.jar:?]
            at com.vmware.nsx.management.migration.ufo.UfoCorfuTableMigrator.customMigration(UfoCorfuTableMigrator.java:204) [common-data-migration.jar:?]
            at com.vmware.nsx.management.migration.ufo.UfoCorfuTableMigrator.migrate(UfoCorfuTableMigrator.java:166) [common-data-migration.jar:?]
            at com.vmware.nsx.management.migration.ufo.UFOMigration.migrate(UFOMigration.java:306) [common-data-migration.jar:?]
            at com.vmware.nsx.management.migration.ufo.UFOMigration.migrate(UFOMigration.java:196) [common-data-migration.jar:?]
            at com.vmware.nsx.management.migration.impl.ProtonMigration.executeMigrations(ProtonMigration.java:46) [data-migration-hl.jar:?]
            at com.vmware.nsx.management.migration.impl.Migration.migrate(Migration.java:46) [common-data-migration.jar:?]
            at com.vmware.nsx.management.migration.impl.ProtonMigration.main(ProtonMigration.java:29) [data-migration-hl.jar:?]
    Caused by: org.corfudb.runtime.exceptions.WriteSizeException: Trying to write 39286957 bytes but max write limit is 26214400 bytes
  • You may encounter the same failure while running the NSX Upgrade Evaluation Tool (see KB 87379, NSX Upgrade Evaluation Tool).

    nsx-upgrade> start dry-run data-migration mp-ip 10.10.#.#
    Root password of the Remote MP node:
    (1/11) Checking ssh connectivity to the MP node 10.10.#.# with root user...
    (2/11) Creating a temporary folder on MP 10.10.#.#
    (3/11) Copy Corfu data to the temporary folder on MP 10.10.#.#
    (4/11) Copy nsx_issue file to the temporary folder on MP 10.10.#.#
    (5/11) Create tar of the temporary folder on MP 10.10.#.#
    (6/11) Delete the temporary folder on MP 10.10.#.#
    (7/11) Fetching tar containing Corfu data
    (8/11) Delete tar file on MP
    (9/11) Downloaded corfu tgz file of size 35 MB
    (10/11) Loading the fetched Corfu data
    (11/11) Starting data-migration dry-run
    Running.... Please track progress in /var/log/cloudnet/data-migration.log, /var/log/proton/data-migration.log, /var/log/policy/data-migration.log, /var/log/proton/logical-migration.log
    *** WARNING: Some pre-upgrade check(s) failed. Do not proceed with the upgrade. Please collect the support bundle and contact VMWare GSS***TruststoreCrlMigrationTask****

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
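To confirm that a failure matches this article, it can help to pull the two byte counts out of the WriteSizeException message and compare them. The following is a minimal sketch, assuming the message format shown in the excerpt above; the function names are illustrative and not part of any NSX tooling.

```python
import re

# Matches the Corfu write-size message as it appears in the log excerpt above.
WRITE_SIZE_RE = re.compile(
    r"Trying to write (\d+) bytes but max write limit is (\d+) bytes"
)

MIB = 1024 * 1024


def parse_write_size(line):
    """Return (attempted_bytes, limit_bytes), or None if the line does not match."""
    match = WRITE_SIZE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def describe(line):
    """Summarize how far the attempted write exceeds the limit, in MiB."""
    parsed = parse_write_size(line)
    if parsed is None:
        return "no write-size error found"
    attempted, limit = parsed
    return (
        f"attempted {attempted / MIB:.1f} MiB, "
        f"limit {limit / MIB:.1f} MiB, "
        f"over by {(attempted - limit) / MIB:.1f} MiB"
    )


# Sample line copied from the log excerpt in this article.
sample = (
    "Caused by: org.corfudb.runtime.exceptions.WriteSizeException: "
    "Trying to write 39286957 bytes but max write limit is 26214400 bytes"
)
print(describe(sample))
```

For the sample line, the limit of 26214400 bytes is exactly 25 MiB, and the attempted write is roughly 37.5 MiB.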

Environment

VMware NSX-T Data Center

Cause

The upgrade fails because of large CRL entries in the system. This can happen when a deployment has gone through multiple updates and the certificate signing authority has published large CRLs. (Older versions of NSX-T import CRLs automatically when certificates refer to them, even if the CRLs are not used.)

Resolution

This issue is resolved in NSX-T 3.2.1, which no longer imports CRLs automatically.
Therefore, this is not an issue after 3.2.1.

Workaround:
Note: If the upgrade is already stuck, perform a rollback before applying this workaround.

Then delete the CRLs using the trust-management API:
  • To get a list of the CRLs in the system, run this API:
 GET https://<NSX_FQDN>/api/v1/trust-management/crls
  • Check that the returned CRLs match those shown in the UI under System > Certificates > CRLs. If they do, delete them by calling this API with each CRL ID:
 DELETE https://<NSX_FQDN>/api/v1/trust-management/crls/<crl-id>
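The two API calls above can be scripted. The following is a minimal sketch using only the Python standard library. It assumes basic authentication and assumes that the GET response lists the CRLs under a "results" key with "id" and "display_name" fields. The helper names are hypothetical; verify the response shape against your NSX version before deleting anything.

```python
import base64
import json
import urllib.request


def crl_url(nsx_fqdn, crl_id=None):
    """Build the trust-management CRL URL used in the workaround steps."""
    base = f"https://{nsx_fqdn}/api/v1/trust-management/crls"
    return base if crl_id is None else f"{base}/{crl_id}"


def build_request(method, url, user, password):
    """Create a request with a basic-auth header (assumed auth scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method=method)
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    return req


def extract_crls(body):
    """Return (id, display_name) pairs; assumes the list is under "results"."""
    return [(c["id"], c.get("display_name", "")) for c in body.get("results", [])]


def list_crls(nsx_fqdn, user, password, context=None):
    """GET the CRL list. Pass an ssl.SSLContext to control certificate checks."""
    req = build_request("GET", crl_url(nsx_fqdn), user, password)
    with urllib.request.urlopen(req, context=context) as resp:
        return extract_crls(json.load(resp))


def delete_crl(nsx_fqdn, crl_id, user, password, context=None):
    """DELETE one CRL by ID and return the HTTP status code."""
    req = build_request("DELETE", crl_url(nsx_fqdn, crl_id), user, password)
    with urllib.request.urlopen(req, context=context) as resp:
        return resp.status


# Example usage (not executed here; host and credentials are placeholders):
#   for crl_id, name in list_crls("nsx.example.com", "admin", "<password>"):
#       print(crl_id, name)  # compare with System > Certificates > CRLs first
#       delete_crl("nsx.example.com", crl_id, "admin", "<password>")
```

Listing and deleting are kept as separate functions so that the returned CRLs can be compared with the UI before any DELETE call is made.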
 


Additional Information