NSX 4.1.2 upgrade prechecks fail with "NSX Manager upgrade dry run failed"

Article ID: 324241

Updated On:

Products

VMware NSX

Issue/Introduction

  • NSX Upgrade Pre-Checks fail with "NSX Manager upgrade dry run failed. Do not proceed with the upgrade."
  • NSX Manager logs may show an error similar to this example:
    • /var/log/syslog
      • 2023-09-21T05:14:46.957Z ERROR providerTaskExecutor-87 PolicyProviderUtil 4917 POLICY [nsx@6876 comp="nsx-manager" errorCode="PM0" level="ERROR" subcomp="manager"] Created alarm Alarm [policyPath=/infra/realized-state/enforcement-points/default/logical-ports/infra-########-####-####-####-############-default:########-####-####-####-############-lp/alarms/########-####-####-####-############, message=TX ABORT  | Snapshot Time = Token(epoch=2164, sequence=3391905417) | Failed Transaction ID = ########-####-####-####-############ | Offending Address = 3391905445 | Conflict Key = 3E0923C968CFCDDC | Conflict Stream = ########-####-####-####-############ | Cause = CONFLICT | Time = 194 ms,errorId=PROVIDER_INVOCATION_FAILURE, path=null, apiError=null, sourceSiteId=null].
    • /var/log/upgrade-coordinator/logical-migration.log
      • 2023-10-02T14:52:58.244Z ERROR ViewsGarbageCollector AbstractView 3162186 getLayoutUninterruptibly: Encountered error. Aborting layoutHelper
        java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
        
        2023-10-02T14:56:05.371Z ERROR main Migration 3162186 - [nsx@6876 comp="nsx-manager" errorCode="MP217" level="ERROR" subcomp="manager"] Migration failed
        java.lang.RuntimeException: java.util.concurrent.TimeoutException
        
    • /image contains a dump file 
      • migration_oom.hprof.gz
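The symptoms above can be checked from a shell on the NSX Manager. The following is a minimal sketch, not a supported tool: the file paths and search strings come from the log excerpts in this article, while the function name `check_symptoms` and its two optional arguments are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: look for the three symptoms described in this article.
# check_symptoms and its arguments are hypothetical names, not NSX commands.
# $1 = log directory (default /var/log), $2 = dump directory (default /image)
check_symptoms() {
    log_dir="${1:-/var/log}"
    image_dir="${2:-/image}"
    found=0

    # OutOfMemoryError reported by the logical migration dry run
    if grep -q "java.lang.OutOfMemoryError: Java heap space" \
        "$log_dir/upgrade-coordinator/logical-migration.log" 2>/dev/null; then
        echo "OOM error found in logical-migration.log"
        found=1
    fi

    # Transaction aborts caused by conflicts, as in the syslog example
    if grep -q "Cause = CONFLICT" "$log_dir/syslog" 2>/dev/null; then
        echo "TX ABORT conflict found in syslog"
        found=1
    fi

    # Heap dump written when the migration runs out of memory
    if [ -e "$image_dir/migration_oom.hprof.gz" ]; then
        echo "Heap dump present: $image_dir/migration_oom.hprof.gz"
        found=1
    fi

    # Return success only if at least one symptom was seen
    [ "$found" -eq 1 ]
}
```

After sourcing the function, running `check_symptoms` with no arguments uses the default paths. A non-zero exit status means none of the three symptoms was found on that node.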

Environment

VMware NSX-T 4.x
NSX Upgrade to 4.1.2 from version 3.2.x, 4.0.x or 4.1.x.

Cause

This issue occurs when the NSX Manager database contains a large number of stale port entries, which cause unexpectedly high transaction times.
Note: Upgrades from greenfield deployments of 4.1.1 are not impacted by this issue.

Resolution

To address this upgrade precheck issue, customers can use the NSX 4.2.2 (or later) upgrade precheck file (.pub file) from the Broadcom download page to successfully complete an NSX Upgrade Precheck. This action only resolves the precheck issue and does not perform the actual NSX upgrade.

If NSX 4.2.2 (or later) is not the target upgrade version, upload the precheck file for the desired version after the precheck completes, so that the target upgrade version is reverted to the desired version.

Using the NSX 4.2.2 (or later) precheck file as a temporary workaround will not impact the subsequent upgrade to the desired version, provided that the correct version's precheck file is uploaded before initiating the actual upgrade process.

 

If the above workaround does not resolve the issue, download the attached logical-migration.jar file from this KB and perform the steps below:

  1. Take a fresh NSX backup. Do not rely on snapshots.
    • A cold clone of each NSX Manager can also be used in addition to the NSX backup.
  2. Copy logical-migration.jar (available in the Attachments section of this article) to "/opt/vmware/upgrade-coordinator-tomcat/temp/" on one of the NSX Managers.
  3. Perform the following steps as root user:
    1. Run the dry-run version of the command for testing. An example table output will be shown:
      • java -Xmx200m -Dcorfu-property-file-path=/opt/vmware/upgrade-coordinator-tomcat/conf/ufo-factory.properties -Djava.io.tmpdir=/opt/vmware/upgrade-coordinator-tomcat/temp -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j.configurationFile=/opt/vmware/upgrade-coordinator-tomcat/conf/log4j2.xml -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/opt/vmware/upgrade-coordinator-tomcat/conf/logging.properties -Dnsx-service-type=nsx-manager -Dcom.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration.dryRun=true -Dcom.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration.batchSize=30  -Dcom.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration.minAgeThresholdMinutes=1 -Dcom.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration.persistentDataDirPath=/nonconfig/diskonlycorfutable/logical-migration -Dcom.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration.persistentDataModeEnabled=true -cp /opt/vmware/upgrade-coordinator-tomcat/temp/logical-migration.jar com.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration
    2. If the dry run above reports no errors and shows an example table output, run the following command to remove the stale entries:
      • java -Xmx1g -Dcorfu-property-file-path=/opt/vmware/upgrade-coordinator-tomcat/conf/ufo-factory.properties -Djava.io.tmpdir=/opt/vmware/upgrade-coordinator-tomcat/temp -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j.configurationFile=/opt/vmware/upgrade-coordinator-tomcat/conf/log4j2.xml -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/opt/vmware/upgrade-coordinator-tomcat/conf/logging.properties -Dnsx-service-type=nsx-manager -Dcom.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration.dryRun=false -Dcom.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration.batchSize=30 -Dcom.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration.maxEntriesToRectify=-1 -Dcom.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration.minAgeThresholdMinutes=1 -Dcom.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration.persistentDataDirPath=/nonconfig/diskonlycorfutable/logical-migration -Dcom.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration.persistentDataModeEnabled=true  -cp /opt/vmware/upgrade-coordinator-tomcat/temp/logical-migration.jar com.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration
        
    3. Run the following command to reset the ownership of any newly created upgrade-coordinator log files:
      • chown uuc:uuc /var/log/upgrade-coordinator/upgrade-coordinator*log*
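The two java commands in steps 3.1 and 3.2 differ in three places: the heap size (`-Xmx200m` versus `-Xmx1g`), the `dryRun` value, and the `maxEntriesToRectify=-1` property, which appears only in the removal command. As a rough sketch for reducing copy-and-paste mistakes, the function below prints either command for review before it is run. All option values are copied from this article. The function name `build_migration_cmd`, the mode names `dryrun` and `apply`, and the `UC` and `CLS` variables are illustrative assumptions and are not part of NSX.

```shell
#!/bin/sh
# Sketch only: print the stale-port cleanup command from this article.
# build_migration_cmd, UC and CLS are hypothetical helper names.
UC=/opt/vmware/upgrade-coordinator-tomcat
CLS=com.vmware.nsx.management.migration.impl.StaleSegmentPortGPRRMigration

build_migration_cmd() {
    case "$1" in
        # Step 3.1: small heap, dryRun=true, no maxEntriesToRectify
        dryrun) heap=200m; dry=true;  extra="" ;;
        # Step 3.2: larger heap, dryRun=false, maxEntriesToRectify=-1
        apply)  heap=1g;   dry=false; extra=" -D$CLS.maxEntriesToRectify=-1" ;;
        *) echo "usage: build_migration_cmd dryrun|apply" >&2; return 2 ;;
    esac
    # printf reuses the '%s' format for every argument, joining the pieces
    printf '%s' \
        "java -Xmx$heap" \
        " -Dcorfu-property-file-path=$UC/conf/ufo-factory.properties" \
        " -Djava.io.tmpdir=$UC/temp" \
        " -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector" \
        " -Dlog4j.configurationFile=$UC/conf/log4j2.xml" \
        " -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager" \
        " -Djava.util.logging.config.file=$UC/conf/logging.properties" \
        " -Dnsx-service-type=nsx-manager" \
        " -D$CLS.dryRun=$dry" \
        " -D$CLS.batchSize=30" \
        "$extra" \
        " -D$CLS.minAgeThresholdMinutes=1" \
        " -D$CLS.persistentDataDirPath=/nonconfig/diskonlycorfutable/logical-migration" \
        " -D$CLS.persistentDataModeEnabled=true" \
        " -cp $UC/temp/logical-migration.jar $CLS"
    printf '\n'
}
```

Calling `build_migration_cmd dryrun` prints the step 3.1 command and `build_migration_cmd apply` prints the step 3.2 command. The function only prints text, so the output can be compared against the commands in this article before it is pasted into a root shell.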
If none of the above workarounds resolves the issue, open a Broadcom Support Request and reference this KB.

Attachments

logical-migration.jar