Error code 101 when accessing NSX UI, or upgrade pre-check fails with error "Data migration timedout"

Article ID: 322456

Products

VMware NSX

Issue/Introduction

  • Error code 101 may be observed when accessing the NSX-T UI, caused by a high number of entries in the GenericPolicyRealizedResource (GPRR) table.
  • The dry run performed as part of the pre-upgrade checks may fail with the error:
    EDGE Upgrade Evaluation Tool execution check Prompts user to execute Upgrade Evaluation Tool before proceeding with upgrade. FAILURE "NSX Manager upgrade dry run failed. Do not proceed with the upgrade. Please collect the support bundle and contact VMWare GS. Failed migrations: Data migration timedout
    MP Upgrade Evaluation Tool execution check Prompts user to execute Upgrade Evaluation Tool before proceeding with upgrade. FAILURE "NSX Manager upgrade dry run failed. Do not proceed with the upgrade. Please collect the support bundle and contact VMWare GS. Failed migrations: Data migration timedout

  • NSX Manager logs indicate extremely large tables in Corfu DB, that is, tables holding a very high number of entries.
  • In /var/log/corfu/corfu-compactor-audit.log, you may observe:
    1. an extremely high number of entries in one or more tables, reported as "entries"
    2. an extremely large table size, reported as "cpSize"
      e.g.
      <nsx-manager>:~# grep "appendCheckpoint" /var/log/corfu/corfu-compactor-audit.log | grep "entries(" | awk '{print $10, $11}' | sort | uniq | sort -rnt'(' -k2
      ab######-d##d-3##0-a##f-8e######da1a, entries(12653305),            >> table: TagBulkOperation
      da######-9##6-3##4-9##8-d2######32e9, entries(1276041),             >> table: GenericPolicyRealizedResource
      8d######-3##b-3##8-8##8-b5######f302, entries(1241),
      9c######-9##c-3##3-a##9-cf######4151, entries(122),
      23######-b##7-3##d-a##e-0d######6fe4, entries(120),
      bb######-3##c-3##9-b##a-eb######ecc9, entries(12),
      (end of snip)
  • To show both "entries" and "cpSize", use the command below:
    grep "appendCheckpoint" /var/log/corfu/corfu-compactor-audit.log | grep "entries(" | awk '{print $10, $11, $12}' | sort | uniq | sort -rnt'(' -k2
  • The impacted table in Corfu DB will show a high number of "stringId" entries:
    1. Dump the table into a file:
      /opt/vmware/bin/corfu_tool_runner.py -n nsx -o showTable -t GenericPolicyRealizedResource > /tmp/gprr.txt
    2. Count the stringId entries in the dump, grouped by path prefix:
      grep stringId /tmp/gprr.txt | awk '{print $2}' | cut -d "/" -f 1-6 | sort | uniq -c | sort -nr
      1265335 "/infra/realized-state/enforcement-points/default/tags
         3813 "/infra/realized-state/enforcement-points/default/services
         3119 "/infra/realized-state/enforcement-points/default/groups
         1929 "/infra/realized-state/enforcement-points/default/logical-ports
          287 "/infra/realized-state/enforcement-points/default/lb-rules
      (end of snip)
  • If the dry run fails, the NSX Manager's /var/log/corfu/corfu-compactor-audit.log may contain entries similar to the sample below:
    2024-06-11T11:21:57.994Z INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for da######-9##6-3##4-9##8-d2######2e9, entries(1207967), cpSize(744950900) bytes at snapshot Token(epoch=518, sequence=3746995903) in 3816415 ms
    ..
    2024-06-11T11:55:12.002Z ERROR main Migration 3453 - [nsx@6876 comp="nsx-manager" errorCode="MP217" level="ERROR" subcomp="manager"] Migration failed
    java.lang.RuntimeException: java.util.concurrent.TimeoutException
            at com.vmware.nsx.management.migration.ufo.UFOMigration.migrate(UFOMigration.java:200) ~[logical-migration.jar:?]
            at com.vmware.nsx.management.migration.impl.LogicalMigration.executeMigrations(LogicalMigration.java:78) ~[logical-migration.jar:?]
            at com.vmware.nsx.management.migration.impl.Migration.migrate(Migration.java:46) ~[logical-migration.jar:?]
            at com.vmware.nsx.management.migration.impl.LogicalMigration.main(LogicalMigration.java:47) ~[logical-migration.jar:?]
    Caused by: java.util.concurrent.TimeoutException
            at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_332]
            at com.vmware.nsx.management.migration.ufo.UFOMigration.migrate(UFOMigration.java:194) ~[logical-migration.jar:?]
            ... 3 more.
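
The grep/awk pipelines above can be wrapped in a small helper that prints one line per checkpointed table, largest first. This is only a minimal sketch: the function name summarize_checkpoints and its optional minimum-entries argument are hypothetical and not part of any NSX tooling, and it assumes the same field positions ($10, $11, $12) as the commands in this article.

```shell
#!/bin/sh
# Sketch only: summarize_checkpoints is a hypothetical helper, not an NSX command.
# Prints "<entries> <cpSize> <table-uuid>" per table, sorted by entries (descending).
#   $1: path to corfu-compactor-audit.log
#   $2: optional minimum number of entries to report (default 0)
summarize_checkpoints() {
    awk -v min="${2:-0}" '
        /appendCheckpoint/ && /entries\(/ {
            id = $10;      sub(/,$/, "", id)             # table UUID without trailing comma
            entries = $11; gsub(/[^0-9]/, "", entries)   # digits from entries(N),
            size = $12;    gsub(/[^0-9]/, "", size)      # digits from cpSize(N)
            if (entries + 0 >= min + 0) print entries, size, id
        }' "$1" | sort -u | sort -k1,1nr
}
```

For example, `summarize_checkpoints /var/log/corfu/corfu-compactor-audit.log 1000000` would list only tables with at least one million entries. The value 1000000 is purely illustrative and is not an official limit.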

Environment

VMware NSX-T Data Center

Cause

This issue is caused by an extremely high number of objects in the NSX-T deployment (e.g. segment ports, tags), usually created by a user or an automation tool. The volume of objects exhausts Corfu and causes the compactor to fail.

Resolution

Currently, there is no resolution to this issue.

Workaround:
Should you experience this problem, please contact the VMware Global Support team for assistance and reference this KB article.