When upgrading VMware NSX from 4.1.x to 4.2.x, the upgrade pre-check fails with the error message "upstream connect error or disconnect/reset before headers. reset reason connection failure, transport failure reason: delayed connect error:111"
The NSX Manager log /var/log/upgrade-coordinator/upgrade-coordinator-tomcat-wrapper.log shows the Upgrade Coordinator JVM running out of memory and restarting:
INFO | jvm 1 | 2025/01/27 14:54:43 | java.lang.OutOfMemoryError: Java heap space
STATUS | wrapper | 2025/01/27 14:54:43 | The JVM has run out of memory. Requesting thread dump.
STATUS | wrapper | 2025/01/27 14:54:43 | Dumping JVM state.
STATUS | wrapper | 2025/01/27 14:54:43 | The JVM has run out of memory. Restarting JVM.
INFO | jvm 1 | 2025/01/27 14:54:43 | Dumping heap to /image/core/uc_oom.hprof ...
INFO | jvm 1 | 2025/01/27 14:54:43 | 2025-01-27 14:54:43
INFO | jvm 1 | 2025/01/27 14:54:43 | Full thread dump OpenJDK 64-Bit Server VM (11.0.23+10-LTS mixed mode):
INFO | jvm 1 | 2025/01/28 14:05:05 | Heap
INFO | jvm 1 | 2025/01/28 14:05:05 | par new generation total 157248K, used 157173K [0x0000XXXXXXXXXXXX, 0x0000XXXXXXXXXXXX, 0x0000XXXXXXXXXXXX)
INFO | jvm 1 | 2025/01/28 14:05:05 | eden space 139776K, 99% used [0x0000XXXXXXXXXXXX, 0x0000XXXXXXXXXXXX, 0x0000XXXXXXXXXXXX)
INFO | jvm 1 | 2025/01/28 14:05:05 | from space 17472K, 99% used [0x0000XXXXXXXXXXXX, 0x0000XXXXXXXXXXXX, 0x0000XXXXXXXXXXXX)
INFO | jvm 1 | 2025/01/28 14:05:05 | to space 17472K, 0% used [0x0000XXXXXXXXXXXX, 0x0000XXXXXXXXXXXX, 0x0000XXXXXXXXXXXX)
INFO | jvm 1 | 2025/01/28 14:05:05 | concurrent mark-sweep generation total 349568K, used 349568K [0x0000XXXXXXXXXXXX, 0x0000XXXXXXXXXXXX, 0x0000XXXXXXXXXXXX)
INFO | jvm 1 | 2025/01/28 14:05:05 | Metaspace used 222489K, capacity 225485K, committed 228392K, reserved 1247232K
INFO | jvm 1 | 2025/01/28 14:05:05 | class space used 28316K, capacity 29406K, committed 30304K, reserved 1048576K
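To check whether a wrapper log contains this signature, the following is a minimal Python sketch, not a supported tool. It assumes each log entry is on its own line in the form "LEVEL | source | timestamp | message", as in the entries above. The function name find_oom_events and the regular expression are illustrative and are not part of NSX.

```python
import re
from typing import Iterable, List

# Assumed entry layout: "LEVEL | source | YYYY/MM/DD HH:MM:SS | message"
LINE_RE = re.compile(
    r"^\s*(\w+)\s*\|\s*([^|]+?)\s*\|\s*"
    r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*\|\s?(.*)$"
)
OOM_MARKER = "java.lang.OutOfMemoryError"


def find_oom_events(lines: Iterable[str]) -> List[str]:
    """Return the timestamps of entries whose message reports an OutOfMemoryError."""
    timestamps = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and OOM_MARKER in match.group(4):
            timestamps.append(match.group(3))
    return timestamps


# Sample entries taken from the log excerpt in this article.
sample = [
    "INFO | jvm 1 | 2025/01/27 14:54:43 | java.lang.OutOfMemoryError: Java heap space",
    "STATUS | wrapper | 2025/01/27 14:54:43 | The JVM has run out of memory. Restarting JVM.",
    "INFO | jvm 1 | 2025/01/28 14:05:05 | Heap",
]
print(find_oom_events(sample))  # prints ['2025/01/27 14:54:43']
```

To scan a real log, pass an open file object for /var/log/upgrade-coordinator/upgrade-coordinator-tomcat-wrapper.log to find_oom_events instead of the sample list.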
New core dump files named core.uc_oom.<random_numbers>.hprof may also be generated on the Upgrade Coordinator node under /image/cores.
Dumping the BaseHostSwitchProfile Corfu table with the following command may also fail with an out-of-memory error:
corfu_tool_runner.py -n nsx -t BaseHostSwitchProfile -o showTable
VMware NSX
This is caused by a large quantity of UplinkHSProfiles in the Corfu database. These uplink profiles are not used, but are automatically created by a security-only workflow. The large number of entries in this database table contributes to a memory leak when the upgrade pre-check reads the table.
This issue will be fixed in a future NSX version.
If you believe you have encountered this issue and are unable to upgrade, please open a support case with Broadcom Support and refer to this KB article.
For more information, see Creating and managing Broadcom support cases.