NSX Manager Federation Upgrade is unable to complete due to Corfu compaction failure for Alarm Table

Article ID: 336797

Products

VMware NSX

Issue/Introduction

NSX Manager is unable to connect to the datastore.
In the NSX Global Manager log /var/log/gmanager/gmanager.log, you see entries similar to the following:

2022-07-14T18:05:06.429Z WARN pool-32-thread-1 DataStoreDisconnectHandler 26600 - [nsx@6876 comp="global-manager" level="WARNING" subcomp="global-manager"] Disconnected from the database, restarting the service
2022-07-14T18:05:06.429Z INFO pool-32-thread-1 ContainerConfigServiceImpl 26600 - [nsx@6876 comp="global-manager" level="INFO" subcomp="global-manager"] Restart application after 0 ms.
2022-07-14T18:05:06.719Z ERROR localhost-startStop-1 CorfuRuntime 26600 connect: Couldn't connect to server. java.util.concurrent.TimeoutException: null

In the NSX Global Manager log /var/log/corfu/corfu.9000.log, you see entries similar to the following:

2022-07-14T21:19:00.213Z | WARN | worker-0 | i.n.c.DefaultChannelPipeline | An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.

java.nio.file.FileSystemException: /config/cluster-manager/corfu/private/keystore.password: Too many open files

In the NSX Global Manager log /var/log/corfu/corfu-compactor-audit.log, you see entries similar to the following when the Corfu compactor runs out of memory:

java.lang.OutOfMemoryError: Java heap space
-XX:OnOutOfMemoryError="gzip -f /image/core/compactor_oom.hprof"
Executing /bin/sh -c "gzip -f /image/core/compactor_oom.hprof"...
Aborting due to java.lang.OutOfMemoryError: Java heap space

A fatal error has been detected by the Java Runtime Environment:

INVALID (0xe0000000) at pc=0x0000000000000000, pid=14350, tid=0x000075b304a11700
fatal error: OutOfMemory encountered: Java heap space

JRE version: OpenJDK Runtime Environment (Zulu 8.55.0.14-SA-linux64) (8.0_301-b02) (build 1.8.0_301-b02)
Java VM: OpenJDK 64-Bit Server VM (25.301-b02 mixed mode linux-amd64 compressed oops)
Core dump written. Default location: //core or core.14350
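
The log checks above can be scripted. The following Python sketch is an illustration only and not a Broadcom-supplied tool: it counts how many lines in each of the three log files contain the message fragments quoted in this article. The function names scan_lines and scan_logs are hypothetical, and the sketch assumes it is run on the Global Manager appliance by a user with read access to the logs.

```python
from pathlib import Path

# Log paths and message fragments are copied from the excerpts in this article.
# The mapping itself is an illustrative assumption, not an official signature list.
SIGNATURES = {
    "/var/log/gmanager/gmanager.log": [
        "Disconnected from the database, restarting the service",
        "Couldn't connect to server",
    ],
    "/var/log/corfu/corfu.9000.log": [
        "Too many open files",
    ],
    "/var/log/corfu/corfu-compactor-audit.log": [
        "java.lang.OutOfMemoryError: Java heap space",
    ],
}


def scan_lines(lines, needles):
    """Return a dict mapping each needle to the number of lines that contain it."""
    counts = {needle: 0 for needle in needles}
    for line in lines:
        for needle in needles:
            if needle in line:
                counts[needle] += 1
    return counts


def scan_logs(signatures=SIGNATURES):
    """Scan each log file; a value of None means the file was not found."""
    report = {}
    for path, needles in signatures.items():
        log_file = Path(path)
        if not log_file.is_file():
            report[path] = None
            continue
        # errors="replace" keeps the scan going if a log contains invalid bytes.
        with log_file.open(errors="replace") as handle:
            report[path] = scan_lines(handle, needles)
    return report


if __name__ == "__main__":
    for path, counts in scan_logs().items():
        if counts is None:
            print(f"{path}: file not found")
            continue
        for needle, hits in counts.items():
            print(f"{path}: {hits} line(s) containing '{needle}'")
```

Matches in all three logs are consistent with the symptoms described here, but they do not by themselves confirm the cause.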

Environment

VMware NSX 3.2.0

Cause

In NSX 3.2.0, the issue occurs when a GenericPolicyRealizedResource (GPRR) does not have a realized object ID.

Resolution

This issue is fixed in VMware NSX 3.2.2.

Workaround:

If you believe you have encountered this issue, please open a support case with Broadcom Support and refer to this KB article.

For more information, see Creating and managing Broadcom support cases.

Additional Information

Impact/Risks:
The NSX Manager upgrade cannot continue.
