NSX Manager Federation Upgrade is unable to complete due to Corfu compaction failure for Alarm Table

Article ID: 336797

Products

VMware NSX

Issue/Introduction

  • NSX Manager is unable to connect to the datastore
  • In the NSX Global Manager log /var/log/gmanager/gmanager.log, entries similar to the following appear:

2022-07-14T18:05:06.429Z WARN pool-32-thread-1 DataStoreDisconnectHandler 26600 - [nsx@6876 comp="global-manager" level="WARNING" subcomp="global-manager"] Disconnected from the database, restarting the service
2022-07-14T18:05:06.429Z INFO pool-32-thread-1 ContainerConfigServiceImpl 26600 - [nsx@6876 comp="global-manager" level="INFO" subcomp="global-manager"] Restart application after 0 ms.
2022-07-14T18:05:06.719Z ERROR localhost-startStop-1 CorfuRuntime 26600 connect: Couldn't connect to server. java.util.concurrent.TimeoutException: null

  • In the NSX Global Manager log /var/log/corfu/corfu.9000.log, entries similar to the following appear:

2022-07-14T21:19:00.213Z | WARN | worker-0 | i.n.c.DefaultChannelPipeline | An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.

java.nio.file.FileSystemException: /config/cluster-manager/corfu/private/keystore.password: Too many open files

  • In the NSX Global Manager log /var/log/corfu/corfu-compactor-audit.log, a Corfu compactor out-of-memory error produces entries similar to the following:

 java.lang.OutOfMemoryError: Java heap space
 -XX:OnOutOfMemoryError="gzip -f /image/core/compactor_oom.hprof"
   Executing /bin/sh -c "gzip -f /image/core/compactor_oom.hprof"...
 Aborting due to java.lang.OutOfMemoryError: Java heap space

 A fatal error has been detected by the Java Runtime Environment:

  INVALID (0xe0000000) at pc=0x0000000000000000, pid=14350, tid=0x000075b304a11700
  fatal error: OutOfMemory encountered: Java heap space


 JRE version: OpenJDK Runtime Environment (Zulu 8.55.0.14-SA-linux64) (8.0_301-b02) (build 1.8.0_301-b02)
 Java VM: OpenJDK 64-Bit Server VM (25.301-b02 mixed mode linux-amd64 compressed oops)
 Core dump written. Default location: //core or core.14350
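
To check whether a Global Manager node shows the symptoms above, the log files can be searched for the same message fragments. The following is a minimal sketch, not a supported tool: the signature strings and log paths are copied from the excerpts in this article, while the names `SIGNATURES`, `LOG_PATHS`, `scan_lines`, and `scan_file` are hypothetical and chosen only for illustration. It assumes a Python 3 interpreter is available wherever the logs are read (for example, on a workstation holding a copy of the log files).

```python
import re
from pathlib import Path

# Message fragments copied from the log excerpts in this article.
# The dictionary keys are illustrative labels, not NSX terminology.
SIGNATURES = {
    "datastore_disconnect": re.compile(
        r"Disconnected from the database, restarting the service"
    ),
    "corfu_connect_timeout": re.compile(
        r"CorfuRuntime .*Couldn't connect to server"
    ),
    "too_many_open_files": re.compile(
        r"FileSystemException: .*Too many open files"
    ),
    "compactor_oom": re.compile(
        r"java\.lang\.OutOfMemoryError: Java heap space"
    ),
}

# Log locations named in this article.
LOG_PATHS = [
    "/var/log/gmanager/gmanager.log",
    "/var/log/corfu/corfu.9000.log",
    "/var/log/corfu/corfu-compactor-audit.log",
]


def scan_lines(lines):
    """Count how many lines match each known signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path):
    """Scan one log file; return None if the file does not exist."""
    log = Path(path)
    if not log.is_file():
        return None
    # errors="replace" keeps the scan going if the log has stray bytes.
    with log.open(errors="replace") as handle:
        return scan_lines(handle)


if __name__ == "__main__":
    for path in LOG_PATHS:
        result = scan_file(path)
        if result is None:
            print(f"{path}: not found")
            continue
        hits = {name: n for name, n in result.items() if n}
        print(f"{path}: {hits if hits else 'no known signatures'}")
```

A match only indicates that the log contains the same messages as the excerpts. It does not confirm the root cause described in this article, so the findings are best attached to a support case rather than treated as a diagnosis.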

Environment

VMware NSX 3.2.0

Cause

In NSX 3.2.0, this issue occurs when a GPRR (GenericPolicyRealizedResource) object does not have a realized object ID.

Resolution

This issue is fixed in VMware NSX 3.2.2.


Workaround:

If you believe you have encountered this issue, please open a support case with Broadcom Support and refer to this KB article.

For more information, see Creating and managing Broadcom support cases.



Additional Information

Impact/Risks:
The NSX Manager upgrade cannot continue.