NSX-T UI inaccessible due to large corfu table LoginLogoutEvent

Article ID: 318277

Products

VMware vDefend Firewall

Issue/Introduction

  • You are unable to log in to the NSX-T UI and receive the following alert message:

    "Some appliance components are not functioning properly. Component health: POLICY:UNKNOWN, MANAGER:UNKNOWN, SEARCH:UNKNOWN, NO"
  • Identity Firewall (IDFW) is configured, or was configured in the past.
  • Running df -h as the root user shows the /config partition at 50% usage or more and still growing; normally it sits at around 1%.
  • Many heap dump files (*.hprof) are present in /image/core/.
  • One of these dumps, cbm_oom.hprof.gz, is rotated every minute.
  • The manager log /var/log/corfu/corfu-compactor-audit.log contains messages like the following:

    java.lang.OutOfMemoryError: Java heap space
    Dumping heap to /image/core/java_pid3360.hprof ...
    Heap dump file created [2718867198 bytes in 13.856 secs]
    Aborting due to java.lang.OutOfMemoryError: Java heap space
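
The symptoms above can be checked together with a short script. The following is a minimal Python sketch, not an official tool: the paths and the 50% figure come from this article, while the function names are hypothetical, and it assumes Python 3 is available on the system where the paths exist.

```python
import glob
import os
import shutil

# Paths and the 50% figure are taken from the symptom list in this article.
CONFIG_PATH = "/config"
CORE_DIR = "/image/core"
COMPACTOR_LOG = "/var/log/corfu/corfu-compactor-audit.log"
USAGE_ALERT_PERCENT = 50.0


def percent_used(path):
    """Percentage of the filesystem containing `path` that is in use.

    Computed as used/total, so it may differ slightly from the Use%
    column of df, which excludes reserved blocks.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def heap_dumps(core_dir):
    """Sorted list of *.hprof and *.hprof.gz files directly inside `core_dir`."""
    found = set()
    for pattern in ("*.hprof", "*.hprof.gz"):
        found.update(glob.glob(os.path.join(core_dir, pattern)))
    return sorted(found)


def count_oom_lines(log_path):
    """Number of log lines that mention java.lang.OutOfMemoryError."""
    with open(log_path, errors="replace") as log:
        return sum("java.lang.OutOfMemoryError" in line for line in log)


if __name__ == "__main__":
    # Each check is skipped when its path does not exist on this system.
    if os.path.isdir(CONFIG_PATH):
        used = percent_used(CONFIG_PATH)
        flag = "HIGH" if used >= USAGE_ALERT_PERCENT else "ok"
        print(f"{CONFIG_PATH}: {used:.1f}% used ({flag})")
    if os.path.isdir(CORE_DIR):
        print(f"{CORE_DIR}: {len(heap_dumps(CORE_DIR))} heap dump file(s)")
    if os.path.isfile(COMPACTOR_LOG):
        count = count_oom_lines(COMPACTOR_LOG)
        print(f"{COMPACTOR_LOG}: {count} OutOfMemoryError line(s)")
```

A high /config usage together with many heap dumps and repeated OutOfMemoryError lines matches the symptom pattern described here, but it does not by itself confirm the cause.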

Environment

VMware NSX-T Data Center 3.0.x
VMware NSX-T Data Center 3.1.x

Cause

The LoginLogoutEvent table records login and logout events for the IDFW feature.
This table can grow very large. Once it exceeds roughly 4 million records, it becomes too big for the compactor process to handle, and the compactor crashes.
To check the table size, look in the log file /var/log/corfu/corfu-compactor-audit.log for the checkpoint entry for table 3e6e9124-####-####-####-58ac1bcebd3d, which is the internal UUID of the LoginLogoutEvent table:

2021-10-01T07:10:02.856Z INFO main CheckpointWriter - appendCheckpoint: completed checkpoint for 3e6e9124-####-####-####-58ac1bcebd3d, entries(4761324), cpSize(2950812316) bytes at snapshot Token(epoch=3968, sequence=2172928084) in 5079209 ms

Note: The entry above shows that the table held 4,761,324 entries at the last successful compactor run on it.
The Cluster Boot Manager (CBM) process starts the compactor. When the compactor crashes, CBM crashes in turn, which is why the cbm_oom.hprof.gz dump is also present.
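
The checkpoint entry shown in this section can also be read programmatically. The sketch below is illustrative only: the regular expression is inferred from the single sample line in this article, so other releases may log a different format, and the function names and the 4,000,000 threshold constant are assumptions based on the approximate figure given above.

```python
import re

# Pattern inferred from the one sample checkpoint line in this article;
# other releases may format these messages differently.
CHECKPOINT_RE = re.compile(
    r"completed checkpoint for (?P<table>[^,\s]+), "
    r"entries\((?P<entries>\d+)\), "
    r"cpSize\((?P<size>\d+)\) bytes"
)

# Approximate record count at which the article says compaction fails.
ENTRY_THRESHOLD = 4_000_000


def parse_checkpoint(line):
    """Return (table_uuid, entries, size_bytes), or None if the line does not match."""
    match = CHECKPOINT_RE.search(line)
    if match is None:
        return None
    return (
        match.group("table"),
        int(match.group("entries")),
        int(match.group("size")),
    )


def oversized_tables(lines, threshold=ENTRY_THRESHOLD):
    """Yield (table_uuid, entries, size_bytes) for checkpoints above `threshold` entries."""
    for line in lines:
        parsed = parse_checkpoint(line)
        if parsed is not None and parsed[1] > threshold:
            yield parsed


if __name__ == "__main__":
    # Sample line copied from this article (UUID masked as in the original).
    sample = (
        "2021-10-01T07:10:02.856Z INFO main CheckpointWriter - appendCheckpoint: "
        "completed checkpoint for 3e6e9124-####-####-####-58ac1bcebd3d, "
        "entries(4761324), cpSize(2950812316) bytes at snapshot "
        "Token(epoch=3968, sequence=2172928084) in 5079209 ms"
    )
    for table, entries, size in oversized_tables([sample]):
        print(f"{table}: {entries:,} entries, {size / 2**30:.2f} GiB checkpoint")
```

To scan a real log, the lines of /var/log/corfu/corfu-compactor-audit.log could be passed to oversized_tables() in place of the sample list; any table it reports would still need to be matched against the LoginLogoutEvent UUID.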

Resolution

This issue is resolved in VMware NSX 3.2.2.
This issue is resolved in VMware NSX 4.0.1.

Workaround:

If you believe you have encountered this issue, contact Broadcom Support.