Application on NSX Manager Node has crashed. /config partition is increasing.

Article ID: 434354

Products

VMware NSX

Issue/Introduction

NSX Manager nodes enter an unrecoverable crash loop caused by OutOfMemoryError (OOM) in the Proton JVM. The nsxapi logs show the IdfwDbDaemon attempting to start (the message IdfwDbDaemon starts), followed immediately by a JVM crash before any purge completion or error is logged.

Syslog files on affected nodes show the following warning continuously over an extended period: Leadership lease size is 0 for IdfwDbDaemon

During this period, no events are purged, allowing the LoginLogoutEvent database table to grow unchecked.

Environment

VMware NSX

Cause

The Proton service crashes repeatedly with OutOfMemoryError because roughly 7.8 million records have accumulated in the nsx$LoginLogoutEvent Corfu table, which overwhelms the LoginLogoutProcessor during event processing.

The primary cause is a failure of the IdfwDbDaemon singleton leadership election following a cluster reboot. The IdfwDbDaemon is a singleton service: only one node in the cluster runs it at a time, chosen by leadership election. While no node holds the leadership lease, the daemon does not run, so old Identity Firewall (IDFW) login/logout events are never purged and the LoginLogoutEvent table grows to a critical size.

When a node finally acquires leadership, the LoginLogoutCleaner.purge() task executes a findAll() query without a size guard and attempts to load all of the accumulated millions of entries into the JVM heap. This immediately exceeds the JVM heap limit and triggers a fatal OOM crash, which repeats on every restart.
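
To illustrate why the missing size guard matters, the following is a minimal sketch of a purge loop that works through a table in bounded pages instead of loading every record at once. It is not NSX code and not a fix for this issue. The names purge_in_batches, fetch_page, delete_keys and batch_size are hypothetical and exist only for this illustration.

```python
from typing import Callable, List


def purge_in_batches(
    fetch_page: Callable[[int], List[str]],
    delete_keys: Callable[[List[str]], None],
    batch_size: int = 10_000,
) -> int:
    """Purge records page by page and return how many were deleted.

    fetch_page(n) is assumed to return at most n keys that are still
    present in the table, and delete_keys(keys) is assumed to remove them.
    Only one page of keys is held in memory at a time, so memory use is
    bounded by batch_size rather than by the size of the table.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    purged = 0
    while True:
        keys = fetch_page(batch_size)
        if not keys:
            # Nothing left to purge.
            return purged
        delete_keys(keys)
        purged += len(keys)
```

With an unbounded query, peak memory scales with the number of accumulated records. With a paged loop like this one, it scales with the page size, regardless of how long the purge was skipped. The loop relies on delete_keys actually removing the keys it is given. If it did not, fetch_page would keep returning the same page.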

Resolution

  • Validate the symptom by checking syslog for continuous Leadership lease size is 0 for IdfwDbDaemon messages following a recent manager node reboot.

  • Review nsxapi logs to confirm the IdfwDbDaemon starts message is repeatedly logged across nodes with different PIDs, immediately followed by JVM termination without logging IdfwDbDaemon completed.

  • Note that once the LoginLogoutEvent table has grown beyond what the JVM heap can hold (crashes typically occur beyond 3-4 million entries), the automated IdfwDbDaemon purge mechanism can no longer recover the system.

  • Contact Broadcom Support and reference this KB to request a manual database intervention to truncate the overgrown LoginLogoutEvent table, which will break the OOM crash loop.
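
The two log checks above can be sketched as a small script. This is only an illustrative helper, assuming the syslog and nsxapi logs have been copied somewhere they can be read as plain text. It matches only the message strings quoted in this article and makes no assumption about the rest of the log line format. The function names are hypothetical.

```python
from typing import Iterable

# Message strings quoted in this article.
LEASE_WARNING = "Leadership lease size is 0 for IdfwDbDaemon"
START_MARKER = "IdfwDbDaemon starts"
DONE_MARKER = "IdfwDbDaemon completed"


def count_lease_warnings(lines: Iterable[str]) -> int:
    """Count syslog lines that contain the leadership lease warning."""
    return sum(1 for line in lines if LEASE_WARNING in line)


def count_unfinished_starts(lines: Iterable[str]) -> int:
    """Count start messages with no completed message before the next start.

    A start that is still pending at the end of the input is also counted.
    """
    unfinished = 0
    pending = False
    for line in lines:
        if START_MARKER in line:
            if pending:
                # The previous start never logged a completion.
                unfinished += 1
            pending = True
        elif DONE_MARKER in line and pending:
            pending = False
    if pending:
        unfinished += 1
    return unfinished
```

Both functions accept any iterable of lines, so an open file object can be passed directly, for example with open(path, errors="replace") as f: print(count_lease_warnings(f)). A large warning count in syslog, combined with several unfinished starts in the nsxapi log, is consistent with the symptom described here, but it does not replace confirmation by Broadcom Support.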

 

Additional Information

NSX is Impacted by JDK-8330017: ForkJoinPool Stops Executing Tasks Due to ctl Field Release Count (RC) Overflow