NSX is Impacted by JDK-8330017: ForkJoinPool Stops Executing Tasks Due to ctl Field Release Count (RC) Overflow

Article ID: 396719

Updated On:

Products

VMware NSX

Issue/Introduction

VMware NSX 4.2.0.x and 4.2.1.x are affected by a critical JDK bug (JDK-8330017) in which the Java ForkJoinPool incorrectly determines that its total thread count is over the limit, causing requests for new threads to be blocked. As a result, the transaction-processing threads of NSX services become unresponsive.

When this occurs, one or more of the following symptoms appear:

  • NSX Upgrade JDK pre-check warning - NSX Manager reboot required
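
A quick way to tell whether a Java service's ForkJoinPool is still executing tasks is a timed probe. The sketch below is illustrative only (it is not an NSX diagnostic tool, and PoolLivenessProbe is a hypothetical name): a healthy pool completes a trivial task almost immediately, while a pool stuck in the state described by JDK-8330017 lets the timed wait expire.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.ForkJoinTask;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    // Illustrative probe, not an NSX tool: a healthy ForkJoinPool runs a
    // trivial task almost immediately; a pool whose thread accounting is
    // stuck at the maximum never schedules it, so the timed wait expires.
    public class PoolLivenessProbe {
        public static void main(String[] args) throws Exception {
            ForkJoinPool pool = ForkJoinPool.commonPool();
            ForkJoinTask<?> probe = pool.submit(() -> { /* no-op */ });
            try {
                probe.get(5, TimeUnit.SECONDS);
                System.out.println("Pool is executing tasks");
            } catch (TimeoutException e) {
                System.out.println("Trivial task did not run within 5s - "
                        + "the pool may have stopped creating worker threads");
            }
        }
    }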

Environment

VMware NSX 4.2.0.x 
VMware NSX 4.2.1.x 

Cause

The issue occurs due to a JDK bug (JDK-8330017) where the Release Count (RC) field in ForkJoinPool's 64-bit ctl control word overflows. The RC value keeps decreasing until it reaches -32768, then wraps around to +32767 (ForkJoinPool.MAX_CAP). The pool then believes it is already at its maximum thread capacity and stops executing tasks.
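
For illustration, the following minimal Java sketch (not the JDK source) reproduces the arithmetic: a signed 16-bit counter, like the RC field packed into the high bits of the ctl word, wraps from -32768 to +32767 when decremented one more time.

    // Minimal sketch of the wraparound behind JDK-8330017. A Java short
    // models the 16-bit RC field; the JDK actually stores RC inside a
    // 64-bit long (ctl), but the two's-complement arithmetic is the same.
    public class RcOverflowDemo {
        public static void main(String[] args) {
            short rc = 0;
            // Each unbalanced release decrements RC; after 32768
            // decrements the counter sits at its minimum value...
            for (int i = 0; i < 32768; i++) {
                rc--;
            }
            System.out.println(rc);   // -32768
            // ...and one more decrement wraps it to +32767, which equals
            // ForkJoinPool.MAX_CAP, so the pool concludes it is already
            // at its maximum thread count and stops starting workers.
            rc--;
            System.out.println(rc);   // 32767
        }
    }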

This affects different NSX services:

  • Controller service - impacts network provisioning, firewall rules, and vMotion operations
  • Upgrade Coordinator service - affects upgrade operations and causes OOM errors
  • Corfu service - impacts data storage and retrieval operations

The issue accumulates over time and becomes apparent during configuration changes (upgrades, VM migrations) or when memory limits are reached.

Resolution

This issue is resolved in VMware NSX 4.2.1.4 and in 4.2.2 and later, available at Broadcom downloads. If you have difficulty finding or downloading the software, please review the Download Broadcom products and software KB.

Broadcom recommends a rolling reboot of NSX Managers prior to upgrading to a fixed release version to avoid potential problems associated with this issue.

For environments running affected versions (4.2.0.x or 4.2.1.x), implement a preventative monthly rolling reboot schedule:

  1. Reboot the first NSX Manager.
  2. SSH to a Manager as the admin user and check cluster health: get cluster status (a scripted alternative is sketched after this list)
  3. When all services report up on all 3 NSX Manager nodes, reboot the next Manager.
  4. Repeat steps 2-3 for the third Manager.
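
As an alternative to checking interactively, cluster health can also be read over the NSX Manager REST API between reboots. The sketch below is a hypothetical helper, assuming the GET /api/v1/cluster/status endpoint and a STABLE status marker in its response; verify both against the API guide for your NSX version. It also assumes the Manager's TLS certificate is trusted by the JVM.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    // Hypothetical helper for the rolling-reboot procedure: reads the NSX
    // Manager cluster status between reboots. The endpoint path and the
    // "STABLE" marker are assumptions to verify against your API guide.
    public class ClusterStatusCheck {
        public static void main(String[] args) throws Exception {
            String manager = args[0];                      // an NSX Manager FQDN or IP
            String credentials = args[1] + ":" + args[2];  // admin user and password
            String auth = Base64.getEncoder()
                    .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://" + manager + "/api/v1/cluster/status"))
                    .header("Authorization", "Basic " + auth)
                    .GET()
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // Crude check: only proceed with the next Manager reboot once
            // the cluster reports a stable status.
            if (response.statusCode() == 200 && response.body().contains("\"STABLE\"")) {
                System.out.println("Cluster stable - safe to reboot the next Manager");
            } else {
                System.out.println("Cluster not stable yet - wait and re-check");
            }
        }
    }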

Note: If experiencing this issue currently, restarting the affected service or rebooting the affected NSX Manager node resolves the immediate symptoms. However, without upgrading to a fixed release (NSX 4.2.1.4, or 4.2.2 and later), the problem will recur over time.