NSX overlay tunnels are down on transport nodes

Article ID: 413090

Updated On:

Products

VMware NSX

Issue/Introduction

NSX overlay tunnels go down without any configuration changes and with no known issue on the underlay physical network infrastructure. This may cause datapath problems, and VMs might not be able to communicate on the NSX overlay network.

Environment

  • VMware NSX 4.2.0.x
  • VMware NSX 4.2.1.3 and earlier
  • VMware NSX 9.0.0

Cause

This issue is caused by a JDK bug. For more information, see: NSX is Impacted by JDK-8330017: ForkJoinPool Stops Executing Tasks Due to ctl Field Release Count (RC) Overflow

Resolution

This issue is resolved in VMware NSX 4.2.1.4, and in 4.2.2 and above, available at Broadcom downloads. If you have difficulty finding and downloading the software, review the Download Broadcom products and software KB.

Broadcom recommends a rolling reboot of NSX Managers prior to upgrading to a fixed release version to avoid potential problems associated with this issue.

For environments running affected versions (see "Environment" section), implement a preventative monthly rolling reboot schedule:

  1. Reboot the first NSX Manager.
  2. SSH to a Manager as admin user and check cluster health: get cluster status
  3. When all services report up on all 3 NSX Manager nodes, reboot the next Manager.
  4. Repeat steps 2-3 for the third Manager.
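The rolling reboot steps above can be sketched as a small shell driver. This is a minimal sketch under stated assumptions, not a Broadcom-supplied tool: `reboot_manager` and `cluster_is_healthy` are hypothetical hooks that must be replaced with site-specific commands (for example, a hook that runs get cluster status and confirms that all services report up on all nodes), and `MAX_ATTEMPTS` and `DELAY` are assumed tunables with arbitrary defaults.

```shell
#!/bin/sh
# Hypothetical hooks: these defaults only print what they would do.
# Replace them with the site-specific reboot and health-check commands.
reboot_manager() { echo "DRY RUN: would reboot NSX Manager $1"; }
cluster_is_healthy() { echo "DRY RUN: would check 'get cluster status'"; }

# wait_until ATTEMPTS DELAY CMD...: retry CMD until it succeeds
# or the attempts run out.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# rolling_reboot MANAGER...: reboot one Manager at a time and continue
# only after the cluster health check passes.
rolling_reboot() {
    for mgr in "$@"; do
        reboot_manager "$mgr" || return 1
        if ! wait_until "${MAX_ATTEMPTS:-60}" "${DELAY:-30}" cluster_is_healthy; then
            echo "Cluster did not recover after rebooting $mgr; stopping" >&2
            return 1
        fi
    done
}

# Example call (hostnames are placeholders):
# rolling_reboot nsx-mgr-01 nsx-mgr-02 nsx-mgr-03
```

Stopping at the first Manager after which the cluster does not recover keeps the remaining nodes untouched, so at most one Manager is affected at any time.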

In situations where tunnels on transport nodes are down due to this error and remain down following a rolling reboot of the managers, services on the transport nodes may need to be restarted in order to restore the tunnels. Please execute the following on affected hosts:

/etc/init.d/nsx-opsagent restart  
/etc/init.d/nsx-proxy restart  
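Where several hosts are affected, the two restarts can be wrapped in a small function that stops at the first failure. This is only a sketch: the function name `restart_nsx_services` and the `INIT_DIR` override are hypothetical and exist only for illustration and dry runs; the service names, path, and order are taken from the commands above.

```shell
#!/bin/sh
# restart_nsx_services: restart the two services in the order listed above
# and stop if either restart reports a failure.
# INIT_DIR is a hypothetical override; it defaults to /etc/init.d.
restart_nsx_services() {
    init_dir="${INIT_DIR:-/etc/init.d}"
    for svc in nsx-opsagent nsx-proxy; do
        echo "Restarting $svc"
        if ! "$init_dir/$svc" restart; then
            echo "Restart of $svc failed; stopping" >&2
            return 1
        fi
    done
}

# Example call on an affected host:
# restart_nsx_services
```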


Note: If you are currently experiencing this issue, restarting the affected service or rebooting the affected NSX Manager node resolves the immediate symptoms. However, unless NSX is upgraded to a version where this issue is resolved, the problem will recur over time.