PSOD reported for hosts when using Mellanox ConnectX-6 network adapter with Enhanced datapath mode.

Article ID: 426430

Updated On:

Products

VMware NSX

Issue/Introduction

ESXi hosts experience a Purple Screen of Death (PSOD), which is preceded by intermittent network instability.

The PSOD is reported with the following error:

#GP Exception 13 in world 2102290:EnsNetWorld- @ 0x42000feea3a1

Module(s) involved in panic: [nmlx5_core 4.X.X.X.2-7vmw.803.0.0.24022510 (External)] nsxt-ens-2XXXXXX8 Version X.X.X
ild-XXXXXX]

cpu24:2102290)0x453ab3c1bc30:[0x42000feea3a1]nmlx5_en_EnsStridingRx@(nmlx5_core)#<None>+0x1e5 stack: 0x4328dc3964a4 

Observed symptoms include:

  • Intermittent packet loss to and from the host.
  • Total loss of network connectivity for the host, leading to vMotion failures.
  • The issue typically occurs when multiple uplinks are active simultaneously.

System logs indicate driver CQE (Completion Queue Entry) errors just before the crash.

Log lines similar to the below are encountered in /var/log/vmkernel.log:

2026-01-01T22:02:32.119Z Wa(180) vmkwarning: cpu110:2102765)WARNING: <NMLX_ERR> 00000000: 000000c7 00000000 00000000 20000099

2026-01-01T22:02:32.119Z In(182) vmkernel: cpu110:2102765)<NMLX_ERR> nmlx5_core: en: nmlx5_HandleErrCqeRq - (nmlx5_core_en_ens_txrx.c:175) RQN in CQE 0x0 does not match RQN of RQ 0xb1

2026-01-01T22:02:32.119Z In(182) vmkernel: cpu110:2102765)<NMLX_ERR> nmlx5_core: en: nmlx5_en_DumpCqe - (nmlx5_core_en_ens_txrx.c:123) Error in CQE, opCode 0xe

Note: The preceding log excerpts are examples only. The date, time, and other environment-specific values will differ in your environment.
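As a quick way to check whether a host has logged these CQE errors, the following minimal sketch counts matching lines in a vmkernel log. The helper name count_cqe_errors is hypothetical (it is not an ESXi tool), and the pattern assumes the function names shown in the excerpts above appear unchanged in your driver version.

```shell
# Hypothetical helper: count nmlx5_core CQE error lines in a vmkernel log.
# Defaults to /var/log/vmkernel.log when no path is given.
count_cqe_errors() {
    log_file="${1:-/var/log/vmkernel.log}"
    # Match the two driver functions seen in the example log lines.
    grep -c -E 'nmlx5_HandleErrCqeRq|nmlx5_en_DumpCqe' "$log_file"
}
```

Running count_cqe_errors /var/log/vmkernel.log on the host prints the number of matching lines. A non-zero count shortly before a PSOD is consistent with the symptoms described here, but it does not confirm the diagnosis on its own.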

Environment

VMware NSX

Cause

The root cause is a driver-level CQE error within the Mellanox nmlx5_core driver.

Resolution

To resolve this issue, you must disable Hardware Large Receive Offload (HW LRO) at the driver level for the Mellanox nmlx5_core module.

1. Log in to the ESXi host via SSH.

2. Set the module parameter to disable HW LRO (setting data_path_control=1 disables HW LRO in this driver context):

esxcli system module parameters set -m nmlx5_core -p data_path_control=1

3. Reboot the host for the changes to take effect.

After the reboot, verify that the module parameter is correctly applied:

esxcli system module parameters list -m nmlx5_core
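The list command prints every nmlx5_core parameter, so it can help to filter for the one row of interest. The sketch below is a hypothetical helper (show_data_path_control is not part of esxcli) that reads the command's output on standard input and prints only the data_path_control row. It assumes the usual tabular layout with the parameter name in the first column.

```shell
# Hypothetical helper: print only the data_path_control row from the
# output of `esxcli system module parameters list -m nmlx5_core`,
# supplied on stdin. Assumes the parameter name is the first column.
show_data_path_control() {
    awk '$1 == "data_path_control" { print }'
}
```

Usage on the host would be: esxcli system module parameters list -m nmlx5_core | show_data_path_control. The value shown in that row should be 1 after the reboot.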

Additional Information

If you are contacting Broadcom support about this issue, please provide the following:

  • ESXi host support bundles (collected after the PSOD, if possible, or after the network loss occurs).

  • Complete core dump file from the PSOD (if available in the /var/core/ directory or scratch partition).

  • Output of esxcli network nic get -n vmnicX for the affected adapters.

Handling Log Bundles for offline review with Broadcom support: