ESXi host fails with PSOD and "PANIC bora/vmkernel/main/dlmalloc.c:4937 - Usage error in dlmalloc"

Article ID: 323555

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:

- The NSX version is 4.0.x, or the NSX-T Data Center version is earlier than 3.2.3.
- The ESXi host may fail with a Purple Screen of Death (PSOD) and "PANIC bora/vmkernel/main/dlmalloc.c:4937 - Usage error in dlmalloc".
- More than 64 Edge Transport Nodes are running on the ESXi host.
- Backtrace is similar to:
PANIC bora/vmkernel/main/dlmalloc.c:4937 - Usage error in dlmalloc
PanicvPanicInt@vmkernel#nover+0x327 stack: 0x453a4f89b9b8, 0x0, 0x42002d0fefe3, 0x431242001300, 0x453a4f89b8e0
Panic_NoSave@vmkernel#nover+0x4d stack: 0x453a4f89ba10, 0x453a4f89b9d0, 0x208, 0x42002d6a9053, 0x1349
DLM_free@vmkernel#nover+0x22d stack: 0x431242001310, 0x42002d143afa, 0x3105ded66af, 0x453a4f89baf0, 0x3105ded66af
Heap_Free@vmkernel#nover+0xba stack: 0x3105ded66af, 0x3b, 0x0, 0x3105ded66af, 0x431242001220
[email protected]#1.1.7.0.21487563+0x124 stack: 0x10, 0x0, 0x10, 0x0, 0x0
VMKAPICharDevDevfsWrapIoctl@vmkernel#nover+0x87 stack: 0x5c, 0x42002ed819f8, 0x42002d116991, 0x0, 0x2
CharDriverIoctl@vmkernel#nover+0x7d stack: 0x430e6320e6a0, 0x430e63219920, 0x430c934f0780, 0x430c934f0780, 0x453a4f89be23
DevFSIoctl@vmkernel#nover+0xad3 stack: 0x43120661bd50, 0x430a95414c50, 0x2, 0x0, 0x43110000003b
FSSVec_Ioctl@vmkernel#nover+0x20 stack: 0x9, 0x42002d4b9105, 0x100, 0x400, 0x3
FSSObjectIoctlCommon@vmkernel#nover+0x60 stack: 0x100, 0x400, 0x3, 0x100, 0xa
FSS_IoctlByFH@vmkernel#nover+0x9f stack: 0x0, 0x3105ded66af, 0x3b, 0x3105ded66af, 0x0
UserFile_PassthroughIoctl@vmkernel#nover+0x3f stack: 0x420053c00000, 0x42002d4d7f67, 0x433478409280, 0x433478409280, 0x453a4f89f140
UserVmfs_Ioctl@vmkernel#nover+0x27 stack: 0x453a4f89f140, 0x0, 0x453a4f89bf40, 0xe, 0x453a4f89f000
LinuxFileDesc_Ioctl@vmkernel#nover+0x51 stack: 0x453a4f89bf40, 0x10, 0x1, 0x42002d4b4864, 0xffffffffffffffef
User_LinuxSyscallHandler@vmkernel#nover+0x1a4 stack: 0x0, 0x0, 0x0, 0x42002d14e068, 0x10b
gate_entry@vmkernel#nover+0x68 stack: 0x0, 0x10, 0x416a86e037, 0x3105ded66af, 0x3
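
When comparing a captured PSOD backtrace against the one above, a small script can do the first pass. The following is a minimal sketch, not a VMware tool. It assumes the backtrace is already available as plain text, that the dlmalloc.c line number may differ between builds, and that the DLM_free frame followed by the Heap_Free frame is enough to call a trace "similar". The function name matches_known_backtrace is hypothetical.

```python
import re

# Panic message from this article; the line number is matched loosely
# because it may vary between builds (assumption).
PANIC_RE = re.compile(
    r"PANIC bora/vmkernel/main/dlmalloc\.c:\d+ - Usage error in dlmalloc"
)

# Frame lines look like "DLM_free@vmkernel#nover+0x22d stack: ...".
FRAME_RE = re.compile(r"^\s*(\w+)@vmkernel#nover", re.MULTILINE)


def matches_known_backtrace(text):
    """Return True if text has the panic message and DLM_free before Heap_Free."""
    if not PANIC_RE.search(text):
        return False
    frames = FRAME_RE.findall(text)
    if "DLM_free" not in frames or "Heap_Free" not in frames:
        return False
    # In the backtrace above, DLM_free is listed before Heap_Free.
    return frames.index("DLM_free") < frames.index("Heap_Free")
```

A match only suggests that the trace resembles this issue; the version and Edge Transport Node conditions listed under Symptoms still need to be confirmed.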


Environment

VMware NSX-T Data Center

Cause

The PSOD is triggered when the net-vdl2 command is run on the ESXi host. This command is usually run during collection of a diagnostic log bundle.

Resolution

This is a known issue, fixed in NSX 4.1 and higher, and in NSX-T Data Center 3.2.3 and higher. 
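
The version ranges above can be expressed as a small check. This is a minimal sketch that assumes dotted numeric version strings (for example "3.2.2.1") and compares only the first three components. The function names parse_version and is_affected are hypothetical and are not part of any NSX tooling.

```python
def parse_version(version):
    """Turn '3.2.2.1' into (3, 2, 2); missing components default to 0."""
    parts = [int(p) for p in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version):
    """Return True if the version falls in the affected ranges from this article.

    Affected: NSX 4.0.x, and NSX-T Data Center earlier than 3.2.3.
    Fixed:    NSX 4.1 and higher, NSX-T Data Center 3.2.3 and higher.
    """
    parsed = parse_version(version)
    major, minor, _ = parsed
    if major >= 4:
        # NSX 4.x: only the 4.0.x line is affected.
        return (major, minor) == (4, 0)
    # NSX-T Data Center: anything earlier than 3.2.3 is affected.
    return parsed < (3, 2, 3)
```

For example, is_affected("3.2.2.1") and is_affected("4.0.1.1") return True, while is_affected("3.2.3") and is_affected("4.1.0") return False. A non-numeric version string raises ValueError in this sketch.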


Workaround:

Ensure that there are never more than 64 Edge Transport Nodes running on the ESXi host at any time.
Running more than 64 Edge Transport Nodes on a single ESXi host can also cause CPU, memory, and storage contention, which degrades performance on the host and on the Edge VMs running on it.
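
The limit in the workaround can be audited with a short script. The sketch below assumes that a mapping from Edge Transport Node name to ESXi host name has already been exported from the inventory; how that mapping is obtained is not shown here. The names hosts_over_limit and MAX_EDGE_NODES_PER_HOST are hypothetical.

```python
from collections import Counter

# Limit taken from the workaround in this article.
MAX_EDGE_NODES_PER_HOST = 64


def hosts_over_limit(edge_to_host, limit=MAX_EDGE_NODES_PER_HOST):
    """Return {host: count} for every host running more than `limit` Edge nodes.

    edge_to_host maps an Edge Transport Node name to the ESXi host it runs on
    (a hypothetical input format, for illustration only).
    """
    counts = Counter(edge_to_host.values())
    return {host: n for host, n in counts.items() if n > limit}
```

An empty result means that no host in the supplied mapping exceeds the limit. Because the workaround applies at any time, a check like this would need to be repeated after Edge VMs are deployed or migrated between hosts.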


Additional Information

Impact/Risks:

Downtime on the ESXi host and failover of its VMs, provided that HA is enabled and allowed to restart the VMs on another host in the cluster.