ESXi Hosts Running NSX Transformers Experience PSOD with 'Corruption in dlmalloc' Error in Ens_NetQSyncPortUpdates

Article ID: 370208

Products

VMware vSphere ESXi

Issue/Introduction

ESXi hosts running NSX Transformers may experience a Purple Screen of Death (PSOD) caused by corruption in the dlmalloc memory allocator.

Symptoms:

  • ESXi hosts display a PSOD with the error message:
    • "PANIC bora/vmkernel/main/dlmalloc.c:4928 - Corruption in dlmalloc"
  • VMs on the affected hosts may experience outages or disruptions.
  • The issue occurs on hosts running NSX Transformers 4.1.2.1 or earlier.

Environment

  • ESXi hosts running version 8.0 or later
  • NSX Transformers 4.1.2.1 or earlier
  • FP Infra component of NSX Transformers

Affected hosts show log entries similar to the following:

  • In the vmkernel logs (file path: /var/log/vmkernel.log):
    • "PANIC bora/vmkernel/main/dlmalloc.c:4928 - Corruption in dlmalloc"
    • "Heap_Free (heap=<heap_address>, mem=<memory_address>)"
    • "Ens_NetQRemoveFilters (deadFIDList=<list_address>)"
    • "EnsNetQGenericUpdateQueues (qOpHdl=<handle_address>)"
    • "Ens_NetQSyncPortUpdates (qOpHdl=<handle_address>)"
    • "EnsLBProcPendingOperations (lbHdl=<handle_address>)"
    • "EnsLBMain (arg=<argument_address>)"

  • In the hostd logs (file path: /var/log/hostd.log):
    • "vmkernel: PANIC bora/vmkernel/main/dlmalloc.c:4928 - Corruption in dlmalloc"
    • "vmkernel: Code start: <code_start_address> VMK uptime: <uptime>"
    • "vmkernel: Backtrace for current CPU #<cpu_number>, worldID=<world_id>, fp=<frame_pointer>"
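To compare a collected log against the signature above, a small script can help. The following is a minimal sketch, not a VMware tool: the function name matches_psod_signature and the ordered-frame matching rule are assumptions made for illustration. It matches on the "Corruption in dlmalloc" text and the backtrace function names listed above, and ignores addresses and the source line number.

```python
# Hypothetical helper: checks whether log text contains the panic message
# followed by the backtrace frames listed in this article, in the same order.
BACKTRACE_FRAMES = [
    "Heap_Free",
    "Ens_NetQRemoveFilters",
    "EnsNetQGenericUpdateQueues",
    "Ens_NetQSyncPortUpdates",
    "EnsLBProcPendingOperations",
    "EnsLBMain",
]
PANIC_TEXT = "Corruption in dlmalloc"


def matches_psod_signature(log_text: str) -> bool:
    """Return True if the panic text is followed by every frame, in order."""
    pos = log_text.find(PANIC_TEXT)
    if pos == -1:
        return False
    for frame in BACKTRACE_FRAMES:
        pos = log_text.find(frame, pos)
        if pos == -1:
            return False  # a frame is missing or out of order
        pos += len(frame)
    return True
```

To use it, read the contents of the log file (for example, a copy of /var/log/vmkernel.log) into a string and pass that string to the function. A True result only means the text resembles this signature; it does not confirm the root cause.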

Cause

The PSOD is caused by corruption in the dlmalloc memory allocator, which is triggered by a race condition in the NSX Transformers FP Infra component. When the Ens_UpdateUplinkLinkState function drops the switch lock to notify the uplink layer, the subsequent Ens_HandleUplinkReset function can access an uplink FCPort pointer that has already been freed (a use-after-free).
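The general pattern behind this kind of race can be illustrated with a toy model. The sketch below is not the NSX Transformers code: the Switch class, the function names, and the "freed" flag are all hypothetical, and Python has no manual free, so the flag only models freed memory. It shows a reference that is cached before the lock is dropped going stale, and a variant that looks the port up again after reacquiring the lock.

```python
import threading


class Switch:
    """Toy stand-in for a switch whose ports are guarded by a single lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.ports = {}  # port_id -> dict describing the port

    def remove_port(self, port_id):
        with self.lock:
            port = self.ports.pop(port_id, None)
            if port is not None:
                port["freed"] = True  # models the port memory being freed


def update_link_state_unsafe(switch, port_id, notify):
    """Caches the port, drops the lock, then reuses the cached reference."""
    with switch.lock:
        port = switch.ports[port_id]
    notify()  # lock is not held here, so the port can be removed meanwhile
    with switch.lock:
        # Bug pattern: 'port' may refer to an object that was already freed.
        return port["freed"]  # True means stale memory was touched


def update_link_state_safe(switch, port_id, notify):
    """Looks the port up again after reacquiring the lock."""
    notify()  # lock is not held during the notification
    with switch.lock:
        port = switch.ports.get(port_id)
        if port is None:
            return False  # port is gone, skip the update
        port["link_up"] = True
        return True
```

In the unsafe variant, a notify callback that calls switch.remove_port(port_id) makes the function return True, meaning the stale object was used. The safe variant returns False in the same situation because it revalidates the port under the lock.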

Resolution

1. Upgrade NSX Transformers to version 4.1.2.4 or later, which includes a fix for this issue:
   a. Download the latest version of NSX Transformers from the official VMware website.
   b. Follow the upgrade instructions provided in the NSX Transformers documentation.
   c. Verify that all ESXi hosts are running the updated version of NSX Transformers.

2. If upgrading is not immediately feasible, apply the following workaround:
   a. Disable the net-stats feature in NSX Transformers.
   b. Contact VMware Support for further assistance in implementing the workaround.
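When checking many hosts, the version boundaries stated in this article can be encoded in a short helper. This is a minimal sketch under stated assumptions: the function names are hypothetical, the version is assumed to be a dotted numeric string, and only the first four components are compared. This article does not state the status of versions between 4.1.2.1 and 4.1.2.4, so those are reported as "unknown".

```python
LAST_KNOWN_AFFECTED = (4, 1, 2, 1)  # "4.1.2.1 or earlier" per this article
FIRST_FIXED = (4, 1, 2, 4)          # "4.1.2.4 or later" per this article


def parse_version(version: str) -> tuple:
    """Parse the first four dotted numeric components, padding with zeros."""
    parts = [int(p) for p in version.strip().split(".")[:4]]
    return tuple(parts + [0] * (4 - len(parts)))


def classify_version(version: str) -> str:
    """Return 'fixed', 'affected', or 'unknown' for a version string."""
    parsed = parse_version(version)
    if parsed >= FIRST_FIXED:
        return "fixed"
    if parsed <= LAST_KNOWN_AFFECTED:
        return "affected"
    return "unknown"  # between the two boundaries; not covered by this article
```

For example, classify_version("4.1.2.1") returns "affected" and classify_version("4.1.2.4") returns "fixed". A version string with non-numeric components raises ValueError, so normalize the input first if your inventory reports versions in another form.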

Additional Information

The PSOD results from a memory corruption issue in the NSX Transformers software. A timing issue between two internal functions can lead one function to access memory that the other has already freed, which crashes the ESXi host.

The fix changes how the affected functions handle memory and validate pointers so that the corruption does not occur. NSX Transformers 4.1.2.4 and later include these changes.

If you are experiencing this issue and cannot upgrade immediately, disabling the net-stats feature in NSX Transformers is a temporary workaround. Contact VMware Support for assistance with implementing it.