ESXi Host Crashes with PSOD Due to Deadlock Issue : FC_InvalidateFlows@com.vmware.nsx.fc

Article ID: 387519

Products

VMware vSphere ESXi

VMware NSX

Issue/Introduction

An ESXi host crashes with a Purple Screen of Death (PSOD). The PSOD backtrace shows a deadlock between FC and the MAC learning lock. The same backtrace is recorded in /var/log/vmkernel.log.

PSOD Backtrace:

    0x453a2859bbe0:[0x420013d1f53e]MCSLockSpin@vmkernel#nover+0x47 stack: 0x4302f858cd60
    0x453a2859bc10:[0x420013d1f736]MCSLockRWContended@vmkernel#nover+0x1bb stack: 0x6aef87d95dbcc4
    0x453a2859bc60:[0x420013d1fead]MCS_DoAcqReadLockWithRA@vmkernel#nover+0x82 stack: 0x2
    0x453a2859bc70:[0x420013d1d2e0]RefCount_ReaderWait@vmkernel#nover+0x51 stack: 0x2
    0x453a2859bca0:[0x420013ea49e7]Port_AcquireNonexcl@vmkernel#nover+0x3f8 stack: 0x453a6189f8c0
    0x453a2859bd10:[0x420013edf285]vmkGetPortByIDNonExclLock@vmkernel#nover+0xa2 stack: 0x3
    0x453a2859bd40:[0x420013ee1967]vmk_PortGetClientType@vmkernel#nover+0x20 stack: 0x40007dd
    0x453a2859bd70:[0x420015d33ccd]Ens_InvalidateFlows@(nsxt-ens-22667792)#+0x2be stack: 0x40007dd
    0x453a2859bdb0:[0x4200156e95b2][email protected]#1.1.8.0.22667792+0x423 stack: 0x134e4a9
    0x453a2859be10:[0x4200157469d8][email protected]#1.0.8.0.22667792+0x7ed stack: 0xd3220
    0x453a2859be80:[0x420015747ef7][email protected]#1.0.8.0.22667792+0xd4 stack: 0x0
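
To compare a backtrace from another host against the one above, a small script can look for the same frames in the same order. The following is a minimal sketch, not a supported tool. It assumes the backtrace has been saved as plain text in the layout shown above, and the choice of which frames form the "signature" is an assumption made for illustration. The function names `extract_frames` and `matches_signature` are hypothetical.

```python
import re

# Frames copied from the backtrace in this article, innermost first.
# Treating these four as the signature is an assumption for illustration.
SIGNATURE_FRAMES = (
    "MCSLockSpin",
    "Port_AcquireNonexcl",
    "vmk_PortGetClientType",
    "Ens_InvalidateFlows",
)

# Captures the symbol name that directly follows "[0x<address>]".
FRAME_RE = re.compile(r"\[0x[0-9a-fA-F]+\]([A-Za-z_][A-Za-z0-9_]*)")


def extract_frames(backtrace_text):
    """Return the symbol names found in a backtrace, in order of appearance."""
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.group(1))
    return frames


def matches_signature(backtrace_text):
    """Return True if every signature frame appears in the same relative order."""
    remaining = iter(extract_frames(backtrace_text))
    # "wanted in remaining" consumes the iterator, which enforces ordering.
    return all(wanted in remaining for wanted in SIGNATURE_FRAMES)
```

A match only suggests that the crash resembles the one described here. Confirming the root cause still requires review of the full core dump.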

 

Environment

ESXi Host 8.0.2

VMware NSX 4.1.x

Cause

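This issue is caused by a deadlock between two code paths. The FC module (com.vmware.nsx.fc, whose FC_InvalidateFlows routine is named in this article's title) attempts to acquire a port lock while the MAC learning lock is already held. In the backtrace, this appears as Port_AcquireNonexcl spinning in MCSLockSpin beneath Ens_InvalidateFlows. Because neither lock is released, the host fails with the PSOD.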

This deadlock between FC and the MAC learning lock is documented in the VMware NSX release notes and is fixed in NSX 4.2.0.
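
The general shape of such a deadlock can be illustrated with a wait-for check: a deadlock exists when following "thread waits for lock, lock is held by thread" leads back to the starting thread. The sketch below is a generic model, not NSX code. The article does not state which code path holds which lock, so the assignment in the usage example, along with the names `fc_path`, `mac_learning_path`, `port_lock`, and `mac_learning_lock`, is assumed purely for illustration.

```python
def find_deadlock(holds, waits_for):
    """Return a list of threads forming a wait cycle, or None if there is none.

    holds:     maps a lock name to the thread that currently holds it
    waits_for: maps a thread to the lock name it is blocked on
    """
    for start in waits_for:
        path = [start]
        current = start
        while True:
            lock = waits_for.get(current)
            owner = holds.get(lock)
            if owner is None:
                break  # current thread is not blocked on a held lock
            if owner == start:
                return path  # followed the chain back to where it began
            if owner in path:
                break  # a cycle that excludes start; another start finds it
            path.append(owner)
            current = owner
    return None


# Hypothetical lock assignment, assumed for illustration only.
holds = {"mac_learning_lock": "fc_path", "port_lock": "mac_learning_path"}
waits_for = {"fc_path": "port_lock", "mac_learning_path": "mac_learning_lock"}
print(find_deadlock(holds, waits_for))
```

In this model each path holds the lock the other one needs, so neither can make progress. Acquiring the two locks in the same order on every path is the usual way to remove such a cycle.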

 

Resolution

 

  • Upgrade to NSX version 4.2.0 to resolve the deadlock issue.
  • Once the upgrade is complete, monitor the ESXi host to confirm that the PSOD issue no longer occurs.
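
When checking many hosts or managers, it can help to compare a reported NSX version string against the fixed release. The following is a minimal sketch. It assumes that the version string is dot-separated with numeric leading components, and that any release at or above 4.2.0 contains the fix. The article itself names only 4.2.0, so the "or later" comparison is an assumption. The function names `parse_version` and `has_fix` are hypothetical, and how the version string is obtained is left to the reader.

```python
FIXED_IN = (4, 2, 0)  # release named in the Resolution section


def parse_version(text):
    """Convert a string such as '4.1.2.3' into a tuple of integers.

    Parsing stops at the first component that is not purely numeric.
    """
    parts = []
    for piece in text.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    if not parts:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(parts)


def has_fix(version_text):
    """Return True if the version is at or above FIXED_IN (assumed to carry the fix)."""
    version = parse_version(version_text)
    width = max(len(version), len(FIXED_IN))
    # Pad with zeros so that '4.2' compares equal to '4.2.0'.
    padded_version = version + (0,) * (width - len(version))
    padded_fixed = FIXED_IN + (0,) * (width - len(FIXED_IN))
    return padded_version >= padded_fixed
```

For example, `has_fix("4.1.2.3")` evaluates to `False`, while `has_fix("4.2.0")` evaluates to `True`.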