ESXi host crashes with a PSOD due to a multi-destination flow cache issue in NSX-T

Article ID: 318622

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

Symptoms:
  • The ESXi host fails with a purple diagnostic screen (PSOD) reporting PF Exception 14 in world xxxx:vmnic5-pollW IP
  • The /var/log/vmkernel.log file on the ESXi host contains a backtrace similar to the following:
#0 BitVector_NextBit (bv=0x45a57a1ff940, startSearch=17829, state=1 '\001', pos=0x451b1291b9b8) at bora/lib/misc/bitvector.c:256
#1 0x000041802da39818 in vmk_BitVectorNextBit (vmkbv=<optimized out>, startSearch=<optimized out>, state=state@entry=1 '\001', pos=pos@entry=0x451b1291b9b8) at bora/vmkernel/core/vmkapi_bitvector.c:136
#2 0x000041802f42715b in FCPortsBitmapGet (nthSetBit=1, portsBitmap=0x45a57a9fe8c0) at /build/mts/release/bora-15314311/nsx/datapath/esx/modules/fc/fc.h:674
#3 FC_ForwardFastpathPktList (ps=0x43063f1a8000, dispatchData=dispatchData@entry=0x45a57a9fe880, nresumed=0x451b1291bb88) at /build/mts/release/bora-15314311/nsx/datapath/esx/modules/fc/fc_datapath.c:583
#4 0x000041802f42a1b1 in FC_LookupInput (portID=<optimized out>, arg=..., pktList=pktList@entry=0x430151444d00, iocHandle=iocHandle@entry=0x4303264fb020)
    at /build/mts/release/bora-15314311/nsx/datapath/esx/modules/fc/fc_datapath.c:1960
#5 0x000041802dbf72fa in IOChain_Resume (port=port@entry=0x43063f1deb00, chain=chain@entry=0x43063f1dedd8, prevLink=prevLink@entry=0x0, pktList=pktList@entry=0x430151444d00,
    remainingPktList=remainingPktList@entry=0x0) at bora/vmkernel/net/iochain.c:825
#6 0x000041802dc34bf1 in Port_InputResume (port=0x43063f1deb00, prev=prev@entry=0x0, pktList=pktList@entry=0x430151444d00) at bora/vmkernel/net/port.c:3602
#7 0x000041802dc34d20 in Port_Input (port=<optimized out>, pktList=pktList@entry=0x430151444d00) at bora/vmkernel/net/port.c:2426
#8 0x000041802dbcef23 in Net_AcceptRxList (dev=0x4306a4256f00, list=0x430151444d00) at bora/vmkernel/net/bh.c:1055
#9 0x000041802dc69072 in vmk_PktListRxProcess (pktlist=<optimized out>, uplink=<optimized out>) at bora/vmkernel/net/vmkapi_net_uplink.c:567
#10 0x000041802dc60360 in NetPollWorldCallback (data=0x430151444c40) at bora/vmkernel/net/vmkapi_net_poll.c:494
#11 0x000041802dd0e323 in CpuSched_StartWorld (destWorld=<optimized out>, previous=<optimized out>) at bora/vmkernel/sched/cpusched.c:11952
#12 0x0000000000000000 in ?? ()


Environment

VMware NSX-T Data Center

Cause

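The PSOD occurs while the flow cache processes multi-destination (broadcast or multicast) packets. In the backtrace above, the fault is raised in BitVector_NextBit, called from FCPortsBitmapGet in the flow cache fast path (FC_ForwardFastpathPktList).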

Resolution

This issue is resolved in NSX-T Data Center 2.5.3, 3.0.3, 3.1.3.7, 3.2.0.1, and later.
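
The version list can be checked mechanically. The following is a minimal sketch, not a VMware tool: the function names version_ge and nsx_has_fix are hypothetical, the input is assumed to be a purely numeric dotted version string, and the list is assumed to mean "this build or later within the same release line" (for example, a 3.1.x host needs 3.1.3.7 or later). How you obtain the installed NSX-T version is left to you.

```shell
# Hypothetical helpers, not part of NSX-T or ESXi.

# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Assumes every field is numeric; missing fields count as 0.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}
        y=${b%%.*}
        # Drop the field just read; clear the string when it was the last field.
        if [ "$a" = "$x" ]; then a=; else a=${a#*.}; fi
        if [ "$b" = "$y" ]; then b=; else b=${b#*.}; fi
        x=${x:-0}
        y=${y:-0}
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
    done
    return 0
}

# nsx_has_fix VERSION: succeed when VERSION is at or above the fixed build
# for its release line (assumed interpretation of the list in this article).
nsx_has_fix() {
    v=$1
    case $v in
        2.5.*) version_ge "$v" 2.5.3 ;;
        3.0.*) version_ge "$v" 3.0.3 ;;
        3.1.*) version_ge "$v" 3.1.3.7 ;;
        *)     version_ge "$v" 3.2.0.1 ;;
    esac
}

# Example: classify two sample version strings.
for v in 3.1.2 3.1.3.7; do
    if nsx_has_fix "$v"; then
        echo "$v: fix included"
    else
        echo "$v: upgrade or apply the workaround"
    fi
done
```

Release lines not named in the list fall through to the 3.2.0.1 comparison, so anything older than 2.5 is reported as not fixed and anything newer than 3.2.0.1 as fixed.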

Workaround:
If you cannot upgrade immediately, disable the multi-destination flow cache on each ESXi host.

Procedure:

On each ESXi host, edit /etc/vmware/nsx/nsx-cfgAgent.xml and set mcastEnabled to false in the flowCache section:
<flowCache>
   <enabled>true</enabled>
   <mcastEnabled>false</mcastEnabled>
</flowCache>


Then restart the nsx-cfgagent service:
/etc/init.d/nsx-cfgagent restart
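
To confirm that the file holds the intended value, a small check along these lines may help. It is a sketch only: the function name mcast_flowcache_state is hypothetical, it assumes the mcastEnabled element sits on a single line as in the snippet above, and it inspects the file contents rather than the running state of the datapath.

```shell
# Hypothetical helper: print the value of <mcastEnabled> found inside the
# <flowCache> section of the given file. Prints nothing if it is absent.
mcast_flowcache_state() {
    sed -n '/<flowCache>/,/<\/flowCache>/s/.*<mcastEnabled>\([^<]*\)<\/mcastEnabled>.*/\1/p' "$1"
}

# Path taken from the procedure in this article.
cfg=/etc/vmware/nsx/nsx-cfgAgent.xml
if [ -r "$cfg" ]; then
    echo "mcastEnabled is: $(mcast_flowcache_state "$cfg")"
fi
```

With the workaround in place, the reported value should be false.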

Additional Information

The flow cache speeds up packet handling by caching the packet modification actions computed for a flow and applying them to subsequent packets in the same flow. The workaround disables the flow cache for broadcast and multicast traffic only. That traffic then goes through regular datapath handling without the flow cache.