pktcap-uw --switchport <datapath port id> --capture VnicTx,VnicRx -o - | tcpdump-uw -ner -
esxcli network port stats get -p <datapath port id>
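The <datapath port id> referenced in the commands above is the virtual switch port ID of the VM's vNIC. As an example, one way to look it up on the ESXi host is:
net-stats -l                                 # lists the port number, switch, MAC address and client (VM) name for every connected port
esxcli network vm list                       # note the World ID of the affected VM
esxcli network vm port list -w <world id>    # shows the Port ID of each vNIC attached to that VM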
In /var/run/log/vmkernel.log you see similar messages about the port being enabled, but repeated messages about a failure to acquire a lock:
In(182) vmkernel: cpu24:5443269 opID=50ab147b)NetPort: 1618: enabled port <port> with mac <vip mac address>
In(182) vmkernel: cpu36:5443272)NetPort: 708: Failed to acquire port non-exclusive lock <port>[Failure].
In(182) vmkernel: cpu24:5443269 opID=50ab147b)NetPort: 3130: blocking traffic on DV port <dv port id>
In(182) vmkernel: cpu39:11092449)Net: 3918: IO Completion missed: pkt: 0x45d994228840, IOData: 0xe729e72900270000, srcPortID: <port id>
In(182) vmkernel: cpu39:11092449)Log: 1640: Generating backtrace for 11092449: NetWorld-Dev-67108912-Tx
In(182) vmkernel: cpu39:11092449)Backtrace for current CPU #39, worldID=11092449, fp=0x45392109f000
In(182) vmkernel: cpu39:11092449)0x45392109bad0:[0x42002456ab35]LogBacktraceInt@vmkernel#nover+0xdd stack: 0x45392109bb02, 0x420024ad224c, 0x3030203732206363, 0x3020303020303020, 0x420049002030
In(182) vmkernel: cpu39:11092449)0x45392109baf0:[0x420024ad224b]Pkt_LogInfo@vmkernel#nover+0x12c stack: 0x420049002030, 0x43129b487620, 0x45392109bc50, 0x0, 0x45d994228840
In(182) vmkernel: cpu39:11092449)0x45392109bb50:[0x420024ad2311]Pkt_ClearAndRelease@vmkernel#nover+0xb6 stack: 0x0, 0x453900000000, 0x45d99233e640, 0x45392109bca0, 0x0
In(182) vmkernel: cpu39:11092449)0x45392109bb90:[0x42002466be92]Port_IOCompleteList@vmkernel#nover+0x31b stack: 0x392109bc50, 0x0, 0x4305c7017d80, 0x1164edaca32c38, 0x45392109bc50
In(182) vmkernel: cpu39:11092449)0x45392109bc20:[0x420024667d6d]PktList_DoIOCompleteLocked@vmkernel#nover+0xc6 stack: 0x45392871f600, 0x8000000000000000, 0x0, 0x0, 0x33b3f208f6
In(182) vmkernel: cpu39:11092449)0x45392109bc90:[0x420024669353]PktList_IOCompleteLocked@vmkernel#nover+0x16c stack: 0x0, 0x0, 0x0, 0x430500000000, 0x0
In(182) vmkernel: cpu39:11092449)0x45392109bd00:[0x42002466ce5c]Port_InputResume@vmkernel#nover+0xd1 stack: 0x1, 0x42002464b5c8, 0x4305b5402ac0, 0x4305b5402ac0, 0x45392109be00
In(182) vmkernel: cpu39:11092449)0x45392109bd50:[0x4200246b1ea5]Vmxnet3VMKDevTQDoTx@vmkernel#nover+0x23a stack: 0x4305b5402ac0, 0x45394079f000, 0x600000005, 0x0, 0x4305c7017d80
In(182) vmkernel: cpu39:11092449)0x45392109bf10:[0x4200246bd966]Vmxnet3VMKDev_AsyncTxPerQ@vmkernel#nover+0xcf stack: 0x100000000, 0x45392109bf88, 0x0, 0x1a00000000, 0x1
In(182) vmkernel: cpu39:11092449)0x45392109bf80:[0x420024728a77]NetWorldPerDevCB@vmkernel#nover+0x188 stack: 0x0, 0x0, 0x0, 0x45392109f000, 0x45392871f100
In(182) vmkernel: cpu39:11092449)0x45392109bfe0:[0x420024adc88e]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0, 0x420024544fb0, 0x0, 0x0, 0x0
In(182) vmkernel: cpu28:5443278)Vmxnet3: 19294: <vm name>.ethX,<vip mac address>, portID<(port id)>: Hang detected,numHangQ: 1, enableGen: 39
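As a quick check for this signature, the vmkernel log can be searched for the key messages above, for example:
grep -E "Failed to acquire port non-exclusive lock|IO Completion missed|Hang detected" /var/run/log/vmkernel.log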
Avi Load Balancer 31.1.1
VMware vSphere ESX 7.X
VMware vSphere ESX 8.X
This is a known issue impacting all versions of ESX since version 7.0 GA and is due to be fixed in a future release.
A workaround is to deploy the older Avi Load Balancer version 30.2.X instead of version 31.1.1.
This issue can potentially affect any VM workload, not just the Avi Load Balancer, though it is very rare and depends on a specific set of conditions and sequence of events.