nmlx5_QueryNicVportContext:188 command failed: IO was aborted
<NMLX_ERR> nmlx5_core: 0000:45:00.0: Health: Miss counters detected
<NMLX_INF> synd 0x0: unrecognized error
<NMLX_INF> extSynd 0x0000
<NMLX_ERR> nmlx5_QueryNicVportContext:188 command failed: IO was aborted
<NMLX_ERR> nmlx5_QueryVportCounter:1851 command failed: IO was aborted
vSphere ESXi 8.0.x
This is a known issue in the nmlx5 health check logic: the driver incorrectly detects that the NIC is in a faulty state even though the NIC firmware is healthy. The driver then suspends all I/O on the vmnic from the driver side.
This issue is resolved in VMware ESXi 8.0U3e (nmlx5_core driver version: 4.23.6.5) and also in the inbox driver for VCF 9.0 (nmlx5_core version: 4.24.0.7).
Refer to the KB article "Download Broadcom products and software" for guidance on how to navigate and download from the Broadcom download portal.
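As a rough illustration of checking an installed nmlx5_core version against the fixed ESXi 8.0 version quoted above, the following Python sketch compares dotted version strings numerically. The function names `parse_version` and `has_fix` are hypothetical, the installed version string is assumed to be already at hand, and the sketch assumes that any numerically later version also carries the fix, which this article does not state.

```python
import re

def parse_version(text):
    """Turn a dotted version such as '4.23.6.5' into a tuple of ints.

    Any trailing suffix after the leading dotted number is ignored
    (an assumption made for this sketch).
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a dotted version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))

# Fixed nmlx5_core version for ESXi 8.0U3e, as quoted in this article.
FIXED_ESXI_80 = parse_version("4.23.6.5")

def has_fix(installed):
    """Return True if the installed version is at or above the fixed one.

    Assumption: later versions keep the fix.
    """
    return parse_version(installed) >= FIXED_ESXI_80
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where "4.23.10.0" would sort before "4.23.6.5".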
Currently, there is no workaround to avoid this condition. Once it occurs, the ESXi host must be rebooted to recover the uplink.
If the error code logged in /var/log/vmkernel.log on the ESXi host is extSynd 0x8a02, commands from the driver to the firmware are failing. In that case the issue is at the hardware/firmware layer and must be investigated further by the NIC vendor.
<NMLX_ERR> nmlx5_core: 0000:c1:00.0: Health: Miss counters detected
<NMLX_INF> Device internal error state is set
<NMLX_INF> assertVar[0] 0x00000000
<NMLX_INF> assertVar[1] 0x00000000
<NMLX_INF> assertVar[2] 0x00000000
<NMLX_INF> assertVar[3] 0x00000000
<NMLX_INF> assertVar[4] 0x00000000
<NMLX_INF> assertExitPtr 0x20a37df8
<NMLX_INF> assertCallra 0x20a3ebcc
<NMLX_INF> firmwareVersion 0x1a2903e9
<NMLX_INF> hwId 0x00000216
<NMLX_INF> iriscIndex 6
<NMLX_INF> synd 0x1: firmware internal error
<NMLX_INF> extSynd 0x8a02
<NMLX_INF> driver 4.23.6.5
<NMLX_INF> nmlx5_core: 0000:c1:00.0: Health: thread is stopped 0x43199284db88
<NMLX_WRN> nmlx5_core: vmnic1: nmlx5_en_UpdateStatsWork - (nmlx5_core_en_main.c:1882) Device internal error state is set! Stop updating
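To tell the two log signatures in this article apart, a minimal Python sketch can scan vmkernel.log text for `extSynd` values. The function name `classify_ext_synd` and the label strings are hypothetical, and the mapping covers only the two values shown in this article (0x0000 in the known-issue sample, 0x8a02 in the firmware-failure sample); it is not a complete decoding of nmlx5 syndromes.

```python
import re

# Matches the extended syndrome value printed in the nmlx5 health report.
EXT_SYND_RE = re.compile(r"extSynd\s+(0x[0-9a-fA-F]+)")

def classify_ext_synd(log_text):
    """Return (raw_value, label) for each extSynd entry found in log_text.

    Labels are illustrative and based only on the two samples in this article.
    """
    results = []
    for match in EXT_SYND_RE.finditer(log_text):
        value = int(match.group(1), 16)
        if value == 0x0000:
            label = "matches known-issue sample (driver health check)"
        elif value == 0x8A02:
            label = "driver-to-firmware commands failing; engage NIC vendor"
        else:
            label = "not covered by this article"
        results.append((match.group(1), label))
    return results
```

For example, passing the contents of /var/log/vmkernel.log as a string returns one entry per health report, in the order they appear in the log.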