Virtual Machine running on NVMe backed datastores become unresponsive
watchdog: BUG: soft lockup" errors in the kernel ring buffer.VMware vSphere ESXi 8.x
VMware vSphere ESXi 9.x
The unresponsiveness is caused by I/O stalls and aborts where NVMe controllers cannot process I/O and reports to ESXi that it is entering recovery mode.
/var/log/vmkwarning.log reports:
WARNING: NVMEIO:4346 Controller ### in state 8 or in recovery mode, bail out.
During this state transition, outstanding I/O operations are blocked or delayed beyond the guest operating system's internal watchdog thresholds.
ESXi may report:
To resolve this issue, the following actions must be taken: