Commands such as esxcli or df -h may hang or take several minutes to respond when run on the host.
VMware vSphere ESXi 8.0
Packet drops on NVMe/TCP uplinks cause storage to repeatedly disconnect and reconnect, leading to high latency and hostd issues.
hostd.log shows the daemon stalling or taking an excessive amount of time on file operations against the storage:
YYYY-MM-DDTHH:MM:SS.###Z warning hostd[<pid>] [Originator@6876 sub=IoTracker] In thread <thread id>, stat("/vmfs/volumes/########-########-####-############/path") took over 2714 sec.
YYYY-MM-DDTHH:MM:SS.###Z warning hostd[<pid>] [Originator@6876 sub=IoTracker] In thread <thread id>, open("/vmfs/volumes/########-########-####-############/path") took over 2814 sec.
YYYY-MM-DDTHH:MM:SS.###Z warning hostd[<pid>] [Originator@6876 sub=IoTracker] In thread <thread id>, access("/vmfs/volumes/########-########-####-############/path") took over 4049 sec.
YYYY-MM-DDTHH:MM:SS.###Z Wa(180) vmkwarning: cpu21:2098710)WARNING: NVMEIO:3649 Ctlr 263, nvmeCmd 0x45bade2e3800 (opc 02), queue 1 (expect 65535) not available, nvmeStatus 80e
YYYY-MM-DDTHH:MM:SS.###Z Wa(180) vmkwarning: cpu21:2098710)WARNING: NVMEIO:3649 Ctlr 263, nvmeCmd 0x45bade2e3800 (opc 02), queue 2 (expect 65535) not available, nvmeStatus 80e
YYYY-MM-DDTHH:MM:SS.###Z Wa(180) vmkwarning: cpu21:2098710)WARNING: NVMEIO:3649 Ctlr 263, nvmeCmd 0x45bade2e3800 (opc 02), queue 3 (expect 65535) not available, nvmeStatus 80e
YYYY-MM-DDTHH:MM:SS.###Z Wa(180) vmkwarning: cpu21:2098710)WARNING: NVMEIO:3649 Ctlr 263, nvmeCmd 0x45bade2e3800 (opc 02), queue 4 (expect 65535) not available, nvmeStatus 80e
YYYY-MM-DDTHH:MM:SS.###Z Wa(180) vmkwarning: cpu21:2098710)WARNING: NVMEIO:3649 Ctlr 263, nvmeCmd 0x45bade2e3800 (opc 02), queue 5 (expect 65535) not available, nvmeStatus 80e
Note: The above excerpts are examples; the opcodes and the error sequence may vary.
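To triage a large log quickly, the two warning patterns in the excerpts above can be extracted with a short script. The following is a minimal Python sketch, not a VMware tool: the regular expressions are modeled only on the sample lines shown here, and the function names and the 60-second threshold are hypothetical choices for illustration. Real log lines may differ and the patterns may need adjusting.

```python
import re
from collections import Counter

# Patterns modeled on the example excerpts only; actual log lines may vary.
IOTRACKER_RE = re.compile(
    r'sub=IoTracker\] In thread (?P<thread>.+?), '
    r'(?P<call>\w+)\("(?P<path>[^"]*)"\) took over (?P<secs>\d+) sec'
)
NVME_RE = re.compile(
    r'NVMEIO:\d+ Ctlr (?P<ctlr>\d+), .*?queue (?P<queue>\d+) .*?not available'
)


def slow_calls(lines, threshold_secs=60):
    """Return (call, path, seconds) for IoTracker warnings at or above the threshold.

    threshold_secs is an arbitrary illustrative value, not a VMware default.
    """
    hits = []
    for line in lines:
        m = IOTRACKER_RE.search(line)
        if m and int(m.group("secs")) >= threshold_secs:
            hits.append((m.group("call"), m.group("path"), int(m.group("secs"))))
    return hits


def unavailable_queues(lines):
    """Count 'queue N ... not available' warnings per NVMe controller number."""
    counts = Counter()
    for line in lines:
        m = NVME_RE.search(line)
        if m:
            counts[int(m.group("ctlr"))] += 1
    return counts
```

Both functions accept any iterable of log lines, for example an open file handle on a copy of hostd.log or of the vmkernel log taken from a support bundle.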
esxcli network nic stats get -n vmnic# against the vmnic used for NVMe/TCP:
NIC statistics for vmnic#:
   Packets received: 158585149637
   Packets sent: 48571680247
   Bytes received: 234866890345430
   Bytes sent: 49571177363751
   Receive packets dropped: 22671
   Transmit packets dropped: 0
   Multicast packets received: 2745626351
   Broadcast packets received: 9117019440
   Multicast packets sent: 8172746
   Broadcast packets sent: 2139484
   Total receive errors: 5909
Note: In a healthy environment, errors should be zero or statistically negligible relative to the total.
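To judge whether the drop and error counters are negligible relative to the total, it can help to express them as a fraction of the packets received. The sketch below is a minimal Python example that parses the text output shown above; the function names are hypothetical, the parsing assumes the "Name: value" layout of this sample, and it does not define what ratio counts as acceptable.

```python
import re

# Sample output copied from the example above.
SAMPLE = """\
NIC statistics for vmnic#:
   Packets received: 158585149637
   Packets sent: 48571680247
   Bytes received: 234866890345430
   Bytes sent: 49571177363751
   Receive packets dropped: 22671
   Transmit packets dropped: 0
   Multicast packets received: 2745626351
   Broadcast packets received: 9117019440
   Multicast packets sent: 8172746
   Broadcast packets sent: 2139484
   Total receive errors: 5909
"""


def parse_nic_stats(text):
    """Turn 'Counter name: <integer>' lines into a dict; other lines are skipped."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def rx_ratios(stats):
    """Return receive drops and receive errors as fractions of packets received."""
    received = stats.get("Packets received", 0)
    if received == 0:
        return {"drop_ratio": 0.0, "error_ratio": 0.0}
    return {
        "drop_ratio": stats.get("Receive packets dropped", 0) / received,
        "error_ratio": stats.get("Total receive errors", 0) / received,
    }
```

Calling rx_ratios(parse_nic_stats(SAMPLE)) yields the two fractions for the sample. Because the counters are cumulative, comparing two captures taken some time apart shows whether the drops are still increasing.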
Engage the hardware vendor to investigate the physical network adapter (NIC) errors. Because these errors occur at the hardware level and are merely passed up to the ESXi host, vendor assistance is required to determine the root cause.
For more information, see Troubleshooting and understanding physical NIC receive or transmit dropped, missed and error counters in ESXi.