Hosts may enter a not responding state when connected to an iSCSI datastore because task management requests get stuck and heartbeats time out as a result of a race condition in the iscsi_vmk adapter.

Article ID: 323093

Products

VMware vSphere ESXi

Issue/Introduction

This article raises awareness of an issue that affects ESXi hosts on multiple versions. The affected hosts connect through the software iSCSI driver (iscsi_vmk) to storage devices presented by arrays from multiple iSCSI storage vendors.


Symptoms:

ESXi hosts enter a not responding state due to underlying storage issues.
On closer inspection, the logs may show heartbeat timeouts on a VMFS datastore connected to the hosts through the software iSCSI adapter (iscsi_vmk).
The "Waiting for timed out HB" messages in the vmkernel log continue for an extended period, without any SCSI sense codes that would indicate an ongoing issue.

If the ESXi host is running 6.7 or later, the logs may also indicate that a task management (Task mgmt) command is stuck.

For example:
"Task mgmt request issued to device naa.60002ac000000000000000180007eb79 is stuck"
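
The two log signatures described above can be checked for with a short script. The following is a minimal sketch, not a supported tool: it assumes you have a copy of the vmkernel log as text, and it matches only the two strings quoted in this article. The function name `scan_vmkernel_log` is hypothetical, and the rest of each log line (timestamps, world IDs, and so on) is deliberately not parsed.

```python
import re

# Strings quoted in this article; nothing else about the log line format is assumed.
HB_TIMEOUT = "Waiting for timed out HB"
TASK_MGMT_STUCK = re.compile(r"Task mgmt request issued to device (\S+) is stuck")


def scan_vmkernel_log(lines):
    """Count heartbeat-timeout messages and collect devices that have
    a stuck task management request.

    `lines` is any iterable of log lines (a list, or an open text file).
    Returns (heartbeat_timeout_count, sorted_list_of_device_ids).
    """
    hb_count = 0
    stuck_devices = set()
    for line in lines:
        if HB_TIMEOUT in line:
            hb_count += 1
        match = TASK_MGMT_STUCK.search(line)
        if match:
            stuck_devices.add(match.group(1))
    return hb_count, sorted(stuck_devices)
```

Passing an open log file to `scan_vmkernel_log` returns the number of heartbeat-timeout messages and the devices named in "is stuck" messages. A high heartbeat count together with at least one stuck device is consistent with the symptoms described here, but it does not by itself confirm this issue.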
 


Environment

VMware vSphere ESXi 6.7
VMware vSphere ESXi 6.5
VMware vSphere ESXi 7.0.0

Cause

Due to a race condition in the iscsi_vmk driver, commands and the corresponding task management requests can get stuck at the driver level.

Resolution

This issue is resolved in vSphere ESXi 7.0 U3f (build number 20036589).
This issue is resolved in vSphere ESXi 6.7 P07 (build number 19898906).
This issue is resolved in vSphere ESXi 6.5 P05 (build number 16576891).
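
To check a host against the fixed builds listed above, a comparison like the following can help. This is a minimal sketch under stated assumptions: the function name `has_fix` is hypothetical, the version and build number are supplied by you (for instance, as reported by `esxcli system version get`, using only the numeric part of the build), and it assumes that build numbers increase over time within a release line.

```python
# Fixed builds as listed in this article, keyed by release line.
FIXED_BUILDS = {
    "7.0": 20036589,  # ESXi 7.0 U3f
    "6.7": 19898906,  # ESXi 6.7 P07
    "6.5": 16576891,  # ESXi 6.5 P05
}


def has_fix(version, build):
    """Return True if `build` is at or above the fixed build for the
    release line of `version`, False if it is below, and None if the
    release line is not covered by this article.

    Assumes build numbers only increase within a release line.
    """
    release = ".".join(version.split(".")[:2])  # "7.0.3" -> "7.0"
    fixed = FIXED_BUILDS.get(release)
    if fixed is None:
        return None
    return int(build) >= fixed
```

For example, `has_fix("7.0.3", 20036589)` evaluates to `True`, while a 7.0 host on a lower build number evaluates to `False` and is a candidate for patching.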


Additional Information

Impact/Risks:

ESXi hosts can enter a 'not responding' state after switch upgrades or storage node reboots.
Once a host enters a not responding state, you lose manageability of the host and the VMs that reside on it.