Virtual machines might become unresponsive due to a rare deadlock issue in a VMFS6 volume

Article ID: 323079

Products

VMware vSphere ESXi

Issue/Introduction

  • Virtual machines randomly become unresponsive when they use thin-provisioned VMDK files on a VMFS6 datastore.
  • The /var/log/vmkernel.log file is flooded with handle reset messages that repeat indefinitely:
YYYY-MM-DDTHH:MM:SS.653Z cpu57:8916482)VSCSI: 2973: handle 38295998585421404(GID:48732)(vscsi0:0):Added handle (refCnt = 3) to vscsiResetHandleList vscsiResetHandleCount = 1
YYYY-MM-DDTHH:MM:SS.653Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192707
YYYY-MM-DDTHH:MM:SS.653Z cpu14:2097732)VSCSI: 3335: handle 38295998585421404(GID:48732)(vscsi0:0):Reset [Retries: 0/0] from (vmm0:SQLVM1)
YYYY-MM-DDTHH:MM:SS.157Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
YYYY-MM-DDTHH:MM:SS.659Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
YYYY-MM-DDTHH:MM:SS.655Z cpu57:8916482)WARNING: VSCSI: 3967: handle 38295998585421404(GID:48732)(vscsi0:0):WaitForCIF: Issuing reset;  number of CIF:4
YYYY-MM-DDTHH:MM:SS.655Z cpu57:8916482)WARNING: VSCSI: 2986: handle 38295998585421404(GID:48732)(vscsi0:0):Ignoring double reset
YYYY-MM-DDTHH:MM:SS.864Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
YYYY-MM-DDTHH:MM:SS.367Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
YYYY-MM-DDTHH:MM:SS.840Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
YYYY-MM-DDTHH:MM:SS.840Z cpu3:2097732)VSCSI: 3335: handle 38295998585421404(GID:48732)(vscsi0:0):Reset [Retries: 15/0] from (vmm0:SQLVM1)
YYYY-MM-DDTHH:MM:SS.343Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
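
To check a log for this symptom, one option is to count how often each VSCSI handle appears in "processing reset for handle" lines. The following is a minimal sketch, not a VMware tool: it assumes the log lines follow the format of the excerpt above, and the function names and the threshold value are hypothetical choices made for illustration.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   ...)VSCSI: 3226: handle <id>(GID:<gid>)(vscsi0:0):processing reset for handle ...
# The pattern is derived only from the excerpt above; other builds may log differently.
RESET_RE = re.compile(
    r"VSCSI: \d+: handle (?P<handle>\d+)\(GID:(?P<gid>\d+)\)"
    r"\((?P<disk>vscsi\d+:\d+)\):processing reset for handle"
)


def count_handle_resets(lines):
    """Count 'processing reset' messages per (handle, virtual disk) pair."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[(match.group("handle"), match.group("disk"))] += 1
    return counts


def suspicious_handles(lines, threshold=5):
    """Return the handles whose reset count reaches the (hypothetical) threshold."""
    return {key: n for key, n in count_handle_resets(lines).items() if n >= threshold}


if __name__ == "__main__":
    # Usage: python scan_resets.py /path/to/copy/of/vmkernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (handle, disk), n in suspicious_handles(log).items():
            print(f"handle {handle} ({disk}): {n} reset messages")
```

A high count for a single handle matches the symptom described here but does not confirm the deadlock on its own; other storage problems can also produce repeated resets.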

Environment

VMware vSphere ESXi 7.x
VMware vSphere 7.0.x

Cause

This is a known issue.

In rare cases, if a write I/O request runs in parallel with an unmap operation triggered by the guest OS on a thin-provisioned VM, a deadlock might occur in a VMFS6 volume. As a result, the virtual machine might become unresponsive.

Resolution

To confirm the cause of this issue on a live system, a vmkdump must be collected and analyzed. Open a support case with Broadcom Support.

This issue is resolved in VMware ESXi 7.0 Update 3f.

Workaround:
Inflate the VM's thin disks or convert them to thick provisioning. This prevents the guest OS from issuing UNMAP commands for those disks, so the race condition between write I/Os and UNMAP operations cannot occur.
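
Before applying the workaround, it can help to identify which of a VM's disks are thin-provisioned. The following is a rough sketch under one stated assumption: that the disk's text descriptor file (the small .vmdk file, not the -flat.vmdk data file) records thin provisioning as a line reading ddb.thinProvisioned = "1". Verify this against a known thin disk in your environment first. The function name is hypothetical.

```python
import re

# Assumption: a thin-provisioned disk's descriptor contains this key/value line.
THIN_RE = re.compile(r'^\s*ddb\.thinProvisioned\s*=\s*"1"\s*$', re.MULTILINE)


def is_thin_descriptor(descriptor_text):
    """Return True if the descriptor text marks the disk as thin-provisioned."""
    return THIN_RE.search(descriptor_text) is not None
```

Read the descriptor file as text and pass its contents to is_thin_descriptor; disks for which it returns True are the candidates for inflation or conversion.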