Virtual machines might become unresponsive due to a rare deadlock issue in a VMFS6 volume
search cancel

Virtual machines might become unresponsive due to a rare deadlock issue in a VMFS6 volume

book

Article ID: 323079

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • VM(s) randomly become unresponsive when they are using thin VMDK files on VMFS6
  • The /var/log/vmkernel.log is flooded with resetting handle messages that go on indefinitely:
2023-04-05T05:01:26.653Z cpu57:8916482)VSCSI: 2973: handle 38295998585421404(GID:48732)(vscsi0:0):Added handle (refCnt = 3) to vscsiResetHandleList vscsiResetHandleCount = 1
2023-04-05T05:01:26.653Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192707
2023-04-05T05:01:26.653Z cpu14:2097732)VSCSI: 3335: handle 38295998585421404(GID:48732)(vscsi0:0):Reset [Retries: 0/0] from (vmm0:SQLVM1)
2023-04-05T05:01:27.157Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:01:27.659Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:01:28.161Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:01:28.663Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:01:29.165Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:01:29.655Z cpu57:8916482)WARNING: VSCSI: 3967: handle 38295998585421404(GID:48732)(vscsi0:0):WaitForCIF: Issuing reset;  number of CIF:4
2023-04-05T05:01:29.655Z cpu57:8916482)WARNING: VSCSI: 2986: handle 38295998585421404(GID:48732)(vscsi0:0):Ignoring double reset
<snip>
2023-04-05T05:08:56.864Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:57.367Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:57.840Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:57.840Z cpu3:2097732)VSCSI: 3335: handle 38295998585421404(GID:48732)(vscsi0:0):Reset [Retries: 15/0] from (vmm0:SQLVM1)
2023-04-05T05:08:58.343Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:58.845Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:59.347Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:59.847Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706


Environment

VMware vSphere ESXi 7.x
VMware vSphere 7.0.x

Cause

 
 
To understand the cause of this issue live vmkdump has to be collected and analyzed, Kindly raise a Ticket with VMware Support Team.
 
In rare cases, if a write I/O request runs in parallel with an unmap operation triggered by the guest OS on a thin-provisioned VM, a deadlock might occur in a VMFS6 volume. As a result, the virtual machine may become unresponsive.

Resolution

This is a known issue that is resolved in ESXi 7.0 U3f. Please see the release notes: 

https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-70u3f-release-notes.html
 


Workaround:
To workaround this issue, the thin disks for a VM can be inflated/converted to thick. This will prevent the issuance of UNMAP commands from the GuestOS level and thus there would be no race condition between write I/Os and UNMAP operations.