Virtual machines freeze intermittently or become unresponsive under heavy I/O load

Article ID: 327867

Updated On: 04-09-2025

Products

VMware vSphere ESXi

Issue/Introduction

Running the following command on the ESXi host that runs the affected virtual machine returns output similar to:

$ ps -s | grep <vm-name>

4313969 vmm0:vm-name COSTOP NONE 0-63
4313971 vmm1:vm-name WAIT SCSI 0-63
4313972 4313957 vmx-vthread-5:vm-name WAIT UFUTEX 0-63 /bin/vmx
4314204 4313957 vmx-vthread-6:vm-name WAIT UFUTEX 0-63 /bin/vmx
4314205 4313957 vmx-vthread-7:vm-name WAIT UFUTEX 0-63 /bin/vmx
4314206 4313957 vmx-vthread-8:vm-name WAIT UFUTEX 0-63 /bin/vmx
4314210 4313957 vmx-mks:vm-name WAIT UPOL 0-63 /bin/vmx
4314212 4313957 vmx-svga:vm-name WAIT SEMA 0-63 /bin/vmx
4314214 4313957 vmx-vcpu-0:vm-name COSTOP NONE 0-63 /bin/vmx
4314215 4313957 vmx-vcpu-1:vm-name WAIT SCSI 0-63 /bin/vmx

Note: The vmm1 thread is blocked on a SCSI call (WAIT SCSI), and vmm0 is co-stopped (COSTOP).
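When the process list is long, the relevant states can be picked out programmatically. The following is a minimal Python sketch, not part of the original procedure, that parses saved ps -s output and lists the threads blocked on SCSI and the threads that are co-stopped. The column layout is assumed from the sample output above; the names PS_LINE, World, parse_ps, and summarize are hypothetical, and VM names containing spaces are not handled.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumed layout, inferred from the sample output in this article:
#   <world-ID> [<cartel-ID>] <thread>:<vm-name> <state> <wait-type> ...
# The cartel ID column is absent on the vmm lines of the sample.
PS_LINE = re.compile(
    r"^(?P<world>\d+)\s+(?:(?P<cartel>\d+)\s+)?"
    r"(?P<thread>\S+?):\s*(?P<vm>\S+)\s+"
    r"(?P<state>\S+)\s+(?P<wait>\S+)"
)


class World(NamedTuple):
    world_id: int
    cartel_id: Optional[int]
    thread: str
    vm: str
    state: str
    wait: str


def parse_ps(output: str) -> List[World]:
    """Parse saved `ps -s | grep <vm-name>` output into World records."""
    worlds = []
    for line in output.splitlines():
        m = PS_LINE.match(line.strip())
        if not m:
            continue  # skip lines that do not fit the assumed layout
        cartel = m.group("cartel")
        worlds.append(World(
            int(m.group("world")),
            int(cartel) if cartel else None,
            m.group("thread"),
            m.group("vm"),
            m.group("state"),
            m.group("wait"),
        ))
    return worlds


def summarize(worlds: List[World]) -> Tuple[List[World], List[World]]:
    """Return (threads in WAIT SCSI, threads in COSTOP)."""
    blocked = [w for w in worlds if w.state == "WAIT" and w.wait == "SCSI"]
    costopped = [w for w in worlds if w.state == "COSTOP"]
    return blocked, costopped


# Lines copied from the sample output in this article.
SAMPLE = """\
4313969 vmm0: vm-name COSTOP NONE 0-63
4313971 vmm1:vm-name WAIT SCSI 0-63
4313972 4313957 vmx-vthread-5:vm-name WAIT UFUTEX 0-63 /bin/vmx
4314214 4313957 vmx-vcpu-0:vm-name COSTOP NONE 0-63 /bin/vmx
4314215 4313957 vmx-vcpu-1:vm-name WAIT SCSI 0-63 /bin/vmx
"""

if __name__ == "__main__":
    blocked, costopped = summarize(parse_ps(SAMPLE))
    print("Blocked on SCSI:", [w.thread for w in blocked])
    print("Co-stopped:", [w.thread for w in costopped])
```

A pattern of one thread in WAIT SCSI with the remaining vCPU threads in COSTOP matches the condition described in this article.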

The following error may appear:

Unable to connect to the MKS: Error connecting to /bin/vmx process.

Other symptoms may include:

Virtual machines are unreachable over the network
Virtual machines may report an invalid state
Virtual machines are unresponsive

Environment

VMware vCenter Server 6.5.x
VMware vCenter Server 6.7.x
VMware vCenter Server 7.x
VMware vCenter Server 8.x

Cause

A virtual machine can be unresponsive due to:

Taking quiesced snapshots or using a custom quiescing script
Heavy I/O load on the ESXi hosts
Storage performance issues at the device, storage pool, and/or LUN level
One of the Virtual Machine Monitor (VMM) threads being blocked on a VSCSI call while the other VMM threads are co-stopped, waiting for the blocked thread to make progress

Resolution

Workaround

Caution: Ensure that there are no snapshot consolidation tasks running. Ensure no backups are running on the VMs during this time.

To recover the virtual machine from its locked state:

Find the process list for the virtual machine and check the cartel ID:

$ ps -s | grep <vm-name>

Note: Refer to the ps -s output mentioned in the Issue/Introduction section of this article.

Find the vmx-vcpu thread that is waiting on a SCSI event (WAIT SCSI).

Note: The number in the second column of the output is the cartel ID.

Run the following command to resume the stopped process:

$ kill -18 <cartel-ID>

After completing the above steps, the virtual machine may need to be reloaded. For more information, see Reloading a vmx file without removing the virtual machine from inventory.
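The lookup in the workaround steps can be sketched in Python as follows. This is an illustrative helper, not part of the original procedure: it scans saved ps -s output for a vmx-vcpu thread in WAIT SCSI, reads the cartel ID from the second column, and prints the kill -18 command without running it. The function names cartel_of_blocked_vcpu and resume_command are hypothetical, and the column positions are assumed from the sample output in the Issue/Introduction section.

```python
from typing import Optional


def cartel_of_blocked_vcpu(ps_output: str) -> Optional[str]:
    """Return the cartel ID of the first vmx-vcpu thread in WAIT SCSI."""
    for line in ps_output.splitlines():
        fields = line.split()
        # Assumed vmx line layout:
        #   world ID, cartel ID, thread:vm-name, state, wait type, ...
        if len(fields) < 5 or not fields[2].startswith("vmx-vcpu-"):
            continue
        if fields[3] == "WAIT" and fields[4] == "SCSI" and fields[1].isdigit():
            return fields[1]
    return None


def resume_command(ps_output: str) -> Optional[str]:
    """Build the command from the workaround; it is returned, never executed."""
    cartel = cartel_of_blocked_vcpu(ps_output)
    return None if cartel is None else "kill -18 {}".format(cartel)


# Lines copied from the sample output in this article.
SAMPLE = """\
4313971 vmm1:vm-name WAIT SCSI 0-63
4314212 4313957 vmx-svga:vm-name WAIT SEMA 0-63 /bin/vmx
4314214 4313957 vmx-vcpu-0:vm-name COSTOP NONE 0-63 /bin/vmx
4314215 4313957 vmx-vcpu-1:vm-name WAIT SCSI 0-63 /bin/vmx
"""

if __name__ == "__main__":
    command = resume_command(SAMPLE)
    print(command if command else "No vmx-vcpu thread is waiting on SCSI")
```

The helper only prints the command so that it can be reviewed first. Check the Caution above (no snapshot consolidation tasks or backups in progress) before running it on the ESXi host.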
