Virtual machines freeze intermittently or become unresponsive under heavy I/O load

Article ID: 327867

Updated On: 03-31-2025

Products

VMware vSphere ESXi

Issue/Introduction

  • Running the following command on the ESXi host where the affected virtual machine resides returns output similar to:

    $ ps -s | grep <vm-name>

4313969 vmm0:vm-name COSTOP NONE 0-63
4313971 vmm1:vm-name WAIT SCSI 0-63
4313972 4313957 vmx-vthread-5:vm-name WAIT UFUTEX 0-63 /bin/vmx
4314204 4313957 vmx-vthread-6:vm-name WAIT UFUTEX 0-63 /bin/vmx
4314205 4313957 vmx-vthread-7:vm-name WAIT UFUTEX 0-63 /bin/vmx
4314206 4313957 vmx-vthread-8:vm-name WAIT UFUTEX 0-63 /bin/vmx
4314210 4313957 vmx-mks:vm-name WAIT UPOL 0-63 /bin/vmx
4314212 4313957 vmx-svga:vm-name WAIT SEMA 0-63 /bin/vmx
4314214 4313957 vmx-vcpu-0:vm-name COSTOP NONE 0-63 /bin/vmx
4314215 4313957 vmx-vcpu-1:vm-name WAIT SCSI 0-63 /bin/vmx

Note: In this output, the vmm1 thread is blocked on a SCSI call (WAIT SCSI) and the vmm0 thread is co-stopped (COSTOP).
  • The following error may appear:
Unable to connect to the MKS: Error connecting to /bin/vmx process.
  • Virtual machines are unreachable over the network
  • Virtual machines may report an invalid state
  • Virtual machines are unresponsive

Environment

VMware vCenter Server 6.5.x
VMware vCenter Server 6.7.x
VMware vCenter Server 7.x
VMware vCenter Server 8.x

Cause

A virtual machine can be unresponsive due to:
  • Taking quiesced snapshots or using a custom quiescing script
  • Heavy I/O load on the ESXi hosts
  • Storage performance issues at the device, storage pool, and/or LUN level
  • One of the Virtual Machine Monitor (VMM) threads being blocked on a VSCSI call while the other VMM threads are co-stopped, waiting for the blocked thread to make progress
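
The blocked-thread pattern described in the last item can be spotted by counting the VMM worlds in each state. The following is a minimal sketch, not an official tool: summarize_vmm_states is a hypothetical helper name, and the match patterns assume the ps -s layout shown in the Issue/Introduction section, which may differ on other ESXi builds.

```shell
# Hypothetical helper: reads `ps -s` output on stdin and counts the VMM
# worlds (names of the form vmm<N>:<vm-name>) that are blocked on a SCSI
# call or co-stopped. Patterns are assumptions based on the sample output
# in this article.
summarize_vmm_states() {
    awk '
        /(^| )vmm[0-9]+:/ {
            if ($0 ~ / WAIT SCSI /) scsi++      # world blocked on a SCSI call
            if ($0 ~ / COSTOP /)    costop++    # world co-stopped
        }
        END { printf "scsi_wait=%d costop=%d\n", scsi, costop }
    '
}

# Example usage on the ESXi host (replace <vm-name> with the VM name):
#   ps -s | grep <vm-name> | summarize_vmm_states
```

A non-zero scsi_wait count together with a non-zero costop count is consistent with the condition described above, but it does not by itself identify the underlying storage cause.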

Resolution

Workaround

Caution: Before proceeding, ensure that no snapshot consolidation tasks and no backups are running on the affected virtual machines.

To recover the virtual machine from its locked state:
  1. Find the process list for the virtual machine and check the cartel ID:

    $ ps -s | grep <vm-name>

Note: Refer to the ps -s output shown in the Issue/Introduction section of this article.
  2. Find the vmx-vcpu thread that is waiting on a SCSI event (WAIT SCSI).
Note: The number in the second column of the output is the cartel ID.
  3. To continue the process that has stopped, run:

    $ kill -18 <cartel-ID>

  4. After running the above steps, the virtual machine may need to be reloaded. For more information, see Reloading a vmx file without removing the virtual machine from inventory.
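
The cartel ID lookup in the workaround can be sketched as a small filter. This is an illustrative sketch only: find_scsi_blocked_cartels is a hypothetical helper name, and it assumes the column order seen in the sample output for vmx-vcpu threads (world ID, cartel ID, name, state, wait type). It only prints candidate cartel IDs and does not send any signal.

```shell
# Hypothetical helper: reads `ps -s` output on stdin and prints the cartel ID
# (second column) of each vmx-vcpu thread that is in the WAIT SCSI state.
# Assumed columns: <world-ID> <cartel-ID> <name> <state> <wait-type> ...
find_scsi_blocked_cartels() {
    awk '$3 ~ /^vmx-vcpu-/ && $4 == "WAIT" && $5 == "SCSI" && !seen[$2]++ { print $2 }'
}

# Example usage on the ESXi host (replace <vm-name> with the VM name):
#   ps -s | grep <vm-name> | find_scsi_blocked_cartels
```

Review the printed value against the full ps -s output before passing it to the kill command shown in the steps above.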

Additional Information