During a snapshot operation, virtual machine becomes unresponsive with one VCPU at 100% SWPWT

Article ID: 321014

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

During a snapshot operation, you experience these symptoms:

  • The virtual machine becomes unresponsive
  • In esxtop on the ESXi host, one of the virtual machine's vCPU worlds shows 100% SWPWT (swap wait)


Environment

VMware vSphere ESXi 5.5

Cause

A race condition causes a call from the virtual machine monitor (vmm) to the VMkernel to become stuck on the VMkernel side.

Resolution

This is a known issue affecting ESXi 5.5.
 
This is resolved in ESXi 5.5 Patch 4.

To confirm that you are hitting this issue, run these commands:

esxtop command (Part 1)
  1. While the virtual machine is unresponsive, run esxtop from an SSH session to the ESXi host on which the affected virtual machine is registered.
  2. Find the virtual machine and expand its group: press e, then enter the virtual machine's group ID (GID) at the prompt.
  3. You should see all vcpu worlds with 100% VMWAIT and one of those vcpu worlds with 100% SWPWT (or very close to it, such as 99%):

For example:

ID      GID     NAME            NWLD   %USED    %RUN    %SYS   %WAIT %VMWAIT    %RDY   %IDLE  %OVRLP   %CSTP  %MLMTD  %SWPWT
1171868 2220151 vmx                1    0.01    0.00    0.00  100.00       -    0.00    0.00    0.00    0.00    0.00    0.00
1171870 2220151 vmast.1171869      1    0.07    0.06    0.00  100.00       -    0.00    0.00    0.00    0.00    0.00    0.00
1171874 2220151 vmx-vthread-7:A    1    0.00    0.00    0.00  100.00       -    0.00    0.00    0.00    0.00    0.00    0.00
1171875 2220151 vmx-vthread-8:A    1    0.00    0.00    0.00  100.00       -    0.00    0.00    0.00    0.00    0.00    0.00
1171876 2220151 vmx-vthread-9:A    1    0.00    0.00    0.00  100.00       -    0.00    0.00    0.00    0.00    0.00    0.00
1171877 2220151 vmx-mks:AGPRODS    1    0.01    0.01    0.00  100.00       -    0.00    0.00    0.00    0.00    0.00    0.00
1171880 2220151 vmx-svga:AGPROD    1    0.00    0.00    0.00  100.00       -    0.00    0.00    0.00    0.00    0.00    0.00
1171881 2220151 vmx-vcpu-0:AGPR    1    0.00    0.00    0.00  100.00  100.00    0.00    0.00    0.00    0.00    0.00  100.00
1171882 2220151 vmx-vcpu-1:AGPR    1    0.00    0.00    0.00  100.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00
1171883 2220151 vmx-vcpu-2:AGPR    1    0.00    0.00    0.00  100.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00
1171884 2220151 vmx-vcpu-3:AGPR    1    0.00    0.00    0.00  100.00  100.00    0.00    0.00    0.00    0.00    0.00    0.00
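
If the expanded esxtop listing has been copied into a text file, the check in step 3 can be scripted. The following is a minimal sketch, not a VMware-supplied tool: the function name find_swpwt_worlds and the file path in the usage line are hypothetical, and it assumes the column layout of the example above (world names contain no spaces and %SWPWT is the last column).

```shell
#!/bin/sh
# Sketch: scan a saved esxtop CPU-panel listing for vCPU worlds stuck in
# swap wait. Assumes the layout of the example above: ID in column 1,
# NAME in column 3 (no embedded spaces), and %SWPWT as the last column.

# find_swpwt_worlds [THRESHOLD]
# Reads the listing on stdin and prints "<world ID> <name> <%SWPWT>" for
# each vmx-vcpu world whose %SWPWT is at or above THRESHOLD (default 99).
find_swpwt_worlds() {
    awk -v limit="${1:-99}" '
        $3 ~ /^vmx-vcpu-/ && ($NF + 0) >= (limit + 0) { print $1, $3, $NF }
    '
}
```

Example use, assuming the listing was saved to /tmp/esxtop_vm.txt:

    find_swpwt_worlds < /tmp/esxtop_vm.txt

For the example listing this reports world 1171881 (vmx-vcpu-0). The header row and non-vCPU worlds are skipped because their NAME field does not start with vmx-vcpu-.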


 

ps command (Part 2)

  1. Run this command:

    ps -s | grep <vm_name>
     
  2. In the output, you should see one vmm world in the WAIT SWPC state and the other(s) in the WAIT SEMA state.

    For example:

    36350 vmm0:HRWeb 36349 0 V WAIT SEMA
    36352 vmm1:HRWeb 36349 0 V WAIT SWPC

Note: The world ID of the WAIT SWPC world in Part 2 should match the world ID of the 100% SWPWT world in Part 1. If the virtual machine remains in this state for an extended period of time, you are likely hitting this issue.
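
The Part 2 check can also be filtered automatically. This is a minimal sketch under the assumption that the ps -s output resembles the example above, with the world ID in the first field, the world name in the second, and the state followed immediately by the wait reason somewhere later on the line. The function name find_swpc_worlds is hypothetical.

```shell
#!/bin/sh
# Sketch: pick out vmm worlds that are waiting on swap (WAIT SWPC).
# Assumes each input line looks like the example above:
#   <world ID> vmm<N>:<vm_name> ... WAIT SWPC

# find_swpc_worlds
# Reads `ps -s` output on stdin and prints "<world ID> <world name>" for
# every vmm world whose state is WAIT with wait reason SWPC.
find_swpc_worlds() {
    awk '
        $2 ~ /^vmm[0-9]+:/ {
            # Look for the adjacent pair "WAIT SWPC" after the name field.
            for (i = 3; i < NF; i++) {
                if ($i == "WAIT" && $(i + 1) == "SWPC") {
                    print $1, $2
                    break
                }
            }
        }
    '
}
```

Example use on the host:

    ps -s | grep <vm_name> | find_swpc_worlds

For the example output this prints 36352 vmm1:HRWeb. Compare the world ID it reports with the 100% SWPWT world from Part 1.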

To work around the issue, power off the unresponsive virtual machine by stopping its process on the ESXi host. This recovers the virtual machine.

For more information, see Powering off an unresponsive virtual machine on an ESX host (1004340).
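
One way to stop the process on ESXi 5.x is the esxcli vm process namespace: esxcli vm process list shows each running virtual machine with its World ID, and esxcli vm process kill accepts a kill type of soft, hard, or force. The sketch below only prints the command so that it can be reviewed before it is run. The helper name kill_vm_cmd is hypothetical, and the referenced article remains the authoritative procedure.

```shell
#!/bin/sh
# Sketch: build (but do not execute) the esxcli command that stops a
# virtual machine process. The world ID must be the "World ID" value
# reported by `esxcli vm process list` for the affected virtual machine.

# kill_vm_cmd WORLD_ID [TYPE]
# TYPE is soft (default), hard, or force. Prints the command on stdout;
# returns non-zero if the arguments are not valid.
kill_vm_cmd() {
    world_id=$1
    kill_type=${2:-soft}

    case "$kill_type" in
        soft|hard|force) ;;
        *) echo "unknown kill type: $kill_type" >&2; return 1 ;;
    esac

    case "$world_id" in
        ''|*[!0-9]*) echo "world ID must be numeric" >&2; return 1 ;;
    esac

    echo "esxcli vm process kill --type=$kill_type --world-id=$world_id"
}
```

Example use:

    esxcli vm process list
    kill_vm_cmd <world_id> soft

Start with soft and move to hard, then force, only if the virtual machine process does not exit.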

 

Additional Information

 

Using esxtop to identify storage performance issues for ESX / ESXi (multiple versions)
Gathering esxtop performance data at specific times using crontab