Virtual machine becomes unresponsive or inactive when taking memory snapshot

Article ID: 321376

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
When you take a snapshot of a virtual machine with memory, you may experience these symptoms while the memory is being written to disk:
  • The virtual machine becomes unresponsive or inactive.
  • The virtual machine does not respond to any commands.
  • You cannot ping the virtual machine.


Environment

VMware vSphere ESXi 6.0
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.7
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Resolution

This is expected behavior in ESXi. For more information, see Snapshots Take a Long Time When “Keep Memory” is Enabled (76687).

While taking a virtual machine snapshot with memory, the VM may appear to be unresponsive and the snapshot may take a long time to complete. This is because the ESXi host must dump the VM’s memory to disk.

In the vmware.log file, you can see that a feature called Lazy CheckPointing is used during snapshot creation.

Lazy CheckPointing allows the VM to continue running while its memory is dumped. Without it, the ESXi host would have to stop the VM and dump the complete contents of the memory to disk. Instead of completely disrupting operations on the VM, the ESXi host leaves the VM running with degraded performance.
Because the Lazy CheckPointing mechanism takes a significant amount of time, you experience degraded VM performance for a prolonged period.

Note: The time taken for this operation varies with the amount of memory assigned to the VM. For more information, see the Allocate Memory Resources section of the vSphere Virtual Machine Administration Guide.

For example:

If you take a snapshot of a virtual machine with 4 GB of RAM, you see entries similar to the following in the vmware.log file:

YYYY-MM-DDTHH:MM:SS.407Z| svga| I125: MKSScreenShotMgr: Taking a screenshot
YYYY-MM-DDTHH:MM:SS.817Z| vmx| I125: VigorTransportProcessClientPayload: opID=kkif5tlx-9467-auto-7b0-h5: 70001354-20-c3-3c42 seq=801: Receiving Snapshot.Take request.
YYYY-MM-DDTHH:MM:SS.834Z| vmx| I125: SnapshotVMX_TakeSnapshot start: 'VM Snapshot 2%252f4%252f2021, 6:38:42 PM', deviceState=1, lazy=1, quiesced=0, forceNative=0, tryNative=1, saveAllocMaps=0
YYYY-MM-DDTHH:MM:SS.851Z| vmx| I125: DiskLib_IsVMFSSparseSupported: vmfssparse is not supported on /vmfs/volumes/*****************-****-*************/VM_deploy: f532.
YYYY-MM-DDTHH:MM:SS.851Z| vmx| I125: DISKLIB-LIB_CREATE : DiskLibCreateCreateParam: Selecting the default child type as Sesparse for /vmfs/volumes/*****************-****-*************/VM_deploy/VM_deploy-000001.vmdk.
YYYY-MM-DDTHH:MM:SS.851Z| vmx| I125: DISKLIB-LIB_CREATE : DiskLibCreateCreateParam: seSparse grain size is set to 8 for '/vmfs/volumes/*****************-****-*************/VM_deploy/VM_deploy-000001.vmdk'
YYYY-MM-DDTHH:MM:SS.853Z| vmx| I125: SNAPSHOT: SnapshotPrepareTakeDoneCB: Prepare phase complete (The operation completed successfully).


YYYY-MM-DDTHH:MM:SS.411Z| vcpu-0| I125: Checkpoint Unstun: vm stopped for 557792 us
YYYY-MM-DDTHH:MM:SS.424Z| vcpu-0| I125: SCSI: switching scsi0 to push completion mode
YYYY-MM-DDTHH:MM:SS.424Z| worker-2151272| I125: MainMem: Begin lazy IO (524288 pages, 0 done, 1 threads, bio = 0).
YYYY-MM-DDTHH:MM:SS.569Z| vcpu-0| A100: ConfigDB: Setting scsi0:0.redo = ""
YYYY-MM-DDTHH:MM:SS.569Z| vcpu-0| I125: DISK: OPEN scsi0:0 '/vmfs/volumes/*****************-****-*************/VM_deploy/VM_deploy-000001.vmdk' persistent R[]
YYYY-MM-DDTHH:MM:SS.627Z| vcpu-0| I125: DISKLIB-VMFS : "/vmfs/volumes/*****************-****-*************/VM_deploy/VM_deploy-000001-sesparse.vmdk" : open successful (10) size = 25165824, hd = 6728759. Type 19
YYYY-MM-DDTHH:MM:SS.627Z| vcpu-0| I125: DISKLIB-DSCPTR: Opened [0]: "VM_deploy-000001-sesparse.vmdk" (0xa)
YYYY-MM-DDTHH:MM:SS.628Z| vcpu-0| I125: DISKLIB-LINK : Opened '/vmfs/volumes/*****************-****-*************/VM_deploy/VM_deploy-000001.vmdk' (0xa) : seSparse, 10485760 sectors / 5 GB.
YYYY-MM-DDTHH:MM:SS.658Z| vcpu-0| I125: DISKLIB-VMFS : "/vmfs/volumes/*****************-****-*************/VM_deploy/VM_deploy-flat.vmdk" : open successful (14) size = 5368709120, hd = 4672568. Type 3
YYYY-MM-DDTHH:MM:SS.658Z| vcpu-0| I125: DISKLIB-DSCPTR: Opened [0]: "VM_deploy-flat.vmdk" (0xe)
YYYY-MM-DDTHH:MM:SS.658Z| vcpu-0| I125: DISKLIB-LINK : Opened '/vmfs/volumes/*****************-****-*************/VM_deploy/VM_deploy.vmdk' (0xe) : vmfs, 10485760 sectors / 5 GB.


YYYY-MM-DDTHH:MM:SS.673Z| vcpu-0| I125: SnapshotVMXTakeSnapshotWork: Transition to mode 2.
YYYY-MM-DDTHH:MM:SS.673Z| vcpu-0| I125: SnapshotVMXTakeSnapshotWork: Initiated lazy snapshot 'VM Snapshot 2%252f4%252f2021, 6:38:42 PM': 2
YYYY-MM-DDTHH:MM:SS.542Z| worker-2151272| I125: MainMem: End lazy IO (524288 done, sync = 0, error = 0).
YYYY-MM-DDTHH:MM:SS.554Z| vmx| I125: MainMem: Completed pending lazy checkpoint save (1).
YYYY-MM-DDTHH:MM:SS.559Z| vmx| I125: SnapshotVMXTakeSnapshotWork: Transition to mode 1.
YYYY-MM-DDTHH:MM:SS.559Z| vmx| I125: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'VM Snapshot 2%252f4%252f2021, 6:38:42 PM': 2
YYYY-MM-DDTHH:MM:SS.559Z| vmx| I125: VigorTransport_ServerSendResponse opID=kkif5tlx-9467-auto-7b0-h5: 70001354-20-c3-3c42 seq=801: Completed Snapshot request.


Note: The above example is for reference only. The time taken for the task depends on the environment and its resources.
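
To see how long a VM was stunned and how long the lazy memory write lasted, you can compare the timestamps of the Checkpoint Unstun, Begin lazy IO, and End lazy IO entries. The following Python sketch shows one way to do this. It is not a VMware tool: the function names are hypothetical, the patterns are modeled only on the log lines quoted in this article, and the sample timestamps are invented because the excerpt above masks them.

```python
import re
from datetime import datetime

# Patterns modeled on the vmware.log entries quoted in this article.
STUN_RE = re.compile(r"Checkpoint Unstun: vm stopped for (\d+) us")
BEGIN_RE = re.compile(r"MainMem: Begin lazy IO \((\d+) pages")
END_RE = re.compile(r"MainMem: End lazy IO \((\d+) done, sync = \d+, error = (\d+)\)")


def parse_timestamp(line):
    """Parse the timestamp that precedes the first '|' on a log line."""
    stamp = line.split("|", 1)[0].strip()
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def summarize_lazy_snapshot(lines):
    """Collect stun time, page counts, and lazy I/O duration from log lines."""
    summary = {}
    begin = None
    for line in lines:
        if (m := STUN_RE.search(line)):
            summary["stun_ms"] = int(m.group(1)) / 1000
        elif (m := BEGIN_RE.search(line)):
            summary["pages"] = int(m.group(1))
            begin = parse_timestamp(line)
        elif (m := END_RE.search(line)) and begin is not None:
            summary["pages_done"] = int(m.group(1))
            summary["errors"] = int(m.group(2))
            elapsed = parse_timestamp(line) - begin
            summary["lazy_io_seconds"] = elapsed.total_seconds()
    return summary


# Hypothetical timestamps for illustration only.
SAMPLE_LINES = [
    "2021-02-04T18:38:43.411Z| vcpu-0| I125: Checkpoint Unstun: vm stopped for 557792 us",
    "2021-02-04T18:38:43.424Z| worker-2151272| I125: MainMem: Begin lazy IO (524288 pages, 0 done, 1 threads, bio = 0).",
    "2021-02-04T18:39:25.542Z| worker-2151272| I125: MainMem: End lazy IO (524288 done, sync = 0, error = 0).",
]

if __name__ == "__main__":
    print(summarize_lazy_snapshot(SAMPLE_LINES))
```

To use it against a real log, pass the lines of the VM's vmware.log file instead of SAMPLE_LINES. If the log contains several snapshots, the summary reflects the last one found.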

During the snapshot process, the virtual machine goes through the Fast Suspend Resume (FSR) process and the guest operating system is unresponsive. The time a virtual machine spends in the FSR state depends on the amount of memory to be written to disk, how frequently that memory is changing during the operation, and the speed and responsiveness of the datastore's backing storage.

When the memory is completely written to disk, the virtual machine resumes normal operation.
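
As a rough illustration of how memory size and datastore speed interact, the following Python sketch computes the time needed to write a VM's memory once at a sustained rate. This is a back-of-the-envelope lower bound, not a VMware formula: the function name and the 200 MiB/s throughput figure are assumptions chosen for illustration, and the calculation ignores memory that changes during the write and any storage contention.

```python
def estimate_memory_write_seconds(memory_gib, throughput_mib_per_s):
    """Rough lower bound: seconds to write the VM's memory once to disk."""
    if throughput_mib_per_s <= 0:
        raise ValueError("throughput must be positive")
    return memory_gib * 1024 / throughput_mib_per_s


if __name__ == "__main__":
    # 200 MiB/s is an assumed sustained datastore write rate.
    for memory_gib in (4, 32, 128):
        seconds = estimate_memory_write_seconds(memory_gib, 200)
        print(f"{memory_gib:>4} GiB at 200 MiB/s: at least {seconds:.0f} s")
```

Actual snapshot times are typically longer than this estimate, especially for busy guests or slow storage.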