This article helps customers troubleshoot quiesced snapshot failures in Windows Server virtual machines.
Symptoms:
When you take a quiesced snapshot of a Windows Server virtual machine, snapshot creation fails with the following error: "An error occurred while quiescing the virtual machine. See the virtual machine's event log for details. An error occurred while taking a snapshot: Failed to quiesce the virtual machine. An error occurred while saving the snapshot: Failed to quiesce the virtual machine."
If VMware Tools VSS debug logging is enabled (see Enabling debug logging for VMware Tools within a guest operating system), the virtual machine's vmware.log file shows that some VSS writers fail at the freeze stage with VSS_WRITER_STATE VSS_WS_FAILED_AT_FREEZE (9). Sample log:
2022-04-27T03:25:07.393Z In(05) vcpu-0 - Guest: [ debug] [vmvss:vmvss] [6424] VDSHelper::ForEachVDSPack():303: return (0x0)
2022-04-27T03:25:07.394Z In(05) vcpu-1 - Guest: [ debug] [vmvss:vmvss] [6424] VDSHelper::ForEachVolume():337: return (0x0)
2022-04-27T03:25:07.399Z In(05) vcpu-1 - Guest: [ debug] [vmvss:vmvss] [7796] CVmSnapshotRequestor::GatherWriterStatus():2359: enter
2022-04-27T03:25:07.400Z In(05) vcpu-0 - Guest: [ debug] [vmvss:vmvss] [7796] CVmSnapshotRequestor::WaitForOperation():3630: enter
2022-04-27T03:25:07.425Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] CVmSnapshotRequestor::CheckWriterStatus():2420: enter
2022-04-27T03:25:07.426Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] CVmSnapshotRequestor::CheckWriterStatus():2466: Task Scheduler Writer (1)
2022-04-27T03:25:07.426Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] CVmSnapshotRequestor::CheckWriterStatus():2466: VSS Metadata Store Writer (1)
2022-04-27T03:25:07.426Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] CVmSnapshotRequestor::CheckWriterStatus():2466: Performance Counters Writer (1)
2022-04-27T03:25:07.427Z In(05) vcpu-2 - Guest: [ warning] [vmvss:vmvss] [7796] CVmSnapshotRequestor::CheckWriterStatus():2454: writer System Writer in failed state: res = 0x800423f2, err = 0x1, error =
2022-04-27T03:25:07.427Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] CVmSnapshotRequestor::CheckWriterStatus():2466: System Writer (9)
2022-04-27T03:25:07.428Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] CVmSnapshotRequestor::LogComponentError():800: enter
2022-04-27T03:25:07.435Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] CVmSnapshotRequestor::LogComponentError():835: failed call: ret = compEx2->GetFailure(&writerErr, &appErr, &appMessage, NULL), result = 0x80070057
2022-04-27T03:25:07.435Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] CVmSnapshotRequestor::DoSnapshotSet():2148: failed call: ret = GatherWriterStatus(), result = 0x80042301
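The writer status lines in this log follow a regular pattern, so a short script can pick out the writers that failed at freeze. The following is a minimal sketch in Python, assuming the `CheckWriterStatus():<line>: <writer name> (<state>)` format seen in the sample log (other VMware Tools versions may log differently). The helper names `writer_states` and `failed_at_freeze` are hypothetical and not part of VMware Tools; the state names come from the Windows `VSS_WRITER_STATE` enumeration.

```python
import re

# VSS_WRITER_STATE values as defined in the Windows VSS API (vss.h).
VSS_WRITER_STATES = {
    0: "VSS_WS_UNKNOWN",
    1: "VSS_WS_STABLE",
    2: "VSS_WS_WAITING_FOR_FREEZE",
    3: "VSS_WS_WAITING_FOR_THAW",
    4: "VSS_WS_WAITING_FOR_POST_SNAPSHOT",
    5: "VSS_WS_WAITING_FOR_BACKUP_COMPLETE",
    6: "VSS_WS_FAILED_AT_IDENTIFY",
    7: "VSS_WS_FAILED_AT_PREPARE_BACKUP",
    8: "VSS_WS_FAILED_AT_PREPARE_SNAPSHOT",
    9: "VSS_WS_FAILED_AT_FREEZE",
    10: "VSS_WS_FAILED_AT_THAW",
    11: "VSS_WS_FAILED_AT_POST_SNAPSHOT",
    12: "VSS_WS_FAILED_AT_BACKUP_COMPLETE",
    13: "VSS_WS_FAILED_AT_PRE_RESTORE",
    14: "VSS_WS_FAILED_AT_POST_RESTORE",
    15: "VSS_WS_FAILED_AT_BACKUPSHUTDOWN",
}
FAILED_AT_FREEZE = 9

# Matches status lines ending in "<writer name> (<state>)", for example
# "CheckWriterStatus():2466: System Writer (9)". The "writer ... in failed
# state: res = ..." warning line does not end in "(<digits>)", so it is skipped.
STATUS_RE = re.compile(r"CheckWriterStatus\(\):\d+: (.+) \((\d+)\)\s*$")


def writer_states(log_lines):
    """Yield (writer name, state number, state name) for each writer status line."""
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match:
            state = int(match.group(2))
            yield match.group(1), state, VSS_WRITER_STATES.get(state, f"unknown ({state})")


def failed_at_freeze(log_lines):
    """Return the names of writers reported in VSS_WS_FAILED_AT_FREEZE (9)."""
    return [name for name, state, _ in writer_states(log_lines) if state == FAILED_AT_FREEZE]


# Lines copied from the sample vmware.log excerpt.
SAMPLE_LOG = [
    "2022-04-27T03:25:07.426Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] "
    "CVmSnapshotRequestor::CheckWriterStatus():2466: Task Scheduler Writer (1)",
    "2022-04-27T03:25:07.426Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] "
    "CVmSnapshotRequestor::CheckWriterStatus():2466: VSS Metadata Store Writer (1)",
    "2022-04-27T03:25:07.426Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] "
    "CVmSnapshotRequestor::CheckWriterStatus():2466: Performance Counters Writer (1)",
    "2022-04-27T03:25:07.427Z In(05) vcpu-2 - Guest: [ warning] [vmvss:vmvss] [7796] "
    "CVmSnapshotRequestor::CheckWriterStatus():2454: writer System Writer in failed "
    "state: res = 0x800423f2, err = 0x1, error =",
    "2022-04-27T03:25:07.427Z In(05) vcpu-2 - Guest: [ debug] [vmvss:vmvss] [7796] "
    "CVmSnapshotRequestor::CheckWriterStatus():2466: System Writer (9)",
]

if __name__ == "__main__":
    for name, state, label in writer_states(SAMPLE_LOG):
        print(f"{name}: {state} ({label})")
    print("Failed at freeze:", failed_at_freeze(SAMPLE_LOG))
```

To scan a real log, pass an open file object instead of `SAMPLE_LOG`, for example inside a `with open(path, encoding="utf-8", errors="replace") as f:` block, where `path` is wherever you copied the virtual machine's vmware.log. On the sample lines, only System Writer is reported as failed at freeze.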
If a VSS trace is collected during snapshot creation (see Using Tracing Tools with VSS), it shows that Windows spends about 52 seconds in the THAW_KTM stage (from 0:43:17.017 to 0:44:08.739), during which several writers abort due to timeout:
[ 0:43:17.017 P:0C78 T:1534 REGREGSC(1348) GEN] Event name: THAW_KTM (Enter)
[ 0:44:08.626 P:0348 T:15B8 WRTWRTIC(2279) WRITER] Aborting due to timeout
[ 0:44:08.626 P:0C78 T:15E0 WRTWRTIC(2279) WRITER] Aborting due to timeout
[ 0:44:08.641 P:0C78 T:1580 WRTWRTIC(2279) WRITER] Aborting due to timeout
[ 0:44:08.641 P:06A8 T:14AC WRTWRTIC(2279) WRITER] Aborting due to timeout
[ 0:44:08.641 P:0C78 T:1448 WRTWRTIC(2279) WRITER] Aborting due to timeout
[ 0:44:08.673 P:047C T:164C WRTWRTIC(2279) WRITER] Aborting due to timeout
[ 0:44:08.739 P:0C78 T:1534 REGREGSC(1348) GEN] Event name: THAW_KTM (Leave)
Because the total freeze timeout is 60 seconds by default, the long delay in the THAW_KTM stage causes some VSS writers to time out and fail at the freeze stage.
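The time spent in a stage can be read directly from the trace timestamps and compared with the default 60-second freeze timeout. The following is a minimal sketch in Python, assuming the leading trace field is an elapsed time in `h:mm:ss.fff` form, as the sample trace suggests. The helper names `stage_duration` and `timeout_report` are hypothetical. The reported headroom is only an upper bound on the time left for other work, because the freeze window may already have been running before the stage was entered.

```python
import re

# Default total freeze timeout in seconds, as stated in this article.
DEFAULT_FREEZE_TIMEOUT = 60.0

# Trace lines start with "[ h:mm:ss.fff ...". Reading the field as
# hours:minutes:seconds is an assumption based on the sample trace.
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)\s")


def timestamp_seconds(line):
    """Convert the leading h:mm:ss.fff field of a trace line to seconds."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def stage_duration(trace_lines, stage):
    """Seconds between '<stage> (Enter)' and the next '<stage> (Leave)' event."""
    enter = None
    for line in trace_lines:
        if f"Event name: {stage} (Enter)" in line:
            enter = timestamp_seconds(line)
        elif f"Event name: {stage} (Leave)" in line and enter is not None:
            leave = timestamp_seconds(line)
            if leave is not None:
                return leave - enter
    return None


def timeout_report(trace_lines, stage="THAW_KTM", timeout=DEFAULT_FREEZE_TIMEOUT):
    """Compare one stage's duration with the freeze timeout and count writer timeouts."""
    duration = stage_duration(trace_lines, stage)
    if duration is None:
        raise ValueError(f"no complete {stage} Enter/Leave pair found")
    return {
        "stage_seconds": duration,
        "share_of_timeout": duration / timeout,
        # Upper bound: the freeze window may have started before the stage.
        "headroom_seconds": timeout - duration,
        "writer_timeouts": sum("Aborting due to timeout" in line for line in trace_lines),
    }


# Lines copied from the sample VSS trace.
SAMPLE_TRACE = [
    "[ 0:43:17.017 P:0C78 T:1534 REGREGSC(1348) GEN] Event name: THAW_KTM (Enter)",
    "[ 0:44:08.626 P:0348 T:15B8 WRTWRTIC(2279) WRITER] Aborting due to timeout",
    "[ 0:44:08.626 P:0C78 T:15E0 WRTWRTIC(2279) WRITER] Aborting due to timeout",
    "[ 0:44:08.641 P:0C78 T:1580 WRTWRTIC(2279) WRITER] Aborting due to timeout",
    "[ 0:44:08.641 P:06A8 T:14AC WRTWRTIC(2279) WRITER] Aborting due to timeout",
    "[ 0:44:08.641 P:0C78 T:1448 WRTWRTIC(2279) WRITER] Aborting due to timeout",
    "[ 0:44:08.673 P:047C T:164C WRTWRTIC(2279) WRITER] Aborting due to timeout",
    "[ 0:44:08.739 P:0C78 T:1534 REGREGSC(1348) GEN] Event name: THAW_KTM (Leave)",
]

if __name__ == "__main__":
    report = timeout_report(SAMPLE_TRACE)
    print(f"THAW_KTM: {report['stage_seconds']:.3f} s "
          f"({report['share_of_timeout']:.0%} of the {DEFAULT_FREEZE_TIMEOUT:.0f} s timeout)")
    print(f"Headroom at most: {report['headroom_seconds']:.3f} s")
    print(f"Writer timeouts: {report['writer_timeouts']}")
```

On the sample trace, this reports about 51.7 seconds in THAW_KTM, roughly 86% of the 60-second default, with six writer timeouts, which leaves the writers little room to complete before the freeze window expires.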