Windows Virtual Machines freeze during Incremental Syncs
search cancel

Windows Virtual Machines freeze during Incremental Syncs

book

Article ID: 422065

calendar_today

Updated On:

Products

VMware Live Recovery

Issue/Introduction

1. Virtual machines freeze during incremental syncs or SYNC NOW operations 

2. VM recovers from a frozen state when you PAUSE replications 

3. Increasing the RPO to the higher side does not fix the problem 

Environment

VMware vSphere 8.0.3, 24674464
VMware vSphere Replication 9.0.2.2, 24628359

Cause

The DemandlogMaxCowSizeMB option introduced a defect where DemandlogFailCollidingUnmap (default: 1) no longer works as intended.

When DemandlogFailCollidingUnmap is set to 1, the hbr filter should report "busy" to the guest OS for SCSI UNMAP commands. However, due to the defect, SCSI UNMAP commands are delayed until overlapping ranges are copied to the demand log, potentially causing the guest OS to freeze.

Resolution

Workaround: Set DemandlogMaxCowSizeMB to 0 to disable it.

Verify the current configuration settings by running these commands - 

esxcli system settings advanced list -o /HBR/DemandlogMaxCowSizeMB
Path: /HBR/DemandlogMaxCowSizeMB
   Type: integer
   Int Value: 4096
   Default Int Value: 4096 (Default value)
   Min Value: 0
   Max Value: 1048576
   String Value:
   Default String Value:
   Valid Characters:
   Description: The summary size of pending COW operations in MB. Reaching this limit can cause further colliding VM IOs to fail.
   Host Specific: false
   Impact: none


esxcli system settings advanced set -o /HBR/DemandlogFailCollidingUnmap  
Path: /HBR/DemandlogFailCollidingUnmap
   Type: integer
   Int Value: 1
   Default Int Value: 1 (Default value)
   Min Value: 0
   Max Value: 5
   String Value:
   Default String Value:
   Valid Characters:
   Description: Fail demand log transaction on WRITE_SAME and UNMAP command collision.
   Host Specific: false
   Impact: none


DemandlogFailCollidingUnmap is set to 1 by default on the ESXi host.
"demandlog_fail_colliding_unmap":1

DemandlogMaxCowSizeMB is set to 4096 MB by default on the ESXi host.
"demandlog_max_cow_size_MB":4096


Modify these settings by running these commands - 

esxcli system settings advanced set -o /HBR/DemandlogMaxCowSizeMB -i 0

esxcli system settings advanced set -o /HBR/DemandlogFailCollidingUnmap -i 1     


Make these changes to all the hosts used for replication. This should fix the freezing issue.

This issue will be fixed in ESX 8.0.3 Patch 7 and 9.0.1 onwards.