[VMware Live Cyber Recovery] - Leveraging 3rd-Party Backup with VSS on vSAN ESA-Based Storage, Followed by TRIM/UNMAP on VMC on AWS SDDC, Can Cause Unusable VLCR High-Frequency Snapshots


Article ID: 437099


Updated On:

Products

VMware Live Recovery

Issue/Introduction

Risk of High-Frequency (LWD) snapshot corruption when the following workflow occurs in a VLCR (VMware Live Cyber Recovery) environment:
1. A VSS-quiesced snapshot is taken by a third-party backup product (Commvault or another VSS vendor).
2. That snapshot resides on vSAN ESA (VMC-based storage).
3. TRIM/UNMAP occurs on the compute cluster, which is backed by vSAN ESA.
4. The VLCR LWD snapshot becomes corrupted.

Environment

  • VMware Live Cyber Recovery 9.0.0.11
  • VMware Cloud Foundation 5.x
  • VMC on AWS SDDC Release 1.26
  • ESXi 8.0 U3 or earlier
  • Any VSS-based third-party backup (Commvault or other VSS-writer vendors)
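As a minimal sketch for inventory scripts, the ESXi criterion in the list above can be checked programmatically. The helper names are hypothetical, and the parser assumes version strings in the "8.0u3" or "8.0 U3" form; patch letters (such as the "b" in "8.0u3b") are ignored, which is also an assumption.

```python
import re

# Hypothetical helper. Accepts "8.0u3", "8.0 U3", "7.0"; trailing patch
# letters such as "8.0u3b" are ignored (assumption).
_VERSION_RE = re.compile(r"^\s*(\d+)\.(\d+)(?:\s*u(\d+))?", re.IGNORECASE)


def parse_esxi_version(text: str) -> tuple[int, int, int]:
    """Return (major, minor, update); a missing update number counts as 0."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized ESXi version: {text!r}")
    major, minor, update = match.group(1), match.group(2), match.group(3) or "0"
    return int(major), int(minor), int(update)


def esxi_in_listed_range(version: str) -> bool:
    """True if the version is ESXi 8.0 U3 or earlier, per the Environment list."""
    return parse_esxi_version(version) <= (8, 0, 3)


print(esxi_in_listed_range("7.0 U2"))  # True
print(esxi_in_listed_range("9.0"))     # False
```

Tuple comparison keeps the check to a single line; a version that does not parse raises an error instead of being silently treated as unaffected.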

Cause

VSS is used in conjunction with vSAN ESA writable snapshots (also called guest file system quiesced snapshots) to make the guest application and NTFS/ReFS file system data consistent in case the snapshot needs to be restored. After the snapshot is taken, a small number of metadata writes can still occur, and applications (either system processes or user-space applications such as Microsoft SQL Server) have an opportunity to flush their transaction logs to disk and unmap them to speed up recovery. Because of this defect, that UNMAP can be misdirected and affect the running point (the VM's live disk state) instead. In practice, the transaction logs may then not provide the expected failover guarantees for the application if a failover occurs immediately after the VSS snapshot. If no failover occurs shortly after the VSS snapshot, the application running on the running point is likely to truncate its transaction log, after which the misdirected UNMAP no longer affects recovery.
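The misdirected UNMAP can be pictured with a conceptual toy model. This is not how vSAN ESA is implemented: each layer is a plain dictionary of block number to contents, and the layer names, block numbers, and the `simulate` and `apply_unmap` functions are hypothetical.

```python
# Conceptual toy model only; it does NOT reflect vSAN ESA internals.
# A "layer" maps block numbers to their contents.
Layers = dict[str, dict[int, str]]


def apply_unmap(layers: Layers, target: str, blocks: list[int]) -> None:
    """Deallocate `blocks` in the layer named `target`."""
    for block in blocks:
        layers[target].pop(block, None)


def simulate(misdirected: bool) -> Layers:
    # Running point (live VM state) holds data plus a transaction log in blocks 10-11.
    running_point = {0: "data", 10: "txlog-a", 11: "txlog-b"}
    # The VSS snapshot starts as a point-in-time copy of the running point.
    layers: Layers = {"running_point": running_point, "vss_snapshot": dict(running_point)}
    # After quiescing, the application flushes its log and unmaps the log
    # blocks; that UNMAP is meant for the snapshot, not the running point.
    target = "running_point" if misdirected else "vss_snapshot"
    apply_unmap(layers, target, [10, 11])
    return layers


print(sorted(simulate(misdirected=False)["running_point"]))  # [0, 10, 11]
print(sorted(simulate(misdirected=True)["running_point"]))   # [0]
```

In the misdirected case the live VM loses its transaction-log blocks, which is why a failover immediately after the VSS snapshot is the exposed window.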

In vSAN ESA, TRIM/UNMAP is always on and cannot be disabled, because of how the underlying file system is designed:

  • Log-Structured File System (LFS): ESA uses a highly efficient, log-structured file system. For an LFS to perform optimally, it needs to know exactly which blocks are in use and which contain deleted data.
  • Garbage Collection Efficiency: When a Guest OS deletes a file, issuing a TRIM/UNMAP command tells vSAN that those specific blocks are no longer needed. vSAN ESA can then immediately reclaim this capacity and skip moving those "dead" blocks during its internal garbage collection processes, drastically reducing write amplification.
  • Performance and Capacity: Because ESA is optimized for modern NVMe drives, the overhead of processing UNMAP commands is negligible. Keeping it always on ensures the cluster maintains maximum capacity efficiency and performance over its lifespan.
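The garbage-collection point above can be illustrated with a toy count of the blocks a log-structured cleaner must relocate before it can free a segment. The `Block` type and helper functions are hypothetical and model only the general idea, not vSAN ESA's actual cleaner.

```python
from dataclasses import dataclass


@dataclass
class Block:
    guest_deleted: bool  # the guest file system no longer uses this block
    trimmed: bool        # the guest sent TRIM/UNMAP for this block


def blocks_copied_by_cleaner(segment: list[Block]) -> int:
    """Count blocks the cleaner must relocate before freeing the segment.

    Storage only knows a block is dead if it was trimmed; a block the guest
    deleted without TRIM/UNMAP still looks live and gets copied.
    """
    return sum(1 for block in segment if not block.trimmed)


def make_segment(n_live: int, n_deleted: int, trim: bool) -> list[Block]:
    live = [Block(guest_deleted=False, trimmed=False) for _ in range(n_live)]
    dead = [Block(guest_deleted=True, trimmed=trim) for _ in range(n_deleted)]
    return live + dead


# A 4-block segment in which the guest deleted 3 blocks:
print(blocks_copied_by_cleaner(make_segment(1, 3, trim=False)))  # 4
print(blocks_copied_by_cleaner(make_segment(1, 3, trim=True)))   # 1
```

Without TRIM/UNMAP the cleaner rewrites four blocks to reclaim the segment; with it, only the one genuinely live block is copied, which is the write-amplification reduction described above.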


Because TRIM/UNMAP cannot be disabled at the cluster level, the configuration option available to backup administrators for this known caveat is to disable TRIM/UNMAP within the guest OS of the individual VSS-protected VMs. Refer to the "Additional Information" section for guest-OS-specific instructions on disabling TRIM/UNMAP.


Resolution

Because the compute cluster in VMC on AWS uses vSAN ESA, TRIM/UNMAP is always enabled at the storage layer. The alternate options are:

A.) Disable TRIM/UNMAP at the guest OS level by running the following commands in the Windows guest OS:

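To verify the current setting (run from an elevated Command Prompt):
fsutil behavior query disabledeletenotify

To disable TRIM/UNMAP for NTFS and ReFS volumes:
fsutil behavior set DisableDeleteNotify NTFS 1
fsutil behavior set DisableDeleteNotify ReFS 1

A DisableDeleteNotify value of 1 stops the guest from sending TRIM/UNMAP (delete notifications) to storage; a value of 0 allows them.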
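To check many guests, the query output can be parsed by a script. This sketch assumes each relevant line of `fsutil behavior query disabledeletenotify` begins with `NTFS DisableDeleteNotify = <n>` or `ReFS DisableDeleteNotify = <n>`; the descriptive text after the value differs between Windows versions and is ignored. The function names are hypothetical.

```python
import re

# Assumed line format: "<FS> DisableDeleteNotify = <n>" followed by optional
# descriptive text, which varies by Windows version and is ignored here.
_LINE_RE = re.compile(r"^\s*(NTFS|ReFS)\s+DisableDeleteNotify\s*=\s*(\d+)", re.MULTILINE)


def parse_disable_delete_notify(output: str) -> dict[str, int]:
    """Map file system name -> DisableDeleteNotify value (1 = TRIM/UNMAP disabled)."""
    return {fs: int(value) for fs, value in _LINE_RE.findall(output)}


def guest_unmap_disabled(output: str, file_systems: tuple[str, ...] = ("NTFS", "ReFS")) -> bool:
    """True only if every listed file system reports DisableDeleteNotify = 1.

    A file system missing from the output is treated as NOT disabled.
    """
    parsed = parse_disable_delete_notify(output)
    return all(parsed.get(fs) == 1 for fs in file_systems)


sample = "NTFS DisableDeleteNotify = 1\nReFS DisableDeleteNotify = 0\n"
print(parse_disable_delete_notify(sample))  # {'NTFS': 1, 'ReFS': 0}
print(guest_unmap_disabled(sample))         # False
```

On a Windows guest, the input could come from `subprocess.run(["fsutil", "behavior", "query", "disabledeletenotify"], capture_output=True, text=True).stdout`. Treating an unreported file system as not disabled errs on the conservative side.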


OR

B.) Temporarily disable VSS in the third-party backup product (such as Commvault or another VSS-enabled backup vendor).

Note: Either option on its own is sufficient until a future SDDC release (1.26v3) with a longer-term code fix is available.

Additional Information

Workaround to Disable Guest OS UNMAP in Windows / Other Operating Systems:

https://knowledge.broadcom.com/external/article?articleNumber=323572


Public References:
https://infohub.delltechnologies.com/en-us/l/dell-powerstore-microsoft-hyper-v-best-practices/trim-unmap-and-disk-space-recovery-2/