Taking a snapshot of a virtual machine with virtual disk over 2TB on an EMC VMAX array results in corrupted redo logs

Article ID: 306930

Products

VMware vSphere ESXi

Issue/Introduction

In virtual machines running on datastores hosted in an EMC VMAX SAN array, you experience these issues:
  • When taking a snapshot of a virtual machine with a virtual disk (vmdk) of 2 TB or greater in size, the redo logs are corrupted.
  • The affected virtual machine displays a message that a virtual disk is corrupted and the virtual machine is powered off.
  • The virtual machine fails to power on after the snapshot is taken.
  • The virtual machine with a corrupted redo log fails to access any data on the corrupted virtual disk.
  • Snapshot consolidation or deletion on the virtual machine fails.

Environment

  • VMware vSphere ESXi 5.5
  • VMware vSphere ESXi 5.1

Cause

The virtual machine redo log contains the metadata information about the virtual machine snapshot. By default, disks larger than 2 TB and linked clones use SE-Sparse type snapshot virtual disks. With SE-Sparse snapshot virtual disk files, WriteSame operations on the VMAX array may silently fail. When the ESXi 5.5 host detects the corruption, virtual machine power on/power off tasks are restricted.
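
The default described above can be sketched as a small check for finding disks whose snapshots would use SE-Sparse. This is only an illustration under stated assumptions: the function and constant names are hypothetical, the 2 TB threshold is taken as 2 x 1024^4 bytes, the boundary is treated as strictly "larger than", and "VMFSsparse" is used as the label for the traditional redo log format.

```python
from typing import Dict, List

# Assumption: "2 TB" is interpreted as 2 * 1024**4 bytes.
TWO_TB_BYTES = 2 * 1024**4


def default_snapshot_format(disk_bytes: int, linked_clone: bool = False) -> str:
    """Return the snapshot disk format the article describes as the default.

    Disks larger than 2 TB, and linked clones, get SE-Sparse snapshot disks;
    everything else is labeled with the traditional redo log format.
    """
    if disk_bytes < 0:
        raise ValueError("disk_bytes must be non-negative")
    if linked_clone or disk_bytes > TWO_TB_BYTES:
        return "SE-Sparse"
    return "VMFSsparse"


def at_risk_disks(disk_sizes: Dict[str, int]) -> List[str]:
    """List the disks (by name) whose snapshots would default to SE-Sparse."""
    return sorted(
        name
        for name, size in disk_sizes.items()
        if default_snapshot_format(size) == "SE-Sparse"
    )


if __name__ == "__main__":
    # Hypothetical inventory: disk name -> provisioned size in bytes.
    inventory = {
        "db01_1.vmdk": 3 * 1024**4,     # 3 TB, above the threshold
        "web01.vmdk": 500 * 1024**3,    # 500 GB, below the threshold
    }
    print(at_risk_disks(inventory))
```

A list like this only identifies candidates by size. Whether a disk is actually exposed also depends on it residing on a VMAX datastore with VAAI enabled.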

Resolution

This is a known issue affecting ESXi 5.1 and 5.5.

Note: Before modifying or deleting any virtual machine files, VMware recommends that you create a full backup of the files.

If your environment is at risk and you want to avoid this issue:

  • Temporarily disable VAAI handling on EMC VMAX LUNs. The issue does not appear to occur when VAAI is disabled.

    To disable VAAI for a specific storage type, use the esxcli command to delete the existing hardware acceleration claim rules.

    For more information, see:
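
As a rough illustration of the claim rule removal mentioned above, the following sketch builds the esxcli command sequence as text so it can be reviewed before anything is run on a host. It is not taken from this article: the function name is hypothetical, the rule ID is a placeholder that must be read from the `claimrule list` output on your own host, and the options shown should be verified against the esxcli reference for your ESXi version.

```python
from typing import List


def vaai_claimrule_removal_commands(rule_id: int) -> List[str]:
    """Build the esxcli commands that remove one VAAI/Filter claim rule pair.

    The commands are returned as strings and are NOT executed here.
    """
    if rule_id < 0:
        raise ValueError("rule_id must be non-negative")
    return [
        # List the current rules to confirm which ID belongs to the VMAX entry.
        "esxcli storage core claimrule list --claimrule-class=Filter",
        "esxcli storage core claimrule list --claimrule-class=VAAI",
        # Remove the VAAI plug-in rule and the matching filter rule.
        f"esxcli storage core claimrule remove --rule {rule_id} --claimrule-class=VAAI",
        f"esxcli storage core claimrule remove --rule {rule_id} --claimrule-class=Filter",
        # Reload both rule classes, then run the filter class to apply the change.
        "esxcli storage core claimrule load --claimrule-class=Filter",
        "esxcli storage core claimrule load --claimrule-class=VAAI",
        "esxcli storage core claimrule run --claimrule-class=Filter",
    ]


if __name__ == "__main__":
    # 65430 is a placeholder; use the rule ID shown on your host.
    for command in vaai_claimrule_removal_commands(rule_id=65430):
        print(command)
```

Printing the commands rather than executing them keeps the change reviewable. Because this is a temporary measure, record the removed rules so they can be restored later.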

To work around this issue, use one of these options:

  • Do not take snapshots of virtual machines with virtual disks larger than 2 TB if the virtual disk resides on a VMAX datastore.
  • Migrate any virtual disks larger than 2 TB to an alternate SAN array, if possible.
  • Delete the corrupt redo log and manually revert to an older snapshot. This may allow the virtual machine to power on again.

    Warning: If all redo logs are corrupt, the virtual machine must be recovered from backup.

Additional Information