Following a virtual machine backup, snapshot deletion takes an unexpectedly long time to complete on some VMs.
For example, snapshot removal may take 30+ minutes where it normally takes less than 5 minutes.
The issue may be intermittent on a given VM.
Environment
VMware vSphere ESXi (all versions)
Cause
High latency on one or more VM disks slows generation of the bitmap of changed blocks to be consolidated, so snapshot removal takes a long time to complete.
Verification:
During snapshot removal, /var/log/vmkernel.log reports repeated warnings against one or more VM disks similar to:
vmkwarning: cpu56:26273264)WARNING: SVM: 6024: scsi0:4 VMX took 4170 msecs to send copy bitmap for offset 68719476736. This is greater than expected latency. If this is a vvol disk, check with array latency.
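To see which disks are affected and how severe the delays are, the warnings can be tallied per disk. The following is a minimal Python sketch, not a VMware-provided tool. It assumes the log has been copied off the host and that every warning follows the wording of the sample above; the regular expression and the function name summarize_bitmap_warnings are illustrative, and other ESXi builds may format the message differently.

```python
import re
from collections import defaultdict

# Pattern modeled on the sample warning in this article (an assumption:
# other ESXi builds may word the message differently).
BITMAP_WARNING = re.compile(
    r"SVM: \d+: (?P<disk>\S+) VMX took (?P<msecs>\d+) msecs "
    r"to send copy bitmap for offset (?P<offset>\d+)"
)


def summarize_bitmap_warnings(lines):
    """Return {disk: (warning_count, max_msecs)} for matching log lines."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = BITMAP_WARNING.search(line)
        if match is None:
            continue  # not a copy-bitmap warning
        entry = stats[match.group("disk")]
        entry[0] += 1                                        # count warnings per disk
        entry[1] = max(entry[1], int(match.group("msecs")))  # track the slowest send
    return {disk: (count, worst) for disk, (count, worst) in stats.items()}
```

Passing an open file object for a copy of vmkernel.log to summarize_bitmap_warnings returns one entry per virtual disk node (for example scsi0:4) with the number of warnings and the longest reported delay in milliseconds. Disks with many warnings are the first candidates for the latency checks that follow.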
The impacted VM disks have poor I/O performance indicated by one or more of the following:
1) High latency on the VM disks
2) High latency on the LUN/datastore backing the VM disks
3) Queuing of I/O to the LUN/datastore backing the VM disks
4) High %VmWait on VM vmx-vthread processes (which typically indicates that VM-related processes frequently wait on completion of I/O)
Resolution
Address any underlying storage latency issues with the assistance of storage and fabric/network vendors.
For VM disks that report the bitmap warning:
Place the disks on their own pvscsi controller
Place the disks on a different datastore or, in the case of very I/O-intensive VMs, place them on their own datastore
Distribute I/O-intensive VMs across datastores to spread the I/O load.