How to calculate Virtual Machine snapshot consolidation

Article ID: 316414

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

A common question about snapshot consolidation is why the process takes so long to complete. The browser-based client only provides an estimated progress bar, and the percentage it reports can fluctuate. A better way to analyze snapshot consolidation is demonstrated below.


Perform a snapshot deletion before a snapshot consolidation. When snapshots are deleted, all changes merge into the base virtual machine disk. Snapshot consolidation is useful when snapshot disks fail to compact after a Delete or Delete all operation, or when the disk did not consolidate. This might happen, for example, if you delete a snapshot but its associated disks do not commit back to the base disk.

 

Symptoms:
The virtual machine's Summary tab shows Needs Consolidation.

Cause

To determine the total size of all the snapshot files in the VM directory, change to the directory of the virtual machine to be consolidated and run these commands:

 

cd /vmfs/volumes/datastore/vm/
ls -lah

 

For virtual machine directories with no sesparse files, run these commands:


ls -la |grep delta |awk '{print $5}' | awk '{ SUM += $1/1024/1024} END {print "total of all snapshots " SUM "MB\n"}'
ls -la |grep delta |awk '{print $5}' | awk '{ SUM += $1/1024/1024/1024} END {print "total of all snapshots " SUM "GB\n"}'

 

or

 

For virtual machine directories with sesparse files, run these commands:


ls -la |egrep -i 'delta|sparse' |awk '{print $5}' | awk '{ SUM += $1/1024/1024} END {print "total of all snapshots " SUM "MB\n"}'
ls -la |egrep -i 'delta|sparse' |awk '{print $5}' | awk '{ SUM += $1/1024/1024/1024} END {print "total of all snapshots " SUM "GB\n"}'
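
The two-stage pipelines above can also be written as a single awk pass. The following is a minimal sketch of one way to do that, not a replacement for the commands above. The function name snapshot_total_mb and the sample listing (with made-up sizes) are assumptions introduced for illustration.

```shell
# snapshot_total_mb (hypothetical name): read `ls -la` output on stdin and
# sum the size column (field 5) of every file whose name contains
# "delta" or "sparse", then print the total in MB.
snapshot_total_mb() {
    awk 'tolower($NF) ~ /delta|sparse/ { sum += $5 }
         END { printf "%.1f MB\n", sum / 1024 / 1024 }'
}

# Illustrative listing with made-up sizes, not taken from a real datastore.
# The flat file is ignored because its name matches neither pattern.
snapshot_total_mb <<'EOF'
-rw-------    1 root     root     1073741824 Oct  8 04:52 VM_name-000001-delta.vmdk
-rw-------    1 root     root     2147483648 Oct  8 16:10 VM_name-000002-sesparse.vmdk
-rw-------    1 root     root    42949672960 Oct  8 16:10 VM_name-flat.vmdk
EOF
```

From inside the VM directory, the function could be fed with `ls -la | snapshot_total_mb`. Matching on the file name only (the last field) avoids counting a line just because another column happens to contain the pattern.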

 

The usual reasons for a snapshot deletion or disk consolidation are backups or disk space.

 

The error message for a disk space issue is reported in the web browser as a pop-up window:

  • "An error occurred while consolidating disks"

 

To consolidate the snapshots, the available disk space must be at least 1.5 times the total size of all snapshots in that directory.

A datastore that is 99% used requires array-based storage cleanup, removal of files from the datastore, or additional physical storage before a snapshot deletion, removal, or consolidation can be performed.
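
The 1.5 times guideline can be checked with simple arithmetic before starting. The following is a minimal sketch, assuming the snapshot total and the free datastore space are already known in MB. The function names required_free_mb and has_enough_space are hypothetical, and the numbers in the example are illustrative only.

```shell
# required_free_mb TOTAL_SNAPSHOT_MB (hypothetical helper):
# print the free space needed under the 1.5x guideline.
required_free_mb() {
    awk -v total="$1" 'BEGIN { printf "%.1f\n", total * 1.5 }'
}

# has_enough_space TOTAL_SNAPSHOT_MB FREE_MB (hypothetical helper):
# succeed (exit status 0) when FREE_MB is at least 1.5 x TOTAL_SNAPSHOT_MB.
has_enough_space() {
    awk -v total="$1" -v free="$2" 'BEGIN { exit !(free >= total * 1.5) }'
}

# Illustrative numbers: 2048 MB of snapshots, 4096 MB free on the datastore.
required_free_mb 2048
if has_enough_space 2048 4096; then
    echo "enough free space to consolidate"
else
    echo "free up or add datastore space first"
fi
```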
 

Resolution

As an approximation, consolidation runs at an average speed between 2 MB/s and 4 MB/s on VMs that have been shut down gracefully. VMs do not need to be powered off to consolidate, but it is recommended to power off database VMs when consolidating or cloning them to preserve data integrity.
 
Because the I/O read/write speed is 2 MB/s up to 4 MB/s, the time to complete can be estimated.
 
Example: If the delta and sesparse disks total 2 TB, divide the total size by the rate in MB per second to determine the time the data transfer takes (using 1 TB = 1,000,000 MB and 1 GB = 1,000 MB):

time to complete is approximately (2 TB) / (2 MB/s) = 11.57 days
time to complete is approximately (2 TB) / (3 MB/s) = 7.72 days
time to complete is approximately (2 TB) / (4 MB/s) = 5.79 days
time to complete is approximately (200 GB) / (2 MB/s) = 1.16 days
time to complete is approximately (200 GB) / (3 MB/s) = 18.52 hours
time to complete is approximately (200 GB) / (4 MB/s) = 13.89 hours
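
The same division can be scripted for other sizes and rates. The following is a minimal sketch, assuming the total size is given in decimal MB (1 TB = 1,000,000 MB) and that the rate stays constant for the whole operation, which real consolidations will not guarantee. The function name estimate_duration is hypothetical, and results are rounded to two decimal places.

```shell
# estimate_duration SIZE_MB RATE_MB_PER_S (hypothetical helper):
# print the estimated transfer time in hours and in days.
estimate_duration() {
    awk -v size_mb="$1" -v rate="$2" 'BEGIN {
        secs = size_mb / rate
        printf "%.2f hours (%.2f days)\n", secs / 3600, secs / 86400
    }'
}

# 2 TB at 2 MB/s and 200 GB at 4 MB/s, matching two of the cases above.
estimate_duration 2000000 2
estimate_duration 200000 4
```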

Why are snapshot operations estimated, with no specific time for completion provided?
 
Calculating the actual IOPS for a given disk requires the drive manufacturer's average latency and average seek time. This is why snapshot operations are estimated and no specific time for completion of creation, deletion, removal, or consolidation can be provided.
 
The 4K random IOPS figure, by contrast, is how many 4K (4096 byte) operations the drive can handle per second when each block is read from or written to a random position.
 
Maximum IOPS (performance) for a given disk:
10K RPM Fibre Channel Disk: 130 IOPS
15K RPM Fibre Channel Disk: 180 IOPS
 
To calculate the actual IOPS for a given disk, the following information is required:
  • Average latency
  • Average seek time
Assume that we have a Seagate ST3146807FCV Cheetah 146GB 10K RPM Fibre Channel hard disk. It is rated as follows:
 
Average latency (avgLatency): 2.99 ms or 0.00299 seconds
Average seek time (avgSeek): 4.7 ms or 0.0047 seconds
 
To calculate this disk's IOPS, use the following equation:
 
IOPS = 1/(avgLatency + avgSeek)
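
Applying the equation to the Cheetah drive rated above gives 1 / (0.00299 + 0.0047) = 1 / 0.00769, or about 130 IOPS, which agrees with the 10K RPM figure listed earlier. The following is a minimal sketch of the same calculation; the function name disk_iops is hypothetical.

```shell
# disk_iops AVG_LATENCY_S AVG_SEEK_S (hypothetical helper):
# IOPS = 1 / (avgLatency + avgSeek), both given in seconds,
# rounded to the nearest whole operation.
disk_iops() {
    awk -v lat="$1" -v seek="$2" 'BEGIN { printf "%.0f\n", 1 / (lat + seek) }'
}

# Ratings of the 10K RPM drive from the example above.
disk_iops 0.00299 0.0047
```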
 
The next layer that affects IOPS, and that this calculation does not include, is the storage protocol (iSCSI, FCoE, Ethernet, Fibre Channel). The time to complete can only be estimated because it depends on the bandwidth, the latency, and the make of the hardware.
 
File timestamps allow the speed to be calculated, using this command on an ESXi host:
 
/vmfs/volumes # find */VM_name/ -type f -exec ls -lath {} \; |grep -E "delta|sparse"
 
-rw------- 1 root root 167.6G Oct 8 04:52 datastore/VM_name/VM_name-000001-delta.vmdk
 
-rw------- 1 root root 8.0G Oct 8 16:10 datastore/VM_name/VM_name-000001-sesparse.vmdk
 
Between the two timestamps the speed can be determined:
 
The difference between the timestamps Oct 8 04:52 and Oct 8 16:10 is 11 hours and 18 minutes (about 11.3 hours).
 
The file VM_name-000001-delta.vmdk was reduced in size by 159.6 GB (167.6 GB minus 8 GB).
 
Divide the change (159.6 GB) by the elapsed time (11.3 hours) to get the amount consolidated per hour: approximately 14.1 GB/h.
 
Convert GB/h to MB/s: 1 MB/s is 3.6 GB/h, so 14.1 GB/h is approximately 3.9 MB/s, which falls within the usual range of 2 MB/s to 4 MB/s.
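
The rate calculation can be sketched as a small helper. This assumes decimal units (1 GB = 1,000 MB) and takes the interval between 04:52 and 16:10 as 678 minutes. The function name consolidation_rate is hypothetical, and the sizes are the ones from the listing above.

```shell
# consolidation_rate START_GB END_GB ELAPSED_MINUTES (hypothetical helper):
# print the observed rate in GB/h and in MB/s (decimal units).
consolidation_rate() {
    awk -v start="$1" -v end="$2" -v mins="$3" 'BEGIN {
        changed_gb = start - end
        printf "%.2f GB/h (%.2f MB/s)\n",
               changed_gb / (mins / 60),
               changed_gb * 1000 / (mins * 60)
    }'
}

# 167.6 GB down to 8.0 GB over 11 hours 18 minutes (678 minutes).
consolidation_rate 167.6 8.0 678
```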
 
For VMDKs greater than 2 TB in size, SEsparse is the default format for virtual disk snapshots. This was introduced in ESXi 5.5.
 
Other options to observe snapshot consolidation:
 

cd /vmfs/volumes/datastore/vm/

watch -d 'ls -lth | grep -E "delta|flat|sesparse"'
 
The keyboard combination to quit is Ctrl+C.
 
The different snapshot tasks are reported as consolidate, remove, or delete.
 

vim-cmd vmsvc/getallvms |grep VM_name

vim-cmd vimsvc/task_list vmid

(ManagedObjectReference) [

'vim.Task:haTask-xxxxxx-vim.vm.Snapshot.remove-xxxxxxx'

]

vim-cmd vimsvc/task_info haTask-xxx-vim.vm.Snapshot.remove-xxxxxxx

Look for the line containing ' progress = __ '.
 
This output provides the same information that the progress bar in the web browser (GUI) reports.

Example output, where the number 31 is the virtual machine ID (vmid):
 
vim-cmd vimsvc/task_info haTask-31-vim.vm.Snapshot.remove-xxxxxxx

(vim.TaskInfo) {
   dynamicType = <unset>,
   key = "haTask-31-vim.vm.Snapshot.remove-xxxxxxx",
   task = 'vim.Task:haTask-31-vim.vm.Snapshot.remove-xxxxxxx',
   description = (vmodl.LocalizableMessage) null,
   name = "vim.vm.Snapshot.remove",
   descriptionId = "vm.Snapshot.remove",
   entity = 'vim.VirtualMachine:31',
   entityName = "VM_name",
   state = "running",
   cancelled = false,
   cancelable = true,
   error = (vmodl.MethodFault) null,
   result = <unset>,
   progress = 99,
   reason = (vim.TaskReasonUser) {
      dynamicType = <unset>,
      userName = "vpxuser",
   },
   queueTime = "2016-12-30T14:23:32.79546Z",
   startTime = "2016-12-30T14:23:32.795704Z",
   completeTime = <unset>,
   eventChainId = xxxxxxx,
   changeTag = <unset>,
   parentTaskKey = <unset>,
   rootTaskKey = <unset>,
}
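
To read only the progress value instead of scanning the full task output, the line can be filtered with sed. The following is a minimal sketch, assuming the output has the same 'progress = N,' layout as the example above. The function name task_progress is hypothetical, and the sample input is two lines copied from that example.

```shell
# task_progress (hypothetical helper): read task_info output on stdin
# and print only the number from the "progress = N," line.
task_progress() {
    sed -n 's/^[[:space:]]*progress = \([0-9][0-9]*\),.*$/\1/p'
}

# Sample input taken from the example output above.
printf '%s\n' '   state = "running",' '   progress = 99,' | task_progress
```

On a host, the task_info command shown above could be piped into task_progress in the same way.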
 
In the vSphere Client (web browser) or the vCenter thick client, under Hosts and Clusters or VMs and Templates:
 
Right click the VM_name, Edit Settings.
 
Select a Hard Drive #
 

The datastore is reported as a file path to the VM_name.vmdk (Example: [datastore_name] VM_directory/VM_name.vmdk)

The web browser (GUI) can isolate the performance of a specific virtual machine (VM):
 
Go to the ESXi host and then navigate to Monitor >> Performance >> Advanced.

Click Chart options.

Click none, then select only the [datastore_name] associated with the VM (Example: VM_name), select read rate and write rate, and click done.
 
The performance chart then reports the I/O for that VM.


Additional Information

For more information, see: 



Impact/Risks:

Consolidating snapshots for a virtual machine
 
Note:
  • The remove snapshot process can take a long time to complete if the snapshots are large.
  • Do not interrupt Snapshot consolidation. Interrupting the process can irretrievably corrupt the vDisk(s) being consolidated.
  • Virtual machine performance may be degraded during the snapshot consolidation process.