Replication intermittently fails with a full destination datastore because replicated data grows unexpectedly
Article ID: 342605
Products
VMware Live Recovery
VMware vSphere ESXi
Issue/Introduction
Symptoms:
After you enable replication for a VM to the same site or cluster, the destination (replicated) datastore fills to capacity and replication fails with an error.
In the hostd.log file, you see entries similar to:
2016-08-31T03:42:06.389Z error hostd[73940B70] [Originator@6876 sub=Hbrsvc] ReplicatedDisk: DiskLib failed to open path /vmfs/volumes/55342262-73e7cf76-####-##########78/VM/VM-01.vmdk(diskID=RDID-0f6caf00-0177-4af0-b46a-282c36461f57) (vmID=4) (groupID=GID-4d485f26-c579-4b06-8d31-85b28e09c0f4): Failed to lock the file. retry open disk, passed retry time=10 seconds
…..
2016-08-31T03:42:41.042Z error hostd[73940B70] [Originator@6876 sub=Hbrsvc] ReplicatedDisk: DiskLib failed to open path /vmfs/volumes/55342262-73e7cf76-####-##########78/VM/VM-01.vmdk(diskID=RDID-0f6caf00-0177-4af0-b46a-282c36461f57) (vmID=4) (groupID=GID-4d485f26-c579-4b06-8d31-85b28e09c0f4): Failed to lock the file. retry open disk, passed retry time=40 seconds
2016-08-31T03:42:41.052Z info hostd[73940B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 1378 : Sync started by VR Scheduler for virtual machine VM on host HOST.example.com in cluster HOST.example.com in ha-datacenter.
2016-01T06:01:25.020Z info hostd[720C1B70] [Originator@6876 sub=Hbrsvc opID=cffd3655 user=System] HbrReconfigureInterceptor checking HBR-enabled config for VM 4 (VM)
2016-09-01T06:01:25.919Z info hostd[737C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/55342262-73e7cf76-####-##########78/VM/VM-01.vmx opID=cffd3655 user=System] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)
2016-09-01T06:01:25.921Z info hostd[737C2B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=cffd3655 user=System] Event 1397 : Reconfigured VM on HOST.example.com in ha-datacenter
2016-09-01T06:01:25.921Z info hostd[737C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/55342262-73e7cf76-####-##########78/VM/VM-01.vmx opID=cffd3655 user=System] Send config update invoked
2016-09-01T06:01:44.778Z error hostd[72BCDB70] [Originator@6876 sub=Hbrsvc opID=ae930a55-d57d-48c5-aeeb-4c8fc0b9722d-HMSINT-18137-29-87-368f user=vpxuser:com.vmware.vcHms] Failed to retrieve replication configuration for VM 4 (VM): replication not enabled
2016-09-19T06:35:58.155Z error hostd[FFA7AAE0] [Originator@6876 sub=Hbrsvc opID=7f6570ad-9af8-4957-9b46-e6f7e6e52da7-HMSINT-30-c2-96-d1aa user=vpxuser:com.vmware.vcHms] Failed to retrieve replication configuration for VM 4 (VM): replication not enabled.
Note: This log excerpt is an example. Date, time, and environmental variables may vary depending on your environment.
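To check whether a hostd.log file contains the lock failures shown in the excerpt above, a small script can scan for them. The following is a minimal sketch, not a supported tool: the function name find_lock_failures and the regular expression are illustrative assumptions built only from the message wording in this excerpt, and the wording may differ between ESXi versions.

```python
import re

# Assumption: the pattern mirrors the "Failed to lock the file" entries in the
# excerpt above; other builds may word the message differently.
LOCK_FAILURE = re.compile(
    r"(?P<timestamp>\S+) error hostd\[[0-9A-Fa-f]+\] "
    r"\[Originator@\d+ sub=Hbrsvc[^\]]*\] "
    r"ReplicatedDisk: DiskLib failed to open path (?P<path>\S+?)"
    r"\(diskID=(?P<disk_id>[^)]+)\).*?"
    r"Failed to lock the file\. retry open disk, "
    r"passed retry time=(?P<retry>\d+) seconds"
)


def find_lock_failures(log_text):
    """Return one dict per Hbrsvc lock-failure entry found in log_text."""
    return [
        {
            "timestamp": m["timestamp"],
            "path": m["path"],
            "disk_id": m["disk_id"],
            "retry_seconds": int(m["retry"]),
        }
        for m in LOCK_FAILURE.finditer(log_text)
    ]
```

Passing the contents of a copied hostd.log to find_lock_failures returns the affected disk paths and the retry time reported by each entry, which makes it easier to see which replicated disks are involved.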
The preceding log entries show that replication for the VM failed because of a file lock. However, a review of the datastore shows that it has no free space left to write to.
This issue occurs when replication is enabled for a VM to the same site. It has been observed most often when the source VM uses thin-provisioned disks.
How quickly the replication datastore runs out of space depends on the amount of actual data on the source disk. It can take from a few days to several weeks to fill the destination datastore.
Changing the datastore or the datastore type (for example, from a local datastore to iSCSI or Fibre Channel) does not change the result.
Even though the source base disk is only 20 or 30 GB, the hbr-disk grows until the datastore is full. The growth was traced to the Unmap command issued in the guest OS, which ends up filling the replica disks.
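Because the destination datastore fills gradually, two free-space readings taken some days apart give a rough idea of how much time remains. The following is a minimal sketch that assumes linear growth, which the article does not guarantee; the function name days_until_full and its parameters are hypothetical.

```python
def days_until_full(free_before_gb, free_now_gb, days_between):
    """Linear estimate of the days left before free space reaches zero.

    Returns None when free space is not shrinking between the two readings.
    Assumption: growth continues at the same average daily rate.
    """
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    consumed_per_day = (free_before_gb - free_now_gb) / days_between
    if consumed_per_day <= 0:
        return None
    return free_now_gb / consumed_per_day
```

For example, if free space dropped from 500 GB to 400 GB over 5 days, the datastore loses 20 GB per day and the estimate is 20 days. The readings themselves can come from any source, such as the datastore summary in the vSphere Client.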
Resolution
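To resolve this issue, disable Unmap in the guest OS by setting DisableDeleteNotify to 1. In a Windows guest, for example, run this command from an elevated command prompt:
fsutil behavior set DisableDeleteNotify 1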
Where:
0 - indicates that the Trim and Unmap feature is on (enabled)
1 - indicates the Trim and Unmap feature is off (disabled)
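In a Windows guest, the current value can be read with the command fsutil behavior query DisableDeleteNotify. The following is a minimal sketch for interpreting that output according to the 0 and 1 meanings above. The function names are hypothetical, and the parser assumes output lines of the form "DisableDeleteNotify = 0" or "NTFS DisableDeleteNotify = 0", which can vary between Windows versions.

```python
import re


def parse_delete_notify(query_output):
    """Map each reported file system (or "default") to its 0/1 value.

    Assumption: each relevant line looks like "DisableDeleteNotify = 0"
    or "<FS> DisableDeleteNotify = 0", optionally followed by a description.
    """
    settings = {}
    for line in query_output.splitlines():
        m = re.search(r"(?:(\w+)\s+)?DisableDeleteNotify\s*=\s*([01])", line)
        if m:
            settings[m.group(1) or "default"] = int(m.group(2))
    return settings


def unmap_disabled(settings):
    """True only if at least one value was parsed and every value is 1."""
    return bool(settings) and all(value == 1 for value in settings.values())
```

The query output can be captured in the guest, for instance with subprocess.run(["fsutil", "behavior", "query", "DisableDeleteNotify"], capture_output=True, text=True), and its stdout passed to parse_delete_notify. Treating an empty result as "not disabled" is a deliberate choice so that unrecognized output is never mistaken for a confirmed setting.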
To work around this issue, stop replication, ensure that the replica data on the remote site is deleted, and then reconfigure replication for the VM.
Additional Information
Impact/Risks: The destination datastore runs out of free space.