Error: The parent virtual disk has been modified since the child was created while performing recovery with vLSR and vSphere Replication

Products

VMware Live Recovery
VMware vSphere ESXi

Issue/Introduction

Recovery fails during the pre-synchronize storage step.
The DR UI reports the following error for the failure:

VR synchronization failed for VRM group xxxx_xxxx. A replication error occurred at the vSphere Replication Server for replication 'xxxx_xxxx'. Details: 'Error for (hostIP: "10.xxx.xx.xx"), (flags: retriable): Fault: (vmodl.fault.SystemError)
faultCause = (vmodl.MethodFault) null,
faultMessage = <unset>,
reason = "The parent virtual disk has been modified since the child was created. The content ID of the parent virtual disk does not match the corresponding parent content ID in the child"
msg = "A general system error occurred: The parent virtual disk has been modified since the child was created. The content ID of the parent virtual disk does not match the corresponding parent content ID in the child"
; Set error flag: retriable; Failed to combine files ([] /vmfs/volumes/657c682b-xxxx-xxxx/xxxx_xxxx/hbrdisk.RDID-xxxx-9f28bfd72057.13098426.yyyy.vmdk:[] /vmfs/volumes/657c682b-e3651dec-xxxx-xxxx/xxxx_xxxx/xxxx_xxxx.vmdk); combine links on host-16159; Tried operation 4 times, giving up.; Pruning instance of diskID RDID-xxxx-9f28bfd72057 (prunePhase=0); Prune disks could not remove disk instance (instanceKey=3168359) (DiskID=RDID-xxxx-9f28bfd72057); While completing partial prune'.

Environment

VMware Live Site Recovery 9.x
VMware Site Recovery Manager 8.x
vSphere Replication 8.x
vSphere Replication 9.x

Cause

The issue occurs when an RDID file cannot be found during a consolidation or prune operation that follows a replication instance, before the actual recovery is run. The hbrsrv log shows entries such as:

2025-07-05T15:09:12.924Z verbose hbrsrv[66472] [Originator@6876 sub=HostPicker groupID=GID-fa3600ef-fb34-xxxx opID=hsl-62ceefe5] AffinityHostPicker choosing host host-164347 for context '[] /vmfs/volumes/657c682b-xxxx/xxxx_xxxx'
Failed to open'/vmfs/volumes/657c682b-e3651dec-xxxx/xxxx_xxxx/hbrdisk.RDID-88137ed8-xxxx-9a277e9607bb.13098429.236165899823424.vmdk': No file exists for given path (NFC_FILE_MISSING)
2025-07-05T15:10:01.322Z info hbrsrv[66472] [Originator@6876 sub=Main groupID=GID-fa3600ef-fb34-xxxx opID=hsl-62ceefe5] [0] Class: NFC Code: 16
2025-07-05T15:10:01.322Z info hbrsrv[66472] [Originator@6876 sub=Main groupID=GID-fa3600ef-fb34-xxxx opID=hsl-62ceefe5] [1] NFC error: NFC_FILE_MISSING
2025-07-05T15:10:01.322Z info hbrsrv[66472] [Originator@6876 sub=Main groupID=GID-fa3600ef-fb34-xxxx opID=hsl-62ceefe5] [2] Code set to: Storage is not found.
2025-07-05T15:10:01.322Z info hbrsrv[66472] [Originator@6876 sub=Main groupID=GID-fa3600ef-fb34-xxxx opID=hsl-62ceefe5] [8] Unable to open new top-level redo log for instance RDID-88137ed8-xxxx-9a277e9607bb.
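
To locate these failures in a large hbrsrv log, a short script can help. The following is a minimal sketch, not a supported tool: it assumes the log lines keep the "Failed to open'<path>': ... (NFC_FILE_MISSING)" shape shown above, and the function name find_missing_disks is hypothetical.

```python
import re

# Matches the "Failed to open'<path>': ... (NFC_FILE_MISSING)" line shape shown above.
# Allowing optional whitespace after "open" is an assumption to tolerate variants.
MISSING_RE = re.compile(r"Failed to open\s*'(?P<path>[^']+)'.*\(NFC_FILE_MISSING\)")

# The replication disk ID appears as an "RDID-..." token inside the hbrdisk file name.
RDID_RE = re.compile(r"hbrdisk\.(?P<rdid>RDID-[^.]+)\.")


def find_missing_disks(lines):
    """Return (path, rdid) tuples for every NFC_FILE_MISSING open failure."""
    hits = []
    for line in lines:
        match = MISSING_RE.search(line)
        if not match:
            continue
        path = match.group("path")
        rdid = RDID_RE.search(path)
        # rdid is None when the path is not an hbrdisk file.
        hits.append((path, rdid.group("rdid") if rdid else None))
    return hits
```

Pass it the lines of a saved copy of the log, for example find_missing_disks(open(path_to_log)), where path_to_log is whatever location the log was exported to.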

The RDID file is not found because of an All Paths Down (APD) condition with the backend array while the consolidation task is running on the target ESXi host.
/var/log/vobd.log on the target ESXi host has the below entries:

2025-07-05T15:07:41.384Z In(14) vobd[2097955]: [APDCorrelator] 964005978934us: [vob.storage.apd.start] Device or filesystem with identifier [naa.600a0980xxxxxx] has entered the All Paths Down state.
2025-07-05T15:07:41.385Z In(14) vobd[2097955]: [APDCorrelator] 964017607575us: [esx.problem.storage.apd.start] Device or filesystem with identifier [naa.600a0980xxxxxx] has entered the All Paths Down state.

/var/log/vmkernel.log on the target ESXi host has the below entries:

2025-07-05T15:10:52.033Z In(182) vmkernel: cpu24:7382280)Vol3: 4142: Could not open device 'naa.600a0980xxxxxx:1' for probing: No connection
2025-07-05T15:10:52.033Z In(182) vmkernel: cpu24:7382280)Vol3: 4142: Could not open device 'naa.600a0980xxxxxx:1' for probing: No connection
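
To confirm that an APD event preceded the replication failure, the timestamps in vobd.log can be compared with the failure time from the hbrsrv log. The sketch below assumes the vobd.log line format shown above and UTC timestamps. The function names and the 10-minute default window are illustrative choices, not values from VMware documentation.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches APD start events in the vobd.log format shown above.
APD_RE = re.compile(
    r"^(?P<ts>\S+)\s.*\[(?P<event>(?:vob|esx\.problem)\.storage\.apd\.start)\]"
    r".*identifier \[(?P<device>[^\]]+)\]"
)


def parse_apd_starts(lines):
    """Return (timestamp, device, event) tuples for each APD start entry."""
    events = []
    for line in lines:
        match = APD_RE.search(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
            events.append(
                (ts.replace(tzinfo=timezone.utc), match.group("device"), match.group("event"))
            )
    return events


def apd_starts_before(events, failure_ts, window=timedelta(minutes=10)):
    """Return APD start events that occurred within `window` before failure_ts."""
    return [e for e in events if timedelta(0) <= failure_ts - e[0] <= window]
```

In the example above, the APD starts at 15:07:41Z and the first NFC_FILE_MISSING entry follows at about 15:10:01Z, so both vobd entries fall inside the window.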

This APD causes a CID/PID mismatch between the parent disk and the child disk created for the replication instance: the parent content ID recorded in the child no longer matches the content ID of the parent.

2025-07-05T15:25:51.035Z error hbrsrv[27919] [Originator@6876 sub=Main groupID=GID-fa3600ef-xxxx opID=hsl-62cf38ef] HbrError for (hostIP: "10.177.21.240"), (flags: retriable) stack:
2025-07-05T15:25:51.035Z error hbrsrv[27919] [Originator@6876 sub=Main groupID=GID-fa3600ef-xxxx opID=hsl-62cf38ef] [0] Fault: (vmodl.fault.SystemError) {
--> faultCause = (vmodl.MethodFault) null,
--> faultMessage = <unset>,
--> reason = "The parent virtual disk has been modified since the child was created. The content ID of the parent virtual disk does not match the corresponding parent content ID in the child"
--> msg = "A general system error occurred: The parent virtual disk has been modified since the child was created. The content ID of the parent virtual disk does not match the corresponding parent content ID in the child"
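
The mismatch in this fault can be illustrated with the CID and parentCID fields of VMDK descriptor files. The sketch below only reads descriptor text and compares the two values. It assumes that copies of the parent and child descriptors are available as plain text, and the function names are hypothetical. It is for inspection only and does not modify any disk.

```python
import re

# CID and parentCID are hexadecimal fields in a VMDK descriptor file.
FIELD_RE = re.compile(r"^\s*(CID|parentCID)\s*=\s*([0-9a-fA-F]+)\s*$", re.MULTILINE)


def read_ids(descriptor_text):
    """Return a dict with the 'CID' and 'parentCID' values found in the text."""
    return {key: value.lower() for key, value in FIELD_RE.findall(descriptor_text)}


def chain_is_consistent(parent_text, child_text):
    """True when the child's parentCID equals the parent's CID."""
    parent = read_ids(parent_text)
    child = read_ids(child_text)
    return "CID" in parent and child.get("parentCID") == parent["CID"]
```

A result of False corresponds to the condition described by the fault: the content ID of the parent does not match the parent content ID stored in the child.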

By the time of the recovery, disk consolidation tasks for the VM's replication instances had already failed.
As expected, when another replication instance was run during recovery, the failure persisted.

Resolution

Contact the storage/fabric vendor to understand the root cause of the APD.
To work around the issue, if the storage device is back online, re-configure replication of the VM with seeds.
If the storage device is still offline, stop replication gracefully and re-configure replication for the VM.