vSphere Replication: NFC_DISKLIB_ERROR (Connection timed out) due to Storage Latency

Products

VMware Live Recovery

Issue/Introduction

Virtual machine replication status shows Not Active (RPO Violation).
Replication Management Server logs show: Class: NFC Code: 10; NFC error: NFC_DISKLIB_ERROR (Connection timed out).
Destination ESXi host vmkernel.log reports: Fil6 file IO : Timeout.

In the VMware Live Recovery UI, the replication tab shows the following error:

A replication error occurred at the vSphere Replication Server for replication 'VM_name'. Details: 'Error for <Datastore_UUID>, (diskId: "RDID-<Datastore_UUID>"), (hostId: "host-xxx"), (pathname: "VM_name/hbrdisk.RDID-<Datastore_UUID>.#############.vmdk"), (flags: nfc-error, retriable): Class: NFC Code: 10; NFC error: NFC_DISKLIB_ERROR (Connection timed out); Set error flag: retriable; Set error flag: nfc-error; Can't write (multiEx) to remote disk; Can't write (multi) to remote disk'.

In /var/log/vmkernel.log on the destination ESXi host, warnings show file I/O to the datastore timing out:

2026-04-27T02:05:25.733Z In(182) vmkernel: cpu64:2103366)Fil3: 390: Caller Fil6_FileIOInt vol 'LUN_Name' took 80290 ms wantOptlocking: 1,
2026-04-27T02:05:25.733Z In(182) vmkernel: cpu98:2103514)Fil6: 4308: 'LUN_Name': Fil6 file IO (<FD c60 r73>) : Timeout
2026-04-27T02:05:25.733Z In(182) vmkernel: cpu49:2103689)Fil6: 4308: 'LUN_Name': Fil6 file IO (<FD c60 r73>) : Timeout
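
To check how often these warnings occur, the log can be scanned with a short script. The following is a minimal Python sketch, not a VMware tool: the function name scan_vmkernel, the regular expressions, and the 60000 ms cutoff are assumptions based only on the sample lines shown in this article, and real log lines may vary.

```python
import re

# Assumed pattern: Fil3 reports the caller and how long the file I/O call took.
SLOW_IO = re.compile(r"Fil3: \d+: Caller (\S+) .*? took (\d+) ms")
# Assumed pattern: Fil6 timeout warning; captures the volume name and
# accepts one or more closing quotes after it.
TIMEOUT = re.compile(r"Fil6: \d+: '([^']+)'+: Fil6 file IO .*: Timeout")


def scan_vmkernel(lines, slow_ms=60000):
    """Return (slow_calls, timeout_counts) found in vmkernel.log lines.

    slow_calls is a list of (timestamp, caller, duration_ms) tuples for calls
    that took at least slow_ms; timeout_counts maps volume name to the number
    of Fil6 timeout warnings seen for it.
    """
    slow_calls = []
    timeout_counts = {}
    for line in lines:
        slow = SLOW_IO.search(line)
        if slow and int(slow.group(2)) >= slow_ms:
            timestamp = line.split(" ", 1)[0]
            slow_calls.append((timestamp, slow.group(1), int(slow.group(2))))
        timed_out = TIMEOUT.search(line)
        if timed_out:
            volume = timed_out.group(1)
            timeout_counts[volume] = timeout_counts.get(volume, 0) + 1
    return slow_calls, timeout_counts
```

With a copy of the log on a workstation, it could be called as scan_vmkernel(open("vmkernel.log")). A volume with many timeouts in a short period is a candidate for the storage checks in the Resolution section.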

In /var/log/vmware/hbrsrv.log on the appliance:

2026-04-27T21:52:41.582+03:00 verbose hbrsrv[01337] [Originator@6876 sub=HostPicker] AffinityHostPicker choosing host host-11 for context '[] /vmfs/volumes/<Datastore_UUID>/VM_name'
2026-04-27T21:52:41.582+03:00 verbose hbrsrv[01340] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: groupStats, Hbr.Replica.Group.GID-<Datastore_UUID>. Applied change to temp map.
2026-04-27T21:52:41.582+03:00 error hbrsrv[02401] [Originator@6876 sub=Main] HbrError for (datastoreUUID: "<Datastore_UUID>"), (diskId: "RDID-<disk-UUID>"), (hostId: "host-xxx"), (pathname: "VM_name/hbrdisk.RDID-<Datastore_UUID>.############.vmdk"), (flags: nfc-error, retriable) stack:
2026-04-27T21:52:41.582+03:00 error hbrsrv[02401] [Originator@6876 sub=Main]    [0] Class: NFC Code: 10
2026-04-27T21:52:41.582+03:00 error hbrsrv[02401] [Originator@6876 sub=Main]    [1] NFC error: NFC_DISKLIB_ERROR (Connection timed out)
2026-04-27T21:52:41.582+03:00 error hbrsrv[02401] [Originator@6876 sub=Main]    [2] Set error flag: retriable
2026-04-27T21:52:41.582+03:00 error hbrsrv[02401] [Originator@6876 sub=Main]    [3] Set error flag: nfc-error
2026-04-27T21:52:41.582+03:00 error hbrsrv[02401] [Originator@6876 sub=Main]    [4] Can't write (multiEx) to remote disk
2026-04-27T21:52:41.582+03:00 error hbrsrv[02401] [Originator@6876 sub=Main]    [5] Can't write (multi) to remote disk
2026-04-27T21:52:41.582+03:00 error hbrsrv[02401] [Originator@6876 sub=Main]    [6] Converting error to wire failure
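
The error stack above can be collected into one record per failure, which makes it easier to see which disks and hosts are affected. This is a minimal Python sketch under the assumption that hbrsrv.log lines follow the layout of the excerpt; the names parse_hbr_errors and is_nfc_timeout are hypothetical helpers, not part of any VMware tooling.

```python
import re

# Assumed layout: "<timestamp> error hbrsrv[<thread>] [...] HbrError for <fields> stack:"
HEADER = re.compile(r"^(\S+) error hbrsrv\[(\d+)\] .*HbrError for (.*) stack:\s*$")
# Assumed layout of a stack frame: "... hbrsrv[<thread>] [...]    [<n>] <message>"
FRAME = re.compile(r"^\S+ error hbrsrv\[(\d+)\] .*?\]\s+\[(\d+)\] (.*)$")
# Key/value pairs such as (hostId: "host-xxx") or (flags: nfc-error, retriable)
FIELD = re.compile(r'\((\w+): "?([^")]*)"?\)')


def parse_hbr_errors(lines):
    """Group HbrError headers with the stack frames logged by the same thread."""
    errors = []
    open_by_thread = {}
    for line in lines:
        header = HEADER.match(line)
        if header:
            entry = {
                "time": header.group(1),
                "fields": dict(FIELD.findall(header.group(3))),
                "stack": [],
            }
            errors.append(entry)
            open_by_thread[header.group(2)] = entry
            continue
        frame = FRAME.match(line)
        if frame and frame.group(1) in open_by_thread:
            open_by_thread[frame.group(1)]["stack"].append(frame.group(3))
    return errors


def is_nfc_timeout(entry):
    """True when the stack contains the NFC_DISKLIB_ERROR timeout frame."""
    return any(
        "NFC_DISKLIB_ERROR" in msg and "timed out" in msg for msg in entry["stack"]
    )
```

Filtering the result with is_nfc_timeout and reading entry["fields"]["hostId"] gives a quick list of target hosts that reported the timeout.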

Environment

VMware Live Recovery Appliance 9.x

VMware vCenter 8.0

VMware ESXi 8.0

Cause

vSphere Replication utilizes the Network File Copy (NFC) protocol to instruct the target ESXi host to write replicated data directly to the destination datastore.

When the underlying storage takes too long to acknowledge these writes (more than roughly 60 seconds; the vmkernel.log excerpt above shows a call that took 80290 ms), the ESXi host aborts the operation and logs Fil6 file IO : Timeout.

The ESXi host is successfully issuing I/O requests, but the storage backend (SAN/Array) or the physical fabric (FC/iSCSI) is failing to acknowledge or complete these requests within the default SCSI timeout periods.

These storage-level timeouts cascade to the replication layer, causing the NFC_DISKLIB_ERROR and subsequent replication failure.
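
One way to confirm this cascade in a given environment is to check whether each replication error is preceded by storage timeouts on the target host. The Python sketch below illustrates the idea and makes several assumptions: the timestamps have already been extracted from vmkernel.log and hbrsrv.log, the function names are hypothetical, and the 120-second window is an arbitrary starting value. Note that vmkernel.log is written in UTC (Z suffix) while the hbrsrv.log excerpt carries a +03:00 offset, so both must be normalized before comparing.

```python
from datetime import datetime, timedelta


def parse_ts(stamp):
    """Parse an ISO 8601 timestamp; a trailing Z is treated as UTC."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def correlate(storage_timeouts, replication_errors, window_s=120):
    """For each replication error, count storage timeouts in the preceding window.

    Both arguments are lists of timestamp strings. Returns a list of
    (replication_error_timestamp, matching_timeout_count) tuples.
    """
    window = timedelta(seconds=window_s)
    storage = sorted(parse_ts(s) for s in storage_timeouts)
    pairs = []
    for err in replication_errors:
        err_time = parse_ts(err)
        # Timezone-aware datetimes compare correctly across different offsets.
        near = [s for s in storage if timedelta(0) <= err_time - s <= window]
        pairs.append((err, len(near)))
    return pairs
```

A replication error with a count of zero suggests looking at a different target host or a wider window before concluding that storage latency is the cause.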

Resolution

Work through the following steps to locate and relieve the storage latency on the target side.

1. Analyze Storage Performance: Review SAN/iSCSI array logs for hardware faults or path thrashing.

2. Verify Latency via esxtop:

Run esxtop on the target ESXi host.
Press d (disk adapter view) or u (disk device view).
Check whether GAVG/cmd (average guest latency per command) consistently exceeds 20-30 ms.

3. Reduce Resource Contention: Relocate the vSphere Replication appliance or high-I/O VMs to a less congested LUN or datastore.

4. Validate Fabric Health: Ensure HBA drivers and firmware are up to date and consistent with the interoperability matrix.

5. Engage Storage Vendor: If persistent Fil6 file IO : Timeout events are observed, consult the storage vendor to investigate why the subsystem is failing to acknowledge I/O requests within the expected timeframe.
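
The latency check in step 2 can also be run against a batch capture instead of the interactive view, for example one collected with esxtop -b -d 5 -n 60 > capture.csv. The Python sketch below flags counters that stay above a threshold for most of the capture. It is an illustration only: the function name is hypothetical, the 30 ms threshold and 50% fraction are starting values taken from the range in step 2, and the counter label "Average Guest MilliSec/Command" is assumed to be the batch-mode column name for GAVG, so verify it against the header of your own capture.

```python
import csv


def sustained_latency(lines, counter="Average Guest MilliSec/Command",
                      threshold_ms=30.0, min_fraction=0.5):
    """Return {column_name: fraction_of_samples_over_threshold}.

    lines is any iterable of CSV text lines (for example an open file).
    Only columns whose header contains `counter` are inspected, and only
    those over the threshold in at least min_fraction of samples are returned.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return {}
    cols = [i for i, name in enumerate(header) if counter in name]
    over = dict.fromkeys(cols, 0)
    samples = 0
    for row in reader:
        samples += 1
        for i in cols:
            try:
                if float(row[i]) > threshold_ms:
                    over[i] += 1
            except (ValueError, IndexError):
                continue  # skip blank or malformed cells
    if samples == 0:
        return {}
    return {
        header[i]: over[i] / samples
        for i in cols
        if over[i] / samples >= min_fraction
    }
```

Calling sustained_latency(open("capture.csv", newline="")) returns the adapters or devices whose guest latency was elevated for at least half of the samples, which helps separate sustained latency from a single spike.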