Rubrik VM backup failing with NFC_COMPRESSION_ERROR

Article ID: 393216

Products

VMware vSphere ESXi

Issue/Introduction

The Rubrik VM backup solution fails to back up VMs.

Environment

ESXi 7.x and above

Rubrik backup solution

Cause

The Rubrik VM backup solution fails to back up VMs on several different hosts. Rubrik support finds an NFC_COMPRESSION_ERROR on their side.

The Rubrik logs show:


VM W 0:11:32 2025-03-31 19:37:00.746 0:00:00 CREATE_VMWARE_SNAPSHOT_  :::0 [fetch snapshot] [NFC ERROR]NfcAio_TimedWait: The session is in a faulted state: NFC_COMPRESSION_ERROR
VM I 0:11:32 2025-03-31 19:37:00.746 0:00:00 CREATE_VMWARE_SNAPSHOT_  :::0 [fetch snapshot] VixDiskLib: VixDiskLib_Close: Close disk.
VM W 0:11:32 2025-03-31 19:37:00.746 0:00:00 CREATE_VMWARE_SNAPSHOT_ :::0 [fetch snapshot] [NFC ERROR]NfcAio_TimedWait: The session is in a faulted state: NFC_COMPRESSION_ERROR
VM W 0:11:32 2025-03-31 19:37:00.746 0:00:00 CREATE_VMWARE_SNAPSHOT_ :::0 [fetch snapshot] [NFC ERROR]NfcAio_DDBGet: The session is in a faulted state: NFC_COMPRESSION_ERROR

Hostd Log:


warning hostd[2100694] [Originator@6876 sub=Libs opID=nbdmode-0000009b67eb11a0] [NFC ERROR]NfcAioGetMessage: Srv invalid msg hdr magic # 4, was expecting -1593779590
warning hostd[2100694] [Originator@6876 sub=Libs opID=nbdmode-0000009b67eb11a0] [NFC ERROR]NfcAioLogFatalSessionErrorLocked: A fatal session error occurred. The error was: 'NFC_SESSION_ERROR' (8)
warning hostd[2100694] [Originator@6876 sub=Libs opID=nbdmode-0000009b67eb11a0] [NFC ERROR]NfcAioGetAndProcessMsg: Failed to receive an AIO message: NFC_SESSION_ERROR
warning hostd[2100694] [Originator@6876 sub=Libs opID=nbdmode-0000009b67eb11a0] [NFC ERROR]NfcAioServerProcessMain: Fatal session error. Cleaning up AIO session
error hostd[2100694] [Originator@6876 sub=Nfcsvc opID=nbdmode-0000009b67eb11a0] Read error from the nfcLib: NFC_SESSION_ERROR (done = yep)

- Each host showed a large number of read errors on vmhba1.

- A bad SFP in the MDS switch prevented communication with the storage array.
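
To check whether another host shows the same signature, it can help to count the NFC error types in its log. The following is a minimal sketch, not a Broadcom or Rubrik tool: the function name nfc_error_summary is hypothetical, and the example path assumes the hostd log is available at /var/log/hostd.log on the ESXi host.

```shell
# Hypothetical helper: count each NFC_*ERROR token found in a log file.
# Assumes the log uses error names like those in the excerpts above.
nfc_error_summary() {
    log_file="$1"
    # grep -o prints every matching token on its own line,
    # so uniq -c can count occurrences per error name.
    grep -o 'NFC_[A-Z_]*ERROR' "$log_file" | sort | uniq -c | sort -rn
}

# Example (on the ESXi host, path is an assumption):
#   nfc_error_summary /var/log/hostd.log
```

A high count of NFC_SESSION_ERROR on several hosts at once points away from a single VM and toward shared infrastructure such as the storage fabric, which matches the cause found here.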

Resolution

Engage the switch vendor to resolve the faulty SFP in the MDS switch.

Additional Information

Common Causes of NFC Compression Errors

1. Connectivity Issues:

  • DNS Resolution Problems: The backup proxy or backup server may be unable to resolve the ESXi host's name to an IP address.
  • Firewall/Port Issues: Port 902, used for NFC, might be blocked by a firewall or network device.
  • General Network Connectivity Problems: Network issues between the vCenter and ESXi hosts can disrupt NFC communication.

2. Permissions Issues:

  • Insufficient Permissions: The account used by the backup infrastructure might not have the required permissions to access virtual machines or datastores.

3. File Locks:

  • Locked Files: The file that the backup software is attempting to read or write may be locked by another process or VM within the vSphere environment.

4. NFC Memory Limits:

  • Host Memory Exhaustion: The ESXi host may be running low on memory for NFC sessions, often due to excessive retries or stale sessions.

5. VDDK Issues:

  • VDDK Crashes: In certain versions of vSphere, VDDK (Virtual Disk Development Kit) crashes may occur after encountering NBD (Network Block Device) asynchronous I/O (AIO) errors, which lead to NFC issues.

6. NBD Transport Issues:

  • Network Buffer Size: In older versions of vSphere, larger buffer sizes on the VDDK side could result in increased memory consumption on the NFC server side, causing errors.

7. VM Configuration Errors:

  • Missing Parent VM Configuration: If the VM configuration doesn’t specify the parent VM or parentVApp, backups using NBD transport might fail.

 

Troubleshooting and Solutions

1. Check Connectivity:

  • Ping/Resolve: Ensure that the backup proxy or backup server can ping the ESXi host and resolve its name to an IP address.
  • Firewall/Port Check: Verify that port 902 is not blocked by firewalls or network devices, as this port is critical for NFC communication.
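
The two checks above can be sketched as a small script. This is an illustration under assumptions: it is meant for a Linux-based backup proxy that has getent and an nc build supporting -z and -w, the function names are hypothetical, and esxi01.example.com is a placeholder host name.

```shell
# Hypothetical helpers; run them from the backup proxy, not from the ESXi host.

# Succeeds if the name resolves through the system resolver.
resolves() {
    getent hosts "$1" >/dev/null
}

# Succeeds if a TCP connection to port 902 opens within 5 seconds.
# Assumes an nc build that supports -z (connect only) and -w (timeout).
nfc_port_open() {
    nc -z -w 5 "$1" 902
}

# Runs both checks and prints one line describing the first failure, if any.
check_esxi_host() {
    host="$1"
    if ! resolves "$host"; then
        echo "$host: DNS resolution failed"
        return 1
    fi
    if ! nfc_port_open "$host"; then
        echo "$host: TCP 902 unreachable"
        return 1
    fi
    echo "$host: resolves and TCP 902 is reachable"
}

# Example (placeholder host name):
#   check_esxi_host esxi01.example.com
```

Checking name resolution before the port keeps the two failure modes apart, since a blocked port and a bad DNS record otherwise look the same from the backup job's point of view.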

2. Review Permissions:

  • Account Permissions: Confirm that the vCenter account used by the backup software has appropriate permissions to access virtual machines and datastores.
  • Grant Necessary Permissions: If the permissions are insufficient, ensure that the backup user has at least Read-Only access to the relevant objects in vCenter.

3. Resolve File Locks:

  • Identify Locked Files: Check if any files are locked within the vSphere environment that may be preventing the backup process. If so, unlock these files.
  • Monitor VM Processes: Ensure that no other processes, such as another backup or VM task, are locking the virtual machine's files.
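
One way to identify which host holds a lock on a VMFS datastore is to inspect the lock metadata with vmkfstools -D and read the owner field. The sketch below assumes the commonly documented output format, in which the last segment of the owner ID is the MAC address of the locking host. The function name and the datastore path in the example are hypothetical.

```shell
# Hypothetical helper: read `vmkfstools -D` output on stdin and print the
# MAC address embedded in the last segment of the "owner" field.
# Assumes an owner field shaped like: owner xxxxxxxx-xxxxxxxx-xxxx-<12 hex digits>
lock_owner_mac() {
    sed -n 's/.*owner [0-9a-f]*-[0-9a-f]*-[0-9a-f]*-\([0-9a-f]\{12\}\).*/\1/p' |
        head -n 1 |
        sed 's/\(..\)/\1:/g; s/:$//'   # insert a colon after every byte, drop the last
}

# Example (on the ESXi host, placeholder path):
#   vmkfstools -D /vmfs/volumes/datastore1/vm01/vm01-flat.vmdk | lock_owner_mac
```

An all-zero result would suggest that no host currently holds the lock. A non-zero MAC can be compared against the NIC addresses of the hosts in the cluster.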

4. Increase NFC Memory:

  • Modify Memory Limits: If NFC memory exhaustion is the cause, increase the memory available to NFC sessions. Edit the config.xml file under the nfcsvc section:
    • Path: /etc/vmware/hostd/config.xml
    • Example:

      <!-- 60 MB expressed in bytes (60 * 1024 * 1024) -->
      <maxMemory>62914560</maxMemory>
  • Restart Services: After modifying the configuration file, restart the hostd service to apply the changes.
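
Before editing, it is worth reading the current limit and working out the new value. The helpers below are a minimal sketch with hypothetical names, and they assume the maxMemory element holds a plain byte count.

```shell
# Convert a size in MB to bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print the first <maxMemory> value found in the given config file.
current_nfc_max_memory() {
    sed -n 's:.*<maxMemory>\([0-9][0-9]*\)</maxMemory>.*:\1:p' "$1" | head -n 1
}

# Example (on the ESXi host):
#   cp /etc/vmware/hostd/config.xml /etc/vmware/hostd/config.xml.bak
#   current_nfc_max_memory /etc/vmware/hostd/config.xml
#   mb_to_bytes 60
```

Keeping a copy of config.xml before the change makes it simple to roll back if hostd does not start cleanly afterwards.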

5. Restart Management Services:

  • Restart the management services on the ESXi host to ensure that the NFC service is functioning properly.
  • Use the following command to restart services:

    /etc/init.d/hostd restart
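
After the restart, hostd can take a short while to accept requests again. The following sketch polls until a command succeeds. The helper name wait_until is hypothetical, and using vim-cmd hostsvc/hostsummary as the readiness probe is an assumption about what works well as a check, not a documented procedure.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Sleep only if another attempt will follow.
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Example (on the ESXi host): poll up to 12 times, 5 seconds apart.
#   /etc/init.d/hostd restart
#   wait_until 12 5 vim-cmd hostsvc/hostsummary && echo "hostd is responding"
```

Retrying the backup only after hostd responds avoids mistaking a service that is still starting for a recurrence of the NFC error.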

6. Reboot Hosts/vCenter:

  • Reboot ESXi Hosts: If the error persists after troubleshooting, consider rebooting the ESXi host to reset any stale NFC sessions.
  • Reboot vCenter: Rebooting vCenter can also help if the issue appears to be related to vCenter connectivity or service state.

7. Check VM Configuration:

  • Verify Parent VM: Ensure that the VM’s configuration in vCenter is correctly linked to the parent VM or parentVApp, especially if you are using NBD transport.
  • Move VM Out of vCLS Folder: If the VM is located in a vCLS folder, try moving it to a standard folder in vCenter. Backup of VMs in a vCLS folder may fail by default.

8. Check for VDDK Issues:

  • VDDK Version: Ensure that the Virtual Disk Development Kit (VDDK) is compatible with your version of vSphere and that the latest patches are installed.
  • Update VDDK Drivers: If necessary, update the VDDK drivers on the backup server.

9. Monitor Network Performance:

  • Network Errors: Monitor the network performance for errors, packet drops, or slowdowns that could impact the NFC protocol during backups.
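
On the ESXi host, per-NIC counters are available through esxcli network nic stats get. The filter below is a minimal sketch that keeps only the non-zero error and drop counters. The function name is hypothetical, vmnic0 is a placeholder uplink name, and the filter assumes the usual "Name: value" layout of the esxcli output.

```shell
# Hypothetical filter: read "Name: value" lines on stdin and print only the
# counters whose name mentions "error" or "dropped" and whose value is non-zero.
nonzero_error_counters() {
    awk -F': *' 'tolower($1) ~ /error|dropped/ && $2 + 0 > 0 {
        sub(/^[ \t]+/, "", $1)   # strip the leading indentation
        print $1 ": " $2
    }'
}

# Example (on the ESXi host, placeholder NIC name):
#   esxcli network nic stats get -n vmnic0 | nonzero_error_counters
```

Running the filter twice a few minutes apart shows whether a counter is still increasing, which matters more than its absolute value.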

 

 
