vSphere replication fails with NFC_SESSION_ERROR

Article ID: 338569

Products

VMware Live Recovery
VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • Configuring vSphere Replication fails with this error:
    "A replication error occurred at the vSphere replication server for replication 'vm_name' Details Error for (datastore UUID: #######: Class NFC code: 8 NFC error: NFC_SESSION_ERROR)"
  • All required ports were found to be open.
  • In the /var/run/log/vmkernel.log file on the source host, you see entries similar to:
2017-10-13T17:21:21.708Z cpu13:3810676)Hbr: 2208: Wire compression supported by server x.x.x.x: FastLZ
2017-10-13T17:21:27.774Z cpu14:3810676)Hbr: 3002: Command: INIT_SESSION: error result=Failed gen=-1: Error for (datastoreUUID: "#####-####-###-#####"), (diskId: "#####-####-####-####-####-#####"), (hostId: "host-x"), (pathname: $
2017-10-13T17:21:27.774Z cpu14:3810676)WARNING: Hbr: 3011: Command INIT_SESSION failed (result=Failed) (isFatal=FALSE) (Id=0) (GroupID=GID-c07217a9-dc9a-4086-b986-f1a9ba146dc0)
2017-10-13T17:21:27.774Z cpu14:3810676)WARNING: Hbr: 4573: Failed to establish connection to [x.x.x.x]:31031(groupID=GID-c07217a9-dc9a-4086-b986-f1a9ba146dc0): Failure
2017-10-13T17:21:43.510Z cpu1:216258)WARNING: elxnet: elxnet_mgmtGetAdapter:92: 0000:09:00.0: Failed to find node in vmkDevice table status: 0xbad0003
  • In the /var/log/vmware/hbrsrv.log file of the vSphere Replication appliance at the DR site, you see entries similar to:
2017-10-13T17:30:50.271Z verbose hbrsrv[7FF282A80700] [Originator@6876 sub=HostPicker] AffinityHostPicker forgetting host affinity for context '[] /vmfs/volumes/#######-######-####-#####/vm_name'
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] HbrError for (datastoreUUID: "######-######-####-######"), (hostId: "host-9"), (pathname: "vm_name/vm.vmdk"), (flags: retriable, pick-new-host) stack:
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [0] Class: NFC Code: 8
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [1] NFC error: NFC_SESSION_ERROR
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [2] Code set to: Host unable to process request.
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [3] Set error flag: retriable
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [4] Set error flag: pick-new-host
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [5] Can't open remote disk /vmfs/volumes/#####-#####-####-####/vm_name/vm.vmdk
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [6] Probing disk capacity.
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [7] Attempt 2 of 4, will retry after 50 ms.
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [8] Ignored error.
2017-10-13T17:30:50.322Z info hbrsrv[7FF282A80700] [Originator@6876 sub=StorageManager] Running destructor for NFC connection to host-9.
2017-10-13T17:30:50.322Z info hbrsrv[7FF282A80700] [Originator@6876 sub=StorageManager] Destroying NFC connection to host-9.
2017-10-13T17:30:50.322Z verbose hbrsrv[7FF282A80700] [Originator@6876 sub=HostPicker] AffinityHostPicker choosing host host-9 for context '[] /vmfs/volumes/#####-#####-####-#####/vm_name'

 

  • In the /var/run/log/hostd.log file on the target ESXi host, you see entries similar to:
2017-10-13T18:16:10.603Z info hostd[411A3B70] [Originator@6876 sub=Nfcsvc] Plugin started
2017-10-13T18:16:10.711Z error hostd[411A3B70] [Originator@6876 sub=Nfcsvc] Read error from the nfcLib: NFC_NO_MEMORY (done=yep)
2017-10-13T18:16:10.903Z info hostd[42182B70] [Originator@6876 sub=Nfcsvc] PROXY connection to NFC(useSSL=0): found session ticket:[N9VimShared15NfcSystemTicketE:0x1f481f24]
2017-10-13T18:16:10.903Z info hostd[42182B70] [Originator@6876 sub=Nfcsvc] Successfully initialized nfc callback for a write to the socket to be invoked on a separate thread
2017-10-13T18:16:10.903Z info hostd[42182B70] [Originator@6876 sub=Nfcsvc] Plugin started
2017-10-13T18:16:22.851Z error hostd[40E40B70] [Originator@6876 sub=Nfcsvc] Read error from the nfcLib: NFC_NO_MEMORY (done=yep)
2017-10-13T18:16:23.151Z error hostd[42182B70] [Originator@6876 sub=Nfcsvc] Read error from the nfcLib: NFC_NO_MEMORY (done=yep)
2017-10-13T18:16:23.453Z error hostd[410AEB70] [Originator@6876 sub=Nfcsvc] Read error from the nfcLib: NFC_NO_MEMORY (done=yep)


Note: This log excerpt is an example. Date, time, and environmental variables may vary depending on your environment.
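
To gauge how often the target host is hitting the memory limit, the NFC_NO_MEMORY entries in hostd.log can be counted. The following is a minimal sketch, assuming the log path shown above; the function name count_nfc_no_memory is illustrative and is not part of any VMware tool.

```shell
# Count lines containing NFC_NO_MEMORY in a hostd log file.
# count_nfc_no_memory is a hypothetical helper name, not a VMware command.
count_nfc_no_memory() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so "|| true" keeps the helper from failing.
    grep -c 'NFC_NO_MEMORY' "$1" || true
}

# Run against the live log only if it is readable on this host.
if [ -r /var/run/log/hostd.log ]; then
    count_nfc_no_memory /var/run/log/hostd.log
fi
```

A count that keeps growing after the management services are restarted suggests the memory limit itself is too low for the current replication load.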

Environment

VMware vSphere Replication 6.5.x
VMware vSphere Replication 8.x 

Cause

This issue occurs when the host runs short of NFC memory, caused by an excessive number of retries or by stale sessions that were not cleared.

Resolution

To resolve this issue, restart the management services on the target host, then check whether the errors recur.

If restarting the management services does not resolve the issue, increase the NFC session memory by changing the parameters in the nfcsvc section of /etc/vmware/hostd/config.xml, then restart the hostd service.

          <nfcsvc>
              <path>libnfcsvc.so</path>
              <enabled>true</enabled>
              <maxMemory>50331648</maxMemory> <!-- increase this to a larger value, for example 60*1024*1024 -->
              <maxStreamMemory>10485760</maxStreamMemory>
          </nfcsvc>
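
The maxMemory value is a byte count, so a size expressed as 60*1024*1024 has to be written out as a plain number. A small sketch of the arithmetic, where mib_to_bytes is an illustrative helper name and not an ESXi command:

```shell
# Convert a size in MiB to the byte count used by <maxMemory>.
# mib_to_bytes is a hypothetical helper name.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 48   # prints 50331648, the value shown in the example above
mib_to_bytes 60   # prints 62914560, the suggested larger value
```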


For ESXi 7.0 U2, the service configuration settings are stored in a dedicated configuration store database, which is accessed through /bin/configstorecli.

Below are the steps to modify the NFC settings:

1. Export the configuration to a temporary JSON file:
  $ /bin/configstorecli config current get -c esx -g services -k hostd -outfile tmp.json

2. Edit the file:
  $ vi tmp.json

3. By default, the maxMemory parameter is set to 100663296. Set it to 150663296 (adjust the value to suit your environment):

  <nfcsvc>
        <path>libnfcsvc.so</path>
        <enabled>true</enabled>
        <maxMemory>100663296</maxMemory>           <!-- increase this to 150663296 -->
        <maxStreamMemory>35651584</maxStreamMemory>
  </nfcsvc>

4. Save the changes: press Esc, then type :wq! and press Enter.

5. Apply the file to the database:
  $ /bin/configstorecli config current set -c esx -g services -k hostd -infile tmp.json

6. Restart hostd service:
  $ /etc/init.d/hostd restart
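
Because step 5 overwrites the stored hostd configuration, it can help to keep a copy of the exported file and to confirm the edit before applying it. The following is a minimal sketch that uses only cp and grep; the helper names backup_export and has_value are illustrative, and tmp.json is the file name from step 1.

```shell
# Hypothetical helpers; these are not part of configstorecli.

# Keep an untouched copy of the exported file (run between steps 1 and 2).
backup_export() {
    cp "$1" "$1.orig"
}

# Succeed only if the file contains the given value as a fixed string
# (run after step 4, before step 5).
has_value() {
    grep -qF "$2" "$1"
}

# Example, using the file name and value from the steps above:
#   backup_export tmp.json
#   vi tmp.json
#   has_value tmp.json 150663296 && echo "new value present"
```

If the change needs to be rolled back, the saved copy can presumably be applied with the same command as in step 5, using tmp.json.orig as the -infile argument.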


Note: If the DR site has multiple hosts, use the vCenter Server MOB page of the DR site to map the host ID shown in the HBR logs (for example, host-9) to its host name.
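
The host IDs to look up can be collected from the error entries in hbrsrv.log on the vSphere Replication appliance. This is a minimal sketch, assuming the hostId: "host-N" format seen in the log excerpt above; list_host_ids is an illustrative helper name.

```shell
# Print each distinct hostId value found in an hbrsrv log, one per line.
# list_host_ids is a hypothetical helper name, not a VMware command.
list_host_ids() {
    grep -o 'hostId: "[^"]*"' "$1" | sed 's/^hostId: "//; s/"$//' | sort -u
}

# Run against the appliance log only if it is readable here.
if [ -r /var/log/vmware/hbrsrv.log ]; then
    list_host_ids /var/log/vmware/hbrsrv.log
fi
```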

Additional Information

NFC operations running out of memory indicate that too many concurrent NFC session requests are reaching the ESXi host. Reduce the number of NFC sessions being created. For instance, where the NFC operations are VM replication requests, reduce the number of VM replications.