vSphere replication fails with NFC_SESSION_ERROR

Article ID: 338569

Products

VMware Live Recovery, VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • Configuring vSphere Replication fails with this error:
 "A replication error occurred at the vSphere Replication server for replication 'test'. Details: Error for (datastoreUUID: 55xxxxxx3ab): Class: NFC Code: 8 NFC error: NFC_SESSION_ERROR"
  • All required ports have been verified as open.
  • In the vmkernel.log file on the source host, you see entries similar to:
2017-10-13T17:21:21.708Z cpu13:3810676)Hbr: 2208: Wire compression supported by server 10.5.8.47: FastLZ
2017-10-13T17:21:27.774Z cpu14:3810676)Hbr: 3002: Command: INIT_SESSION: error result=Failed gen=-1: Error for (datastoreUUID: "59c57xxx-3c3e0f44-e0be-101f742f57xx"), (diskId: "RDID-be30c75f-e2a0-4434-8769-d3110f5c7761"), (hostId: "host-9"), (pathname: $
2017-10-13T17:21:27.774Z cpu14:3810676)WARNING: Hbr: 3011: Command INIT_SESSION failed (result=Failed) (isFatal=FALSE) (Id=0) (GroupID=GID-c07217a9-dc9a-4086-b986-f1a9ba146dc0)
2017-10-13T17:21:27.774Z cpu14:3810676)WARNING: Hbr: 4573: Failed to establish connection to [10.5.8.47]:31031(groupID=GID-c07217a9-dc9a-4086-b986-f1a9ba146dc0): Failure
2017-10-13T17:21:43.510Z cpu1:216258)WARNING: elxnet: elxnet_mgmtGetAdapter:92: 0000:09:00.0: Failed to find node in vmkDevice table status: 0xbad0003
  • In the hbrsrv.log file of the vSphere Replication (VR) appliance at the DR site, you see entries similar to:
2017-10-13T17:30:50.271Z verbose hbrsrv[7FF282A80700] [Originator@6876 sub=HostPicker] AffinityHostPicker forgetting host affinity for context '[] /vmfs/volumes/59bacebe-a53ccdd2-df9c-101f742f57b4/mtk-vln-test'
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] HbrError for (datastoreUUID: "59baxxx-a53ccdd2-df9c-101f742f57xx"), (hostId: "host-9"), (pathname: "test/test.vmdk"), (flags: retriable, pick-new-host) stack:
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [0] Class: NFC Code: 8
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [1] NFC error: NFC_SESSION_ERROR
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [2] Code set to: Host unable to process request.
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [3] Set error flag: retriable
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [4] Set error flag: pick-new-host
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [5] Can't open remote disk /vmfs/volumes/59baxxx-a53ccdd2-df9c-101f742f57xx/test/test.vmdk
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [6] Probing disk capacity.
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [7] Attempt 2 of 4, will retry after 50 ms.
2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] [8] Ignored error.
2017-10-13T17:30:50.322Z info hbrsrv[7FF282A80700] [Originator@6876 sub=StorageManager] Running destructor for NFC connection to host-9.
2017-10-13T17:30:50.322Z info hbrsrv[7FF282A80700] [Originator@6876 sub=StorageManager] Destroying NFC connection to host-9.
2017-10-13T17:30:50.322Z verbose hbrsrv[7FF282A80700] [Originator@6876 sub=HostPicker] AffinityHostPicker choosing host host-9 for context '[] /vmfs/volumes/59baxxx-a53ccdd2-df9c-101f742f57xx/test'
 
  • In the hostd.log file on the target ESXi host, you see entries similar to:
2017-10-13T18:16:10.603Z info hostd[411A3B70] [Originator@6876 sub=Nfcsvc] Plugin started
2017-10-13T18:16:10.711Z error hostd[411A3B70] [Originator@6876 sub=Nfcsvc] Read error from the nfcLib: NFC_NO_MEMORY (done=yep)
2017-10-13T18:16:10.903Z info hostd[42182B70] [Originator@6876 sub=Nfcsvc] PROXY connection to NFC(useSSL=0): found session ticket:[N9VimShared15NfcSystemTicketE:0x1f481f24]
2017-10-13T18:16:10.903Z info hostd[42182B70] [Originator@6876 sub=Nfcsvc] Successfully initialized nfc callback for a write to the socket to be invoked on a separate thread
2017-10-13T18:16:10.903Z info hostd[42182B70] [Originator@6876 sub=Nfcsvc] Plugin started
2017-10-13T18:16:22.851Z error hostd[40E40B70] [Originator@6876 sub=Nfcsvc] Read error from the nfcLib: NFC_NO_MEMORY (done=yep)
2017-10-13T18:16:23.151Z error hostd[42182B70] [Originator@6876 sub=Nfcsvc] Read error from the nfcLib: NFC_NO_MEMORY (done=yep)
2017-10-13T18:16:23.453Z error hostd[410AEB70] [Originator@6876 sub=Nfcsvc] Read error from the nfcLib: NFC_NO_MEMORY (done=yep)

Note: This log excerpt is an example. Date, time, and environmental variables may vary depending on your environment.
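
To confirm that the target host matches the last symptom, it can help to count the NFC errors in hostd.log. The following is a minimal sketch, not a VMware tool: the function name `count_nfc_errors` and the regular expression are illustrative assumptions based only on the line format shown in the excerpt above.

```python
import re
from collections import Counter

# Matches hostd.log error lines from the Nfcsvc plugin, using the line
# format shown in the excerpt above (assumed, not an official log grammar).
NFC_ERROR = re.compile(
    r"^(?P<ts>\S+)\s+error\s+hostd\[(?P<thread>[0-9A-Fa-f]+)\].*"
    r"sub=Nfcsvc\].*nfcLib:\s+(?P<code>NFC_[A-Z_]+)"
)


def count_nfc_errors(lines):
    """Return a Counter of NFC error codes found in hostd.log lines."""
    counts = Counter()
    for line in lines:
        match = NFC_ERROR.search(line)
        if match:
            counts[match.group("code")] += 1
    return counts


# Sample lines copied from the excerpt above.
SAMPLE = [
    "2017-10-13T18:16:10.603Z info hostd[411A3B70] [Originator@6876 sub=Nfcsvc] Plugin started",
    "2017-10-13T18:16:10.711Z error hostd[411A3B70] [Originator@6876 sub=Nfcsvc] Read error from the nfcLib: NFC_NO_MEMORY (done=yep)",
    "2017-10-13T18:16:22.851Z error hostd[40E40B70] [Originator@6876 sub=Nfcsvc] Read error from the nfcLib: NFC_NO_MEMORY (done=yep)",
]

nfc_counts = count_nfc_errors(SAMPLE)
print(dict(nfc_counts))
```

To scan a real log, pass an open file object instead of `SAMPLE`. The log location (commonly /var/log/hostd.log on the ESXi host) is an assumption to verify in your environment.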

Environment

VMware vSphere Replication 6.5.x

Cause

This issue occurs when the target host runs out of NFC memory, either because of too many retries or because stale sessions from earlier attempts were not cleared.

Resolution

To resolve this issue, restart the management services on the target host, then check whether the errors recur.
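
The restart step could be scripted as in the sketch below. It assumes the management services are the hostd and vpxa agents and that they are controlled through the init scripts /etc/init.d/hostd and /etc/init.d/vpxa; the article does not name them, so confirm both against your ESXi version. The function name `restart_management_agents` is hypothetical, and by default it performs a dry run that only returns the commands.

```python
import subprocess

# Assumed init scripts for the host management agents on ESXi.
AGENT_SCRIPTS = ["/etc/init.d/hostd", "/etc/init.d/vpxa"]


def restart_management_agents(dry_run=True):
    """Restart each agent in order; with dry_run=True only return the commands."""
    commands = [[script, "restart"] for script in AGENT_SCRIPTS]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if a restart fails.
            subprocess.run(command, check=True)
    return commands


planned = restart_management_agents()  # dry run: nothing is executed
for command in planned:
    print(" ".join(command))
```

Calling `restart_management_agents(dry_run=False)` from a shell session on the target host would actually run the restarts.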

If restarting the management services does not resolve the issue, increase the NFC session memory by editing the nfcsvc section of /etc/vmware/hostd/config.xml, then restart the hostd service.

          <nfcsvc>
              <path>libnfcsvc.so</path>
              <enabled>true</enabled>
              <!-- Increase maxMemory to a larger value, for example 62914560 (60*1024*1024) -->
              <maxMemory>50331648</maxMemory>
              <maxStreamMemory>10485760</maxStreamMemory>
          </nfcsvc>
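
As a hedged illustration of the change, the sketch below computes the example value 60*1024*1024 and substitutes it into the nfcsvc section of a config.xml string. The helper name `set_nfc_max_memory` is hypothetical. It edits the text with a regular expression rather than an XML parser so that the rest of the file is left byte-for-byte unchanged; back up config.xml before applying anything like this to a real host.

```python
import re

# Finds <maxMemory> only inside the <nfcsvc> section; the negative lookahead
# stops the search from running past </nfcsvc> into another section.
MAX_MEMORY = re.compile(
    r"(<nfcsvc>(?:(?!</nfcsvc>).)*?<maxMemory>)\s*\d+\s*(</maxMemory>)",
    re.DOTALL,
)


def set_nfc_max_memory(config_text, new_bytes):
    """Return config_text with the nfcsvc maxMemory value replaced."""
    updated, count = MAX_MEMORY.subn(
        lambda m: f"{m.group(1)}{new_bytes}{m.group(2)}", config_text, count=1
    )
    if count == 0:
        raise ValueError("no <maxMemory> element found inside <nfcsvc>")
    return updated


# Section contents copied from the snippet above.
SAMPLE_CONFIG = """<nfcsvc>
    <path>libnfcsvc.so</path>
    <enabled>true</enabled>
    <maxMemory>50331648</maxMemory>
    <maxStreamMemory>10485760</maxStreamMemory>
</nfcsvc>"""

new_value = 60 * 1024 * 1024  # 62914560 bytes, the example value from the article
updated_config = set_nfc_max_memory(SAMPLE_CONFIG, new_value)
print(updated_config)
```

Only `maxMemory` is changed; `maxStreamMemory` keeps its original value. The hostd service still has to be restarted for the new limit to take effect.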


Note: If the DR site has multiple hosts, use the vCenter Server Managed Object Browser (MOB) page of the DR site to map the host ID shown in the HBR logs (for example, host-9) to its host name.
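
The lookup described in the note can be sketched as follows: collect the host IDs from hbrsrv.log lines and build a MOB address for each. The vCenter name `vcenter.example.com` is a placeholder, the function names are hypothetical, and the URL form `https://<vCenter>/mob/?moid=<host ID>` is an assumption to verify against your vCenter Server version.

```python
import re

# Matches the hostId field as it appears in the hbrsrv.log excerpt above.
HOST_ID = re.compile(r'\(hostId: "(host-\d+)"\)')


def host_ids(lines):
    """Return the sorted, de-duplicated host IDs found in HBR log lines."""
    found = set()
    for line in lines:
        found.update(HOST_ID.findall(line))
    return sorted(found)


def mob_url(vcenter, host_id):
    """Build the assumed MOB address for a managed object ID."""
    return f"https://{vcenter}/mob/?moid={host_id}"


# Sample line copied from the hbrsrv.log excerpt above.
SAMPLE = [
    '2017-10-13T17:30:50.271Z info hbrsrv[7FF282A80700] [Originator@6876 sub=Main] '
    'HbrError for (datastoreUUID: "59baxxx-a53ccdd2-df9c-101f742f57xx"), '
    '(hostId: "host-9"), (pathname: "test/test.vmdk"), '
    '(flags: retriable, pick-new-host) stack:',
]

ids = host_ids(SAMPLE)
urls = [mob_url("vcenter.example.com", host_id) for host_id in ids]
print(urls)
```

Opening the resulting address in a browser and reading the `name` property of the host object gives the host name to act on.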