Troubleshooting Network File Copy (NFC) issues during clone and xvMotion

Article ID: 324581

Updated On:

Products

VMware vCenter Server, VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • Deploying a virtual machine from a template with customization may fail with the error:

"Error during provisioning initial publish failed: Fault type is VC_FAULT_FATAL - Cannot install the vCenter server agent service. Cannot upload agent"

  • In /var/log/vmware/vpxd/vpxd.log, you may find entries similar to:

2022-11-30T17:45:29.130+11:00 warning vpxd[06464] [Originator@6876 sub=Default opID=lb1cxo8q-18311-auto-e4p-h5:70006611-4-01] SSL: connect failed (5)
2022-11-30T17:45:29.130+11:00 warning vpxd[06464] [Originator@6876 sub=Default opID=lb1cxo8q-18311-auto-e4p-h5:70006611-4-01] [NFC ERROR]NfcNewAuthdConnectionEx: Failed to connect: SSL failed to connect to peer
2022-11-30T17:45:29.130+11:00 warning vpxd[06464] [Originator@6876 sub=Default opID=lb1cxo8q-18311-auto-e4p-h5:70006611-4-01] [NFC ERROR]NfcNewAuthdConnectionEx: Failed to connect to peer. Error: SSL failed to connect to peer
2022-11-30T17:45:29.131+11:00 warning vpxd[06464] [Originator@6876 sub=Default opID=lb1cxo8q-18311-auto-e4p-h5:70006611-4-01] [NFC ERROR]NfcEstablishAuthCnxToServer: Failed to create new AuthD connection: SSL failed to connect to peer
2022-11-30T17:45:29.131+11:00 warning vpxd[06464] [Originator@6876 sub=Default opID=lb1cxo8q-18311-auto-e4p-h5:70006611-4-01] [NFC ERROR]Nfc_BindAndEstablishAuthdCnx3: Failed to create a connection with server esxihostname: SSL failed to connect to peer
2022-11-30T17:45:29.131+11:00 error vpxd[06464] [Originator@6876 sub=vpxNfcClient opID=lb1cxo8q-18311-auto-e4p-h5:70006611-4-01] Unable to connect to NFC server: SSL failed to connect to peer

2022-11-30T17:45:29.131+11:00 error vpxd[06464] [Originator@6876 sub=HostAccess opID=lb1cxo8q-18311-auto-e4p-h5:70006611-4-01] Failed to upload files: N3Vim5Fault16HostConnectFault9ExceptionE(Fault cause: vim.fault.HostConnectFault

2022-11-30T17:45:29.150+11:00 error vpxd[06464] [Originator@6876 sub=VmProv opID=lb1cxo8q-18311-auto-e4p-h5:70006611-4-01] Get exception while executing action vpx.vmprov.CustomizeVm: N3Vim5Fault18AgentInstallFailed9ExceptionE(Fault cause: vim.fault.AgentInstallFailed

2022-11-30T17:45:29.167+11:00 error vpxd[06464] [Originator@6876 sub=vpxLro opID=lb1cxo8q-18311-auto-e4p-h5:70006611-4-01] [VpxLRO] Unexpected Exception: N3Vim5Fault18AgentInstallFailed9ExceptionE(Fault cause: vim.fault.AgentInstallFailed

2022-11-30T17:45:29.173+11:00 info vpxd[06464] [Originator@6876 sub=Default opID=lb1cxo8q-18311-auto-e4p-h5:70006611-4-01] [VpxLRO] -- ERROR lro-889097 -- vm-10035 -- vim.VirtualMachine.clone: vim.fault.AgentInstallFailed:

--> (vim.fault.AgentInstallFailed)

  • Cloning a virtual machine across vCenter Servers may fail with a timeout error.
  • In /var/log/vmware/vpxd/vpxd.log, you may find entries similar to:

2023-03-01T11:21:31.519+05:30 info vpxd[12112] [Originator@6876 sub=vpxTaskInfo opID=ld8o3z79-1402657-auto-u2aq-h5:70265880-e5-01] Timed out waiting for task vim.Task:haTask--nfc.NfcManager.copy-2467745464

2023-03-01T11:21:31.520+05:30 warning vpxd[12112] [Originator@6876 sub=vpxLro opID=ld8o3z79-1402657-auto-u2aq-h5:70265880-e5-01] [VpxLRO] Timeout waiting on updates for haTask--nfc.NfcManager.copy-2467745464

2023-03-01T11:21:31.520+05:30 error vpxd[12134] [Originator@6876 sub=VmProv opID=ld8o3z79-1402657-auto-u2aq-h5:70265880-e5-01] Get exception while executing action vpx.vmprov.CopyVmFiles: N3Vim5Fault8Timedout9ExceptionE(Fault cause: vim.fault.Timedout

  • Deploying a virtual machine from a content library template may fail with the error "Cannot connect to host".
  • In /var/log/vmware/vpxd/vpxd.log, you may find entries similar to:

2022-10-05T16:45:38.195+08:00 info vpxd[42107] [Originator@6876 sub=Default opID=6b376cbe-f506-44d5-bc8d-28f142d5a406-5d-fa] [VpxLRO] -- ERROR task-27880 -- nfcManager -- nfc.NfcManager.copy: vim.fault.HostConnectFault:
--> Result:
--> (vim.fault.HostConnectFault) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>
-->    msg = "Cannot connect to host."

  • Cross vCenter migration (xvMotion) may fail with a timeout error.
  • In /var/log/vmware/vpxd/vpxd.log, you may find entries similar to:

2022-12-01T17:17:32.242+13:00 error vpxd[09974] [Originator@6876 sub=VmProv opID=lb370r31-55372-auto-16q7-h5:70006900-36-01] xVC Host Datastore Migrate failed at vpx.vmprov.CopyVmFiles for poweredOn VM 'TEST' (vm-3073, ds:///vmfs/volumes/60401039-1c77df90-89ca-0025b53a0055/TEST/TEST.vmx) on host-15 (10.253.4.25) in pool resgroup-9 with ds ds:///vmfs/volumes/60401039-1c77df90-89ca-0025b53a0055/ to host-119 (10.253.5.17) in pool resgroup-9 with ds ds:///vmfs/volumes/624cf325-66f7b23c-77bf-0025b53a0150/ with migId 2562194025937586070 with fault vim.fault.Timedout:

2016-02-05T20:14:34.878Z error vpxd[7F33BE5EE700] [Originator@6876 sub=VmProv opID=decb6787-3676-44e9-9a02-d38628d09200-103172-ngc-b8-9f] [WorkflowImpl] Get exception while executing action vpx.vmprov.CopyVmFiles: vim.fault.Timedout

  • In /var/run/log/hostd.log, you may find entries similar to:

2022-12-01T03:48:57.421Z warning hostd[2101931] [Originator@6876 sub=Libs opID=lb370r31-55372-auto-16q7-h5:70006900-36-01-9ee1 user=vpxuser:VSPHERE.LOCAL\Administrator] [NFC ERROR]NfcNewAuthdConnectionEx: Failed to connect: SSL failed to connect to peer

2022-12-01T03:48:57.421Z warning hostd[2101931] [Originator@6876 sub=Libs opID=lb370r31-55372-auto-16q7-h5:70006900-36-01-9ee1 user=vpxuser:VSPHERE.LOCAL\Administrator] [NFC ERROR]NfcNewAuthdConnectionEx: Failed to connect to peer. Error: SSL failed to connect to peer

2022-12-01T03:48:57.421Z warning hostd[2101931] [Originator@6876 sub=Libs opID=lb370r31-55372-auto-16q7-h5:70006900-36-01-9ee1 user=vpxuser:VSPHERE.LOCAL\Administrator] [NFC ERROR]NfcEstablishAuthCnxToServer: Failed to create new AuthD connection: SSL failed to connect to peer

2022-12-01T03:48:57.421Z warning hostd[2101931] [Originator@6876 sub=Libs opID=lb370r31-55372-auto-16q7-h5:70006900-36-01-9ee1 user=vpxuser:VSPHERE.LOCAL\Administrator] [NFC ERROR]Nfc_BindAndEstablishAuthdCnx3: Failed to create a connection with server 10.253.5.17: SSL failed to connect to peer

2022-12-01T03:48:57.421Z error hostd[2101931] [Originator@6876 sub=NfcManager opID=lb370r31-55372-auto-16q7-h5:70006900-36-01-9ee1 user=vpxuser:VSPHERE.LOCAL\Administrator] Unable to connect to NFC server: SSL failed to connect to peer

2022-12-01T03:48:57.422Z error hostd[2101931] [Originator@6876 sub=NfcManager opID=lb370r31-55372-auto-16q7-h5:70006900-36-01-9ee1 user=vpxuser:VSPHERE.LOCAL\Administrator] Error encountered while opening clients for copy spec:

--> N3Vim5Fault16HostConnectFault9ExceptionE(Fault cause: vim.fault.HostConnectFault

  • For all of the symptoms above, /var/run/log/vmauthd.log on the destination ESXi host contains entries similar to:
2022-12-07T07:01:01.493Z vmauthd[2117874]: Connect from remote socket (10.151.24.15:58560).
2022-12-07T07:01:01.493Z vmauthd[2117874]: Connect from 10.151.24.15
2022-12-07T07:02:20.790Z vmauthd[2117870]: SSL: syscall error 110: Connection timed out
2022-12-07T07:02:20.790Z vmauthd[2117870]: recv() FAIL: 110.

2022-12-07T07:02:20.790Z vmauthd[2117870]: VMAuthdSocketRead: read failed. Closing socket for reading.
2022-12-07T07:02:20.790Z vmauthd[2117870]: Read failed.
2022-12-07T07:02:20.790Z vmauthd[2117870]: VMAuthdSocketWrite: No socket.


2022-11-30T06:43:31.155Z vmauthd[2311827]: Connect from remote socket (10.151.24.15:53310).
2022-11-30T06:43:31.155Z vmauthd[2311827]: Connect from 10.151.24.15
2022-11-30T06:45:31.160Z vmauthd[2311827]: recv() FAIL: 11.
2022-11-30T06:45:31.160Z vmauthd[2311827]: VMAuthdSocketRead: read failed. Closing socket for reading.
2022-11-30T06:45:31.160Z vmauthd[2311827]: Read failed.
2022-11-30T06:45:31.160Z vmauthd[2311827]: VMAuthdSocketWrite: No socket.
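
When vmauthd.log is long, the failed reads shown above can be pulled out with a small script. The following is a minimal sketch, not a VMware tool: the function name find_recv_failures and the hint table are hypothetical, and the hints assume the numeric codes are standard Linux errno values (the log itself labels 110 as "Connection timed out"; reading 11 as a read that gave up waiting is an assumption).

```python
import re
import sys

# Matches lines such as:
#   2022-12-07T07:02:20.790Z vmauthd[2117870]: recv() FAIL: 110.
RECV_FAIL = re.compile(
    r"^(?P<ts>\S+)\s+vmauthd\[(?P<pid>\d+)\]:\s+recv\(\) FAIL:\s+(?P<code>\d+)\."
)

# Assumption: the codes are Linux errno values.
ERRNO_HINTS = {
    110: "ETIMEDOUT (connection timed out)",
    11: "EAGAIN (no data arrived before the read gave up)",
}


def find_recv_failures(lines):
    """Return a (timestamp, pid, code, hint) tuple for each recv() FAIL line."""
    failures = []
    for line in lines:
        match = RECV_FAIL.match(line.strip())
        if match is None:
            continue
        code = int(match.group("code"))
        hint = ERRNO_HINTS.get(code, "unrecognized code")
        failures.append((match.group("ts"), int(match.group("pid")), code, hint))
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_vmauthd.py <path to a copy of vmauthd.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for ts, pid, code, hint in find_recv_failures(handle):
            print(f"{ts} pid={pid} recv() failed with {code}: {hint}")
```

A cluster of these failures from the same peer address around the time of the failed task points toward the connectivity causes described in the next section.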


Cause

These issues can occur for the following reasons:
  • TCP port 902 is not open between the source and destination ESXi hosts participating in the NFC connection (for example, a firewall is blocking connectivity).
  • An MTU mismatch in the environment is affecting connectivity between the source and destination ESXi hosts.

A few important points about Network File Copy (NFC):
  • ESXi hosts use NFC when data must be copied over the network between datastores during a clone or xvMotion operation.
  • An NFC connection is established between two ESXi hosts when the destination ESXi host does not have access to the source datastore.
  • NFC requires bidirectional connectivity between the ESXi hosts over TCP port 902.
  • If jumbo frames are configured on the ESXi hosts for management or provisioning, the NFC connection uses a packet size of 8960 bytes.
  • The physical network between the ESXi hosts should support jumbo frames. Otherwise, large packets (larger than 1500 bytes) may be dropped, causing the NFC connection to fail.
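
The jumbo-frame points above come down to simple arithmetic: a packet that must not be fragmented only gets through if it fits the smallest MTU anywhere on the path. The sketch below illustrates that for ICMP test packets. The function names and the list of per-hop MTUs are hypothetical, and the header sizes assume IPv4 without options.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def path_mtu(hop_mtus):
    """The usable MTU of a path is the smallest MTU of any hop on it."""
    return min(hop_mtus)


def fits_unfragmented(payload_bytes, hop_mtus):
    """True if a Don't Fragment ping with this payload can cross the path."""
    return payload_bytes <= max_icmp_payload(path_mtu(hop_mtus))


if __name__ == "__main__":
    print(max_icmp_payload(9000))                        # 8972
    print(max_icmp_payload(1500))                        # 1472
    # Hypothetical path: jumbo frames on both hosts, standard MTU in between.
    print(fits_unfragmented(8972, [9000, 1500, 9000]))   # False
    print(fits_unfragmented(1472, [9000, 1500, 9000]))   # True
```

The third call shows the mismatch case: both hosts are configured for jumbo frames, but a single 1500-byte hop between them is enough to drop every large packet.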

Resolution

  • Identify the source and destination ESXi hosts participating in the clone or xvMotion operation.
Note: When VMs are deployed from content library templates, vCenter selects a random ESXi host that has access to the template datastore as the source ESXi host.
  • Test bidirectional connectivity between the ESXi hosts over port 902 by running the following command from an SSH session on each host, targeting the other host:
nc -z <ESXi-IP> 902

Output of a successful connection:
[root@esxi-1:~] nc -z 192.168.0.82 902
Connection to 192.168.0.82 902 port [tcp/authd] succeeded!


Note: If this test fails, port 902 is not open between the ESXi hosts. A firewall could be blocking the connectivity.
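
Where nc is not available, for example on a workstation or jump host on the same network segment, the same TCP reachability check can be approximated in Python. This is a minimal sketch: the function name tcp_port_open is hypothetical, and a result from another machine does not replace the test from the ESXi hosts themselves, because firewall rules may differ by source address.

```python
import socket
import sys


def tcp_port_open(host, port=902, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the TCP handshake and then we close
        # the socket again, which is all that "nc -z" tests.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_902.py <ESXi-IP>
    target = sys.argv[1]
    state = "open" if tcp_port_open(target) else "NOT reachable"
    print(f"TCP port 902 on {target} is {state}")
```

As with nc, run the check in both directions, because NFC needs port 902 reachable from each host to the other.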
 
  • If jumbo frames are configured on the management or provisioning interfaces of the ESXi hosts, test bidirectional connectivity using the following command:
vmkping -d -s 8972 <ESXi-IP>

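In this command, the -d option sets the DF (Don't Fragment) bit on the IPv4 packet, and -s 8972 sets the ICMP payload size. 8972 bytes is the largest payload that fits in a 9000-byte MTU: 9000 bytes minus the 20-byte IPv4 header and the 8-byte ICMP header.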

Note: If this test fails, large packets are being dropped along the path between the ESXi hosts. An MTU mismatch along the path could cause this issue.

Output of a successful connection:
PING server(10.0.0.1): 8972 data bytes
8980 bytes from 10.0.0.1: icmp_seq=0 ttl=64 time=10.245 ms
8980 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=0.935 ms
8980 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.926 ms
--- server ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.926/4.035/10.245 ms
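
When this test is repeated across many host pairs, the summary line of the output above can be evaluated by a script instead of by eye. The sketch below parses only that statistics line. The function names are hypothetical, and it assumes the summary keeps the format shown in the sample output.

```python
import re

# Matches the summary line, for example:
#   3 packets transmitted, 3 packets received, 0% packet loss
SUMMARY = re.compile(
    r"(?P<sent>\d+) packets transmitted, "
    r"(?P<received>\d+) packets received, "
    r"(?P<loss>\d+(?:\.\d+)?)% packet loss"
)


def parse_ping_summary(output):
    """Return sent/received/loss figures, or None if no summary is found."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    return {
        "sent": int(match.group("sent")),
        "received": int(match.group("received")),
        "loss_percent": float(match.group("loss")),
    }


def jumbo_path_ok(output):
    """True only if packets were sent and every one of them was answered."""
    stats = parse_ping_summary(output)
    return (
        stats is not None
        and stats["sent"] > 0
        and stats["received"] == stats["sent"]
    )


if __name__ == "__main__":
    sample = (
        "--- server ping statistics ---\n"
        "3 packets transmitted, 3 packets received, 0% packet loss\n"
    )
    print(jumbo_path_ok(sample))  # True
```

Treating any loss as a failure is a deliberately strict choice for this sketch: with the DF bit set, an MTU mismatch normally drops every large packet, so partial loss suggests a different problem worth investigating separately.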


Note: NFC traffic uses the management VMkernel port unless a dedicated provisioning VMkernel interface is configured on the ESXi hosts.

Additional Information

For additional information about vmkping, refer to Testing VMkernel network connectivity with the vmkping command.