vSphere Replication - No host can be used to access datastore path


Article ID: 312676


Products

VMware Live Recovery, VMware vSphere ESXi

Issue/Introduction

Symptom:

  • Error -  No host can be used to access datastore path '[Target Datastore] VM_Name/VM_Name.vmdk'.



    Note: The "No host can be used to access datastore path" error was observed after upgrading vSphere Replication from 8.7 to 8.8 and from 8.8 to 9.0.2.

  • A replication error appears in the SRM UI after the VR upgrade.



    Error - A replication error occurred at the vSphere Replication Server for replication <VM_Name>. Details: 'Error for (diskId: "RDID-05e97824-####-####-####-44fb4e734abe"), (hostIP: "10.#.#.134"), (flags: on-disk-open, retriable): Error connecting to host.; Set error flag: retriable; Failed to create NFC connection to 10.#.#.134, 902 via ip Any: Failed to connect to server 10.#.#.134:902; establish nfc connection on host-27; Tried operation 4 times, giving up.; Failed to open disk, couldn't create NFC session; Set error flag: on-disk-open; Tried operation 4 times, giving up.; Failed to open replica (/vmfs/volumes/5db8069b-########-####-20677ce134a4/VM_Name/hbrdisk.RDID-05e97824-####-####-####-44fb4e734abe.2717517.217894942880274.vmdk); Failed to open activeDisk (GroupID=GID-0e2b094e-####-####-####-c51d387ef645) (DiskID=RDID-05e97824-####-####-####-44fb4e734abe); Can't create replica state (GroupID=GID-0e2b094e-####-####-####-c51d387ef645) (DiskID=RDID-05e97824-####-####-####-44fb4e734abe); Cannot activate group. Loading disks from database (GroupID=GID-0e2b094e-####-####-####-c51d387ef645) ; Connecting to group GID-0e2b094e-####-####-####-c51d387ef645'.

Environment

VMware vSphere Replication

Cause

  • The vSphere Replication Appliance must make the initial connection to the ESXi host through ports 80 and 902 to transfer data to the datastore at the recovery site.

  • Firewall or IDS/IPS policies may be blocking these ports.

  • The MTU size may be inconsistent along the network path.

  • Compatibility issues between VR, ESXi and VC can also cause this error.

  • vSphere Replication Appliance and the ESXi hosts are unable to communicate, causing replications to fail.

In "/var/log/vmkernel.log" on the source ESXi host:

2017-06-07T16:31:07.042Z cpu12:42983549)WARNING: Hbr: 2997: Command INIT_SESSION failed (result=Failed) (isFatal=FALSE) (Id=0) (GroupID=GID-0e2b094e-####-####-####-c51d387ef645)
2017-06-07T16:31:07.042Z cpu12:42983549)WARNING: Hbr: 4521: Failed to establish connection to [x.x.x.x]:31031(groupID=GID-0e2b094e-####-####-####-c51d387ef645): Failure
2017-06-07T16:32:37.061Z cpu12:42983549)Hbr: 2196: Wire compression supported by server x.x.x.x: FastLZ
2017-06-07T16:32:37.074Z cpu16:42983549)Hbr: 2988: Command: INIT_SESSION: error result=Failed gen=-1: Error for (datastoreUUID: "5db8069b-########-####-20677ce134a4"), (diskId: "RDID-05e97824-####-####-####-44fb4e734abe"), (flags: on-disk-open): No accessible host for da$


In "/var/log/vmware/hbrsrv.log" on the destination VR appliance:

2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] HbrError for (datastoreUUID: "5db8069b-########-####-20677ce134a4"), (diskId: "RDID-05e97824-####-####-####-44fb4e734abe") stack:
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [0] No accessible host for datastore 0fedfa9a-########
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [1] Code set to: Storage was not accessible.
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [2] Failed to find host to get disk type
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [3] While getting host capabilities for disk.
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [4] Refreshing disk usage.
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [5] Ignored error.
2017-06-07T16:50:05.274Z verbose hbrsrv[7F96E710D700] [Originator@6876 sub=SessionManager] hbr.replica.Task.GetInfo: authorized
2017-06-07T16:50:05.981Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Delta] ClientConnection (client=[x.x.x.x]:1139) allowing client with different minor version: Client 3 vs Server 5
2017-06-07T16:50:05.991Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Delta] Configured disks for group GID-0e2b094e-####-####-####-c51d387ef645:
2017-06-07T16:50:05.991Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Delta] RDID-05e97824-####-####-####-44fb4e734abe
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] HbrError for (datastoreUUID: "5db8069b-########-####-20677ce134a4") stack:
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] [0] No accessible host for datastore 5db8069b-########-####-20677ce134a4
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] [1] Code set to: Storage was not accessible.
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] [2] Failed to find host to get disk type
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] [3] Updating disk type for diskID=RDID-05e97824-####-####-####-44fb4e734abe
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] [4] Ignored error.
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Setup] Created ActiveDisk for DiskID: RDID-05e97824-####-####-####-44fb4e734abe Base path: /vmfs/volumes/0fedfa9a-########/Main Content Svr/Main Content Svr.vmdk curPath: /vmfs/volumes/5db8069b-########-####-20677ce134a4/Main Content Svr/Main Content Svr.vmdk diskHostReq: (UNKNOWN)
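
When hbrsrv.log is long, it can help to list only the datastores that the server reported as inaccessible. The following is a minimal sketch, assuming the "No accessible host for datastore" wording shown in the excerpt above; the function name is hypothetical and not part of any VMware tooling.

```shell
# List each datastore UUID that hbrsrv reported as having no accessible host.
# Reads log text from stdin, so it works on the live log or on a saved copy.
inaccessible_datastores() {
    grep -o 'No accessible host for datastore [^ ]*' \
        | awk '{ print $NF }' \
        | sort -u
}

# Example (log path as given in this article):
#   inaccessible_datastores < /var/log/vmware/hbrsrv.log
```

Run against the excerpt above, this would list two UUIDs, one from each "[0]" stack entry, which can then be compared with the datastores actually mounted on the target hosts.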

 

The destination VR appliance is unable to connect to the source host on port 902:

2017-06-07T17:34:27.912+08:00 info hbrsrv[15748] [Originator@6876 sub=Libs groupID=GID-0e2b094e-####-####-####-c51d387ef645 opID=hsl-60639d50] CnxOpenTCPSocket: Timed out connecting to server 10.#.#.134:902: Operation now in progress
2017-06-07T17:34:27.912+08:00 info hbrsrv[15748] [Originator@6876 sub=Libs groupID=GID-0e2b094e-####-####-####-c51d387ef645 opID=hsl-60639d50] CnxAuthdConnect: Returning false because CnxAuthdConnectTCP failed
2017-06-07T17:34:27.912+08:00 info hbrsrv[15748] [Originator@6876 sub=Libs groupID=GID-0e2b094e-####-####-####-c51d387ef645 opID=hsl-60639d50] CnxConnectAuthd: Returning false because CnxAuthdConnect failed
2017-06-07T17:34:27.912+08:00 info hbrsrv[15748] [Originator@6876 sub=Libs groupID=GID-0e2b094e-####-####-####-c51d387ef645 opID=hsl-60639d50] Cnx_Connect: Returning false because CnxConnectAuthd failed
2017-06-07T17:34:27.912+08:00 info hbrsrv[15748] [Originator@6876 sub=Libs groupID=GID-0e2b094e-####-####-####-c51d387ef645 opID=hsl-60639d50] Cnx_Connect: Error message: Failed to connect to server 10.#.#.134:902
2017-06-07T17:34:27.912+08:00 warning hbrsrv[15748] [Originator@6876 sub=Libs groupID=GID-0e2b094e-####-####-####-c51d387ef645 opID=hsl-60639d50] [NFC ERROR]NfcNewAuthdConnectionEx: Failed to connect: Failed to connect to server 10.#.#.134:902
2017-06-07T17:34:27.912+08:00 warning hbrsrv[15748] [Originator@6876 sub=Libs groupID=GID-0e2b094e-####-####-####-c51d387ef645 opID=hsl-60639d50] [NFC ERROR]NfcNewAuthdConnectionEx: Failed to connect to peer. Error: Failed to connect to server 10.#.#.134:902
2017-06-07T17:34:27.912+08:00 warning hbrsrv[15748] [Originator@6876 sub=Libs groupID=GID-0e2b094e-####-####-####-c51d387ef645 opID=hsl-60639d50] [NFC ERROR]NfcEstablishAuthCnxToServer: Failed to create new AuthD connection: Failed to connect to server 10.#.#.134:902
2017-06-07T17:34:27.912+08:00 warning hbrsrv[15748] [Originator@6876 sub=Libs groupID=GID-0e2b094e-####-####-####-c51d387ef645 opID=hsl-60639d50] [NFC ERROR]Nfc_BindAndEstablishAuthdCnx3: Failed to create a connection with server 10.#.#.134: Failed to connect to server 10.#.#.134:902
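
To see at a glance which hosts and ports the VR server failed to reach, the connection failures can be counted per target. This is a sketch under the assumption that the log uses the "Failed to connect to server <host>:<port>" wording shown above; the function name is hypothetical.

```shell
# Count "Failed to connect to server HOST:PORT" messages per HOST:PORT pair.
# Reads log text from stdin and prints one "HOST:PORT COUNT" line per target.
failed_connection_targets() {
    grep -o 'Failed to connect to server [^ ]*:[0-9][0-9]*' \
        | awk '{ count[$NF]++ } END { for (t in count) print t, count[t] }' \
        | sort
}

# Example (log path as given in this article):
#   failed_connection_targets < /var/log/vmware/hbrsrv.log
```

A target that appears repeatedly on port 902 is a good candidate for the firewall and port checks described in the Resolution section.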

 

On the VR appliance, "/opt/vmware/hms/logs/hms.log" shows the following entries:

2017-06-07T17:34:27.912 INFO  hms.i18n.class com.vmware.hms.response.filter.I18nActivationResponseFilter [tcweb-15] (..response.filter.I18nActivationResponseFilter) [operationID=87751547-db0a-4c3a-8a41-d0bf8ab9894f-HMS-88032,sessionID=600955AE] | The localized message is: A replication error occurred at the vSphere Replication Server for replication <VM_Name>. Details: 'Error for (diskId: "RDID-05e97824-####-####-####-44fb4e734abe"), (hostIP: "10.#.#.134"), (flags: on-disk-open, retriable): Error connecting to host.; Set error flag: retriable; Failed to create NFC connection to 10.#.#.134, 902 via ip Any: Failed to connect to server 10.#.#.134:902; establish nfc connection on host-27; Tried operation 4 times, giving up.; Failed to open disk, couldn't create NFC session; Set error flag: on-disk-open; Tried operation 4 times, giving up.; Failed to open replica (/vmfs/volumes/5db8069b-########-####-20677ce134a4/VM_Name/hbrdisk.RDID-05e97824-####-####-####-44fb4e734abe.2717517.217894942880274.vmdk); Failed to open activeDisk (GroupID=GID-0e2b094e-####-####-####-c51d387ef645) (DiskID=RDID-05e97824-####-####-####-44fb4e734abe); Can't create replica state (GroupID=GID-0e2b094e-####-####-####-c51d387ef645) (DiskID=RDID-05e97824-####-####-####-44fb4e734abe); Cannot activate group. Loading disks from database (GroupID=GID-0e2b094e-####-####-####-c51d387ef645) ; Connecting to group GID-0e2b094e-####-####-####-c51d387ef645'.

Resolution

NOTE: This error can be caused by compatibility issues between VR and VC.

Refer - https://interopmatrix.broadcom.com/Interoperability

Customers upgrading to VC 8.0 U1/U2 should check the interoperability matrix to ensure that the current vSphere Replication appliance is compatible with vCenter and the ESXi hosts.

1. Verify that ports 80, 443 and 902 are open using curl commands. Some organizations block these ports as part of their security hardening rules. Check the ESXi firewall on the source and target ESXi hosts and add exceptions under "Allowed IP addresses". Exceptions must be added to the firewall rules for vSphere Web Client and vSphere Web Access in the format X.X.X.0/24. This is the IP range that contains the vSphere Replication appliance IP address for Incoming Storage Traffic at the local site.

Example: curl -v telnet://<ESXi-FQDN-or-IP>:80; curl -v telnet://<ESXi-FQDN-or-IP>:902

NOTE: Port 80 is not used by VR 8.8 for communication. When troubleshooting, refer to the VMware vSphere Replication Security Guide for the specific VR version.

         

Source | Target | Port | Protocol | Description
vSphere Replication appliance | Local vCenter Server | 80 | TCP | All management traffic to the local vCenter Server proxy system. vSphere Replication opens an SSL tunnel to connect to the vCenter Server services.
vSphere Replication server in the vSphere Replication appliance | Local ESXi host (intra-site) | 80 | HTTP | Traffic between the vSphere Replication server and the ESXi hosts on the same site. vSphere Replication opens an SSL tunnel to the ESXi services.
vSphere Replication server | ESXi host (intra-site only) on target site | 902 | TCP and UDP | Traffic between the vSphere Replication server and the ESXi hosts on the same site. Specifically, the traffic of the NFC service to the destination ESXi servers.

 

2. Ensure that time is synchronized through NTP on all ESXi hosts, VR appliances and vCenter Servers.

3. Check and disable IDS/IPS or other firewall packet filtering rules. A packet filtering firewall regulates data flow to and from a network by examining each TCP/IP packet and looking at its source and destination IP addresses and ports. Such rules can allow only known and established IP addresses while blocking all unknown IP addresses.

4. Ensure MTU is configured uniformly across all networking devices that support it between the sites, including vSphere switches, ESXi hosts and the vSphere Replication Appliance.

    Refer - Testing VMkernel network connectivity with the vmkping command (1003728)

  • vmkping -I vmk2 -d -s 8972 Target-VR_IP (Use this command to test with 9000 MTU (Jumbo frames))
  • vmkping -I vmk2 -d -s 1472 Target-VR_IP (Use this command to test with 1500 MTU)

5. Check the firewall logs, and confirm that they are current, i.e. that they show the current date and time rather than old entries. Look for any DENY entries logged by the filtering rules. If the firewall is not logging correctly, raise a case with the firewall vendor to investigate.
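
The port checks from step 1 can also be scripted when several hosts or ports need to be tested. The sketch below is an alternative to the curl commands, not a replacement recommended by the vendor: it assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available on the machine running the test, it only tests TCP, and the function names and host name are hypothetical.

```shell
# check_port HOST PORT [TIMEOUT_SECONDS]
# Prints "open" if a TCP connection succeeds within the timeout,
# "unreachable" otherwise (refused, filtered, or timed out).
check_port() {
    local host="$1" port="$2" wait="${3:-3}"
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT.
    if timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo open
    else
        echo unreachable
    fi
}

# check_ports HOST PORT...
# Prints one "HOST:PORT STATUS" line per port.
check_ports() {
    local host="$1" port
    shift
    for port in "$@"; do
        printf '%s:%s %s\n' "$host" "$port" "$(check_port "$host" "$port")"
    done
}

# Example (hypothetical host name):
#   check_ports esxi01.example.com 80 443 902
```

Run it from the vSphere Replication appliance, since the result depends on the network path between the appliance and the ESXi host.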


NOTE:
vSphere Replication uses an MTU (maximum transmission unit) of 1500 by default. An MTU of 1500 is often not achievable on a WAN that uses VPN tunnels, IPsec encryption, overlay protocols or firewalls whose MTU does not match the MTU set within the datacenter. Therefore, the vmkping test may pass or fail, but its result should not be treated as a direct indicator of this problem until all other possibilities have been explored. Try changing the MTU to a different size between 1500 and 9000 and check whether you can communicate with the target VR.
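
The -s values in the vmkping commands in step 4 are the MTU minus 28 bytes (20 bytes of IPv4 header plus 8 bytes of ICMP header), which is why 9000 maps to 8972 and 1500 to 1472. The sketch below applies the same arithmetic to other candidate MTU sizes and prints the matching vmkping commands to run on the ESXi host. The function names are hypothetical, and the vmk interface and target IP in the example are placeholders.

```shell
# ICMP payload size that exactly fills an IPv4 packet of the given MTU:
# MTU - 20 (IPv4 header, no options) - 8 (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# vmkping_commands VMK_INTERFACE TARGET_IP MTU...
# Prints one vmkping command per MTU, with -d (do not fragment) set as in
# the commands shown in this article.
vmkping_commands() {
    local vmk="$1" target="$2" mtu
    shift 2
    for mtu in "$@"; do
        printf 'vmkping -I %s -d -s %s %s\n' \
            "$vmk" "$(icmp_payload_for_mtu "$mtu")" "$target"
    done
}

# Example (placeholder interface and address):
#   vmkping_commands vmk2 192.0.2.10 1500 4000 9000
```

The largest size for which the printed command succeeds gives an indication of the usable MTU along the path to the target VR.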

Jumbo frames are Ethernet frames that carry a payload much larger than the standard 1500-byte Ethernet MTU. Any frame with an MTU above 1500 is called a jumbo frame. Jumbo frames need to be configured on the ingress and egress interface of each device along the end-to-end transmission path. Furthermore, all devices in the topology must agree on the maximum jumbo frame size. If devices along the transmission path have varying frame sizes, you can end up with fragmentation problems. Also, if a device along the path does not support jumbo frames and receives one, it will drop it.

Jumbo frames can improve your network's performance. However, it is important to establish whether and how your network devices support jumbo frames before you turn this feature on. Some of the biggest gains from jumbo frames can be realized within and between data centers, but be aware of the fragmentation that may occur if those large frames try to cross a link that has a smaller MTU size.