vSphere Replication - No host can be used to access datastore path

Article ID: 312676

Updated On:

Products

VMware Live Recovery VMware vSphere ESXi

Issue/Introduction

Symptoms:


 
  • The vSphere Replication Appliance and the ESXi hosts cannot communicate, causing replications to fail.
  • You see an error similar to: "No host can be used to access datastore path".
  • In /var/log/vmkernel.log on the source ESXi host, you see entries similar to:

2017-06-07T16:31:07.042Z cpu12:42983549)WARNING: Hbr: 2997: Command INIT_SESSION failed (result=Failed) (isFatal=FALSE) (Id=0) (GroupID=GID-cab02c86-####-####-####-##########d0)
2017-06-07T16:31:07.042Z cpu12:42983549)WARNING: Hbr: 4521: Failed to establish connection to [x.x.x.x]:31031(groupID=GID-cab02c86-####-####-####-##########d0): Failure
2017-06-07T16:32:37.061Z cpu12:42983549)Hbr: 2196: Wire compression supported by server x.x.x.x: FastLZ
2017-06-07T16:32:37.074Z cpu16:42983549)Hbr: 2988: Command: INIT_SESSION: error result=Failed gen=-1: Error for (datastoreUUID: "0fedfa9a-#########"), (diskId: "RDID-f806a741-####-####-####-#########be"), (flags: on-disk-open): No accessible host for da$
2017-06-07T16:32:37.074Z cpu16:42983549)WARNING: Hbr: 2997: Command INIT_SESSION failed (result=Failed) (isFatal=FALSE) (Id=0) (GroupID=GID-cab02c86-####-####-####-##########d0)
2017-06-07T16:32:37.074Z cpu16:42983549)WARNING: Hbr: 4521: Failed to establish connection to [x.x.x.x]:31031(groupID=GID-cab02c86-####-####-####-##########d0): Failure
2017-06-07T16:34:07.092Z cpu21:42983549)Hbr: 2196: Wire compression supported by server x.x.x.x: FastLZ
2017-06-07T16:34:07.110Z cpu21:42983549)Hbr: 2988: Command: INIT_SESSION: error result=Failed gen=-1: Error for (datastoreUUID: "0fedfa9a-#########"), (diskId: "RDID-f806a741-####-####-####-#########be"), (flags: on-disk-open): No accessible host for da$
2017-06-07T16:34:07.110Z cpu21:42983549)WARNING: Hbr: 2997: Command INIT_SESSION failed (result=Failed) (isFatal=FALSE) (Id=0) (GroupID=GID-cab02c86-####-####-####-##########d0)
2017-06-07T16:34:07.110Z cpu21:42983549)WARNING: Hbr: 4521: Failed to establish connection to [x.x.x.x]:31031(groupID=GID-cab02c86-####-####-####-##########d0): Failure
2017-06-07T16:35:37.130Z cpu21:42983549)Hbr: 2196: Wire compression supported by server x.x.x.x: FastLZ
2017-06-07T16:35:37.146Z cpu21:42983549)Hbr: 2988: Command: INIT_SESSION: error result=Failed gen=-1: Error for (datastoreUUID: "0fedfa9a-#########"), (diskId: "RDID-f806a741-####-####-####-#########be"), (flags: on-disk-open): No accessible host for da$

  • In the /var/log/vmware/hbrsrv.log on recovery Replication appliance, you see entries similar to:

2017-06-07T16:50:05.161Z verbose hbrsrv[7F96E7009700] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: lastError, Hbr.Replica.Host.host-372. Applied change to temp map.
2017-06-07T16:50:05.161Z error hbrsrv[7F96E444A700] [Originator@6876 sub=HttpConnectionPool-000000] [ConnectComplete] Connect failed to <cs p:00007f96d8002b50, TCP:x.x.x.x:80>; cnx: (null), error: N7Vmacore17CanceledExceptionE(Operation was canceled)
2017-06-07T16:50:05.162Z verbose hbrsrv[7F96E710D700] [Originator@6876 sub=PropertyProvider] RecordOp ADD: expectedDatastoresList["Hbr.Replica.Datastore.53558063-########-####-441ea147e1b4"], HbrStorageManager. Applied change to temp map.
2017-06-07T16:50:05.161Z warning hbrsrv[7F96E7150760] [Originator@6876 sub=Default] Failed to connect socket; <io_obj p:0x00007f96eb0937a0, h:-1, <TCP '0.0.0.0:0'>, <TCP 'x.x.x.x:80'>>, e: system:125(Operation canceled)
2017-06-07T16:50:05.162Z warning hbrsrv[7F96E7150760] [Originator@6876 sub=Default] Operation cancelled
2017-06-07T16:50:05.162Z error hbrsrv[7F96E7150760] [Originator@6876 sub=HttpConnectionPool-000045] [ConnectComplete] Connect failed to <cs p:00007f96d8002b50, TCP:x.x.x.x:80>; cnx: (null), error: N7Vmacore17CanceledExceptionE(Operation was canceled)
2017-06-07T16:50:05.171Z verbose hbrsrv[7F96E4409700] [Originator@6876 sub=SessionManager] hbr.replica.Task.GetInfo: authorized
2017-06-07T16:50:05.179Z verbose hbrsrv[7F96E710D700] [Originator@6876 sub=DiskMove] Disk Move spec: (nfc.CopySpec) []
2017-06-07T16:50:05.195Z info hbrsrv[7F96E710D700] [Originator@6876 sub=Main] HbrError for (datastoreUUID: "4eaede45-########-####-#########b6") stack:
2017-06-07T16:50:05.195Z info hbrsrv[7F96E710D700] [Originator@6876 sub=Main] [0] No accessible host for datastore 4eaede45-#######-####-#########b6
2017-06-07T16:50:05.195Z info hbrsrv[7F96E710D700] [Originator@6876 sub=Main] [1] Code set to: Storage was not accessible.
2017-06-07T16:50:05.195Z info hbrsrv[7F96E710D700] [Originator@6876 sub=Main] [2] Failed to find host to remove file
2017-06-07T16:50:05.195Z info hbrsrv[7F96E710D700] [Originator@6876 sub=Main] [3] Couldn't do cleanup for file '/vmfs/volumes/4eaede45-########-####-#########b6/Main Content Svr/hbrgrp.GID-cab02c86-####-####-####-##########d0.txt' (key=4).
2017-06-07T16:50:05.195Z info hbrsrv[7F96E710D700] [Originator@6876 sub=Main] [4] Will retry later.
2017-06-07T16:50:05.195Z info hbrsrv[7F96E710D700] [Originator@6876 sub=Main] [5] Ignored error.
2017-06-07T16:50:05.195Z verbose hbrsrv[7F96E710D700] [Originator@6876 sub=StorageMap] Datastore 4eaede45-########-####-#########b6 removed from storage map
2017-06-07T16:50:05.195Z verbose hbrsrv[7F96E710D700] [Originator@6876 sub=PropertyProvider] RecordOp REMOVE: expectedDatastoresList["Hbr.Replica.Datastore.4eaede45-########-####-#########b6"], HbrStorageManager. Applied change to temp map.
2017-06-07T16:50:05.195Z info hbrsrv[7F96E710D700] [Originator@6876 sub=StorageManager] Datastore 4eaede45-########-####-#########b6 removed from the storage manager.
2017-06-07T16:50:05.222Z verbose hbrsrv[7F96E43C8700] [Originator@6876 sub=ReplicaTaskManager] Completed task 52ceaee9-3a48-4511-97b4-05a9314d8ac4. Cleanup after 2017-06-07 17:00:05 UTC.
2017-06-07T16:50:05.222Z verbose hbrsrv[7F96E4409700] [Originator@6876 sub=Delta] Prune check group GID-cab02c86-####-####-####-##########d0 (has 0 consistent instances)
2017-06-07T16:50:05.222Z verbose hbrsrv[7F96E4409700] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: state, Hbr.Replica.Group.GID-cab02c86-####-####-####-##########d0. Applied change to temp map.
2017-06-07T16:50:05.223Z verbose hbrsrv[7F96E4409700] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: state, Hbr.Replica.Group.GID-cab02c86-####-####-####-##########d0. Applied change to temp map.
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] HbrError for (datastoreUUID: "0fedfa9a-########"), (diskId: "RDID-f806a741-####-####-####-#########be") stack:
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [0] No accessible host for datastore 0fedfa9a-########
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [1] Code set to: Storage was not accessible.
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [2] Failed to find host to get disk type
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [3] While getting host capabilities for disk.
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [4] Refreshing disk usage.
2017-06-07T16:50:05.224Z info hbrsrv[7F96E4409700] [Originator@6876 sub=Main opID=hs-24bdaa72] [5] Ignored error.
2017-06-07T16:50:05.274Z verbose hbrsrv[7F96E710D700] [Originator@6876 sub=SessionManager] hbr.replica.Task.GetInfo: authorized
2017-06-07T16:50:05.981Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Delta] ClientConnection (client=[x.x.x.x]:1139) allowing client with different minor version: Client 3 vs Server 5
2017-06-07T16:50:05.991Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Delta] Configured disks for group GID-cab02c86-####-####-####-##########d0:
2017-06-07T16:50:05.991Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Delta] RDID-f806a741-####-####-####-#########be
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] HbrError for (datastoreUUID: "0fedfa9a-########") stack:
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] [0] No accessible host for datastore 0fedfa9a-########
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] [1] Code set to: Storage was not accessible.
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] [2] Failed to find host to get disk type
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] [3] Updating disk type for diskID=RDID-f806a741-####-####-####-#########be
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Main] [4] Ignored error.
2017-06-07T16:50:05.992Z info hbrsrv[7F96E7150760] [Originator@6876 sub=Setup] Created ActiveDisk for DiskID: RDID-f806a741-####-####-####-#########be Base path: /vmfs/volumes/0fedfa9a-########/Main Content Svr/Main Content Svr.vmdk curPath: /vmfs/volumes/0fedfa9a-########/Main Content Svr/Main Content Svr.vmdk diskHostReq: (UNKNOWN)
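
When triaging, it can help to summarize which datastores and connection targets the log excerpts above implicate. The following is a minimal Python sketch, not a VMware tool: the function name and both patterns are assumptions derived only from the sample lines in this article, and other builds may format these messages differently. Truncated lines (such as the vmkernel entries ending in "da$") will not match the datastore pattern.

```python
import re
from collections import Counter

# Patterns are based only on the sample log lines shown in this article.
NO_HOST_RE = re.compile(r"No accessible host for datastore (\S+)")
CONNECT_FAIL_RE = re.compile(r"Failed to establish connection to \[([^\]]+)\]:(\d+)")


def summarize(lines):
    """Count inaccessible-datastore errors and failed connection targets."""
    datastores = Counter()
    targets = Counter()
    for line in lines:
        match = NO_HOST_RE.search(line)
        if match:
            datastores[match.group(1)] += 1
        match = CONNECT_FAIL_RE.search(line)
        if match:
            # Key is (address, port) as written in the log line.
            targets[(match.group(1), int(match.group(2)))] += 1
    return datastores, targets
```

To use it, pass any iterable of log lines, for example an open file handle for a copy of hbrsrv.log or vmkernel.log, and inspect the two counters it returns.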

Cause

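  1. The vSphere Replication Appliance must make its initial connection to the ESXi host through ports 80 and 902 to transfer data to the datastore at the recovery site. If these ports are blocked, the connection fails.
  2. Firewall or IDS/IPS policies filter the traffic between the appliance and the ESXi hosts.
  3. The MTU size is not consistent along the network path.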

Resolution

NOTE:

This error can also be caused by compatibility issues between vSphere Replication (VR) and vCenter Server (VC).

https://interopmatrix.vmware.com/Interoperability

If you are upgrading to VC 8.0 U1/U2, check the interoperability matrix to ensure that your current vSphere Replication appliance is compatible with vCenter and the ESXi hosts.

1. Verify that ports 80, 443, and 902 are open using curl commands. Some organizations block these ports as part of their security hardening rules. Check the ESXi firewall on the source and target ESXi hosts and add exceptions under "Allowed IP addresses". Add the exceptions to the firewall rules for vSphere Web Client and vSphere Web Access in the format X.X.X.0/24 (example: 192.168.10.0/24). This range should cover the vSphere Replication appliance's IP Address for Incoming Storage Traffic at the local site.

          Example: curl -v telnet://<ESXi-FQDN-or-IP>:80; curl -v telnet://<ESXi-FQDN-or-IP>:902

NOTE: VR 8.8 does not use port 80 for communication. Refer to the port list for your specific VR version when troubleshooting: Services, Ports, and External Interfaces That the vSphere Replication Virtual Appliance Uses

         

Source | Target | Port | Protocol | Description
vSphere Replication appliance | Local vCenter Server | 80 | TCP | All management traffic to the local vCenter Server proxy system. vSphere Replication opens an SSL tunnel to connect to the vCenter Server services.
vSphere Replication server in the vSphere Replication appliance | Local ESXi host (intra-site) | 80 | HTTP | Traffic between the vSphere Replication server and the ESXi hosts on the same site. vSphere Replication opens an SSL tunnel to the ESXi services.
vSphere Replication server | ESXi host (intra-site only) on target site | 902 | TCP and UDP | Traffic between the vSphere Replication server and the ESXi hosts on the same site. Specifically, the traffic of the NFC service to the destination ESXi servers.
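
Where curl is not convenient, the same TCP reachability check for the ports discussed above can be sketched in Python. This is a minimal illustration under stated assumptions: it uses only the standard library, the function names are made up for this example, and it tests TCP only, so it says nothing about UDP traffic on port 902. A result is valid only for the machine the code runs on, so run it from the system whose connectivity is in question.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports=(80, 443, 902)):
    """Map each port to its TCP reachability from the machine running this code."""
    return {port: port_open(host, port) for port in ports}
```

For example, check_ports("esxi01.example.com") (a placeholder hostname) returns a dictionary such as {80: ..., 443: ..., 902: ...} with True or False per port. Remember that port 80 may legitimately be closed for VR versions that do not use it.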

 

  2. Ensure that time is synchronized through NTP on all ESXi hosts, vSphere Replication appliances, and vCenter Servers. See the KB article Configuring Network Time Protocol (NTP) on ESX/ESXi hosts using the vSphere Client.

  3. Check and disable IDS/IPS or other firewall packet filtering rules. A packet-filtering firewall regulates data flow to and from a network by examining the source and destination IP addresses and ports of each TCP/IP packet. Such rules can allow only known, established IP addresses while blocking all unknown ones, so a restrictive rule set can block replication traffic.
  4. Ensure that the MTU is configured uniformly across all networking devices between the sites that support it, including vSphere switches, ESXi hosts, and the vSphere Replication Appliance.

             Testing VMkernel network connectivity with the vmkping command (1003728)

            Example: Testing using vmkping commands.

     • vmkping -I vmk2 -d -s 8972 Target-VR_IP (tests a 9000 MTU, that is, jumbo frames)
     • vmkping -I vmk2 -d -s 1472 Target-VR_IP (tests a 1500 MTU)
  5. Check the firewall logs, and confirm that they are current (showing the current date and time rather than old entries). Look for any DENY entries logged by filtering rules, and raise a case with the firewall vendor to investigate any logging issues.
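
The "Allowed IP addresses" exception and the packet-filter behaviour described in the steps above both come down to matching a source address against a list of allowed networks. The sketch below illustrates that matching with Python's standard ipaddress module. It is an illustration only: it does not read or modify any ESXi firewall configuration, the function name is made up, the range is the example range from this article, and the two source addresses are hypothetical.

```python
import ipaddress


def is_allowed(source_ip, allowed_networks):
    """Return True if source_ip falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(source_ip)
    # ip_network() raises ValueError if host bits are set (e.g. 192.168.10.5/24).
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)


allowed = ["192.168.10.0/24"]                 # example range from this article
print(is_allowed("192.168.10.25", allowed))   # True: inside the /24
print(is_allowed("192.168.11.25", allowed))   # False: outside the /24
```

If the appliance address used for incoming storage traffic falls outside every allowed range, the firewall would be expected to drop its traffic, which is consistent with the connection failures shown in the logs.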


NOTE:
vSphere Replication uses an MTU (maximum transmission unit) of 1500 by default. An MTU of 1500 may not be achievable on a WAN that uses VPN tunnels, IPsec encryption, overlay protocols, or firewalls set to an MTU that does not match the one configured within the datacenter. Therefore, the vmkping test may pass or fail, but its result should not be treated as a direct indicator of this problem until you have explored all other possibilities. Try changing the MTU to a different size between 1500 and 9000 and check whether you can communicate with the target VR.
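
The -s values in the vmkping examples follow from the MTU minus 28 bytes of headers (a 20-byte IPv4 header without options plus an 8-byte ICMP header): 9000 - 28 = 8972 and 1500 - 28 = 1472. The sketch below encodes that arithmetic and adds a binary search for the largest MTU that passes. It is a sketch under assumptions: the probe callable is hypothetical (in practice it might wrap a vmkping -d -s <size> run and return True on success), the 576-byte lower bound is an arbitrary choice for this example, and the search assumes that if one size passes, every smaller size passes too.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def ping_payload_for_mtu(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def largest_passing_mtu(probe, low=576, high=9000):
    """Binary-search the largest MTU in [low, high] whose payload passes probe().

    probe(payload_size) is a caller-supplied, hypothetical function that
    returns True when a don't-fragment ping of that payload size succeeds.
    Returns None if even the smallest size fails.
    """
    if not probe(ping_payload_for_mtu(low)):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # upper midpoint so the loop always shrinks
        if probe(ping_payload_for_mtu(mid)):
            low = mid
        else:
            high = mid - 1
    return low
```

A search like this needs roughly 14 probes to cover the 576 to 9000 range, which is usually quicker than trying sizes by hand.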

Jumbo frames are Ethernet frames that carry a payload larger than the typical 1500-byte Ethernet MTU; anything above 1500 bytes is called a jumbo frame. Jumbo frames must be configured on the ingress and egress interfaces of every device along the end-to-end transmission path, and all devices in the topology must agree on the maximum jumbo frame size. If devices along the path have differing frame sizes, you can end up with fragmentation problems. If a device along the path does not support jumbo frames and receives one, it drops the frame.
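
The requirement that every device on the path agrees on the frame size can be expressed as: the usable end-to-end MTU is the smallest MTU of any hop. A minimal sketch follows; the function names and the list of hop MTUs are hypothetical and would in practice come from your own switch, router, and vmkernel port configuration.

```python
def path_mtu(hop_mtus):
    """The usable end-to-end MTU is the smallest MTU along the path."""
    return min(hop_mtus)


def undersized_hops(hop_mtus, frame_size):
    """Return (index, mtu) for every hop whose MTU is below frame_size."""
    return [(i, mtu) for i, mtu in enumerate(hop_mtus) if mtu < frame_size]


# Hypothetical path: vmkernel port, vSwitch, physical switch, WAN router, remote switch
hops = [9000, 9000, 9000, 1500, 9000]
print(path_mtu(hops))               # 1500
print(undersized_hops(hops, 9000))  # [(3, 1500)]
```

In this made-up path, a single 1500-byte WAN hop limits the whole path, so 9000-byte frames would be fragmented or dropped at that hop even though every other device is configured for jumbo frames.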

Jumbo frames can improve your network's performance. However, before you turn this feature on, determine whether and how your network devices support jumbo frames. Some of the biggest gains can be realized within and between data centers, but be aware of the fragmentation that may occur if those large frames cross a link that has a smaller MTU.