No connection to VR Server for virtual machine: Not responding


Article ID: 384776


Updated On:

Products

VMware Live Recovery
VMware vSphere ESXi

Issue/Introduction





ERROR
Operation Failed
Synchronization monitoring has stopped. Please verify replication traffic connectivity between the source host and the target vSphere Replication Server. Synchronization monitoring will resume when connectivity issues are resolved.
Operation ID: d76ecf44-f558-4d59-831c-607adf9ce753
6/30/2020, 10:37:47 AM EDT


This is a common error indicating a network communication issue between the ESXi host and the vSphere Replication (VR) Server. It can result from a combination of factors, which are discussed in detail in this article.

Environment

VMware vSphere Replication 8.x
VMware vSphere Replication 9.x

VMware ESXi 6.x
VMware ESXi 7.x
VMware ESXi 8.x

Cause

This issue has a wide range of possible causes: VR appliance and ESXi host configuration, administrative changes or mistakes, and environmental problems such as the network or firewall.

Resolution


Pick a component and follow its troubleshooting steps, depending on your analysis of the problem.

1. HOST

2. VRMS 

3. NETWORK

4. OVERLAPPING SUBNET OR IP RANGE


HOST

Pick the host on which the VM displaying this error resides and check the following. When troubleshooting at the host level, the whole cluster must be taken into consideration and reviewed.
 
1. Review the vSwitch/vDS/VMkernel Adapter configuration. 
 
vSwitch/vSphere Distributed Switch

 
A. Is the switch connected to a vmnic?

B. Are these vmnics tagged to the correct VLANs for carrying replication traffic on the physical switch?

C. Are these vmnics tagged to the correct VLAN used for replication traffic?

TIP: VMware recommends using dedicated vmnics for replication traffic, or using higher-capacity vmnics (10/20/40 Gbps) with vSphere Network I/O Control, for the best performance.
 
2. Is the MTU size set consistently on all the vSwitches/vDS in the cluster?

A mismatched MTU (Maximum Transmission Unit) can cause loss of connectivity and performance problems. The MTU size must be consistent on all the switches in the cluster and must be equal to or lower than the physical switch MTU size. If the logical switches on the host are configured with a lower size (for example, 1500) and the physical switch is configured with 9000, traffic passes through without problems. The converse causes packet drops and fragmentation and slows down replication.
 
TIP: Setting the MTU to 9000 on the vSwitches/vDS & the physical switches will give the best replication performance. 
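
As a rough illustration of the MTU rule above, the following Python sketch flags switches whose MTU differs from the rest of the cluster or exceeds the physical switch MTU. It also computes the largest ICMP payload that fits in one unfragmented IPv4 packet (the MTU minus 28 bytes of IPv4 and ICMP headers). The function names and the sample values are hypothetical and not part of any VMware tool; the MTU values would have to be collected from the hosts separately.

```python
from collections import Counter

IP_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def find_mtu_problems(virtual_mtus, physical_mtu):
    """Return a list of human-readable MTU problems.

    virtual_mtus: dict mapping a vSwitch/vDS/VMK name to its MTU.
    physical_mtu: MTU configured on the physical switch ports.
    """
    problems = []
    if not virtual_mtus:
        return problems
    # Assumption: the most common value is the intended cluster-wide MTU.
    expected, _ = Counter(virtual_mtus.values()).most_common(1)[0]
    for name, mtu in sorted(virtual_mtus.items()):
        if mtu != expected:
            problems.append(f"{name}: MTU {mtu} differs from cluster MTU {expected}")
        if mtu > physical_mtu:
            problems.append(f"{name}: MTU {mtu} exceeds physical switch MTU {physical_mtu}")
    return problems


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_ICMP_OVERHEAD
```

For an MTU of 9000 the payload is 8972 bytes, which could then be used for a don't-fragment test from the ESXi shell, for example `vmkping -I vmk1 -d -s 8972 <target IP>`, where vmk1 stands in for the replication VMK adapter.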


3. VMkernel adapter 

 

Check the VMK adapter configuration - 
 
A. Are the vSphere Replication traffic and vSphere Replication NFC traffic services enabled on the VMK adapter chosen for replication traffic?
 
These services must be enabled to facilitate replication traffic on the host. 
 
vSphere Replication traffic
Handles the outgoing replication data that the source ESXi host transfers to the vSphere Replication server. Dedicate a VMkernel adapter on the source site to isolate the outgoing replication traffic.

vSphere Replication NFC traffic
Handles the incoming replication data on the target replication site.

NOTE: If these services are not enabled on the relevant VMK adapter, replication traffic defaults to VMK0, which is the default management interface of the host.

B. Is the MTU size set consistently on all the VMK adapters in the cluster?
 
Refer to the aforementioned information about MTU size. 
 
TIP: Setting the MTU to 9000 on the VMK adapter will give the best replication performance. 
 
4. Are you isolating vSphere Replication traffic on your network ?
 
 
When you decide to isolate replication traffic in your network, you have to create a dedicated VMK adapter for replication traffic. 

A. Check the IP settings of all the VMK adapters in the cluster and ensure that the IP address, subnet mask and default gateway are correct.
B. The IP addresses assigned to all these VMK adapters must belong to the same broadcast domain.
C. If new hosts have been added to the cluster, or if any hosts are missing this VMK adapter, create one and fill in the IP settings.
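
The checks in A and B can be approximated offline with Python's standard `ipaddress` module. The sketch below groups VMK adapter addresses by IP network, so that an adapter sitting in a different subnet stands out, and tests whether a default gateway lies inside an adapter's subnet. It treats "same IP subnet" as a stand-in for "same broadcast domain", which is an assumption; the host labels and addresses are hypothetical.

```python
import ipaddress


def group_by_network(adapters):
    """Group VMkernel adapter addresses by the IP network they belong to.

    adapters: dict mapping a label such as "esx01/vmk1" to "address/prefix".
    """
    groups = {}
    for label, cidr in adapters.items():
        # ip_interface keeps the host address and derives the network from the mask.
        network = ipaddress.ip_interface(cidr).network
        groups.setdefault(network, []).append(label)
    return groups


def gateway_in_subnet(cidr, gateway):
    """True when the default gateway lies inside the adapter's own subnet."""
    return ipaddress.ip_address(gateway) in ipaddress.ip_interface(cidr).network
```

If `group_by_network` returns more than one group for the replication adapters of a single cluster, at least one adapter has an address or subnet mask that differs from the others.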

5. Static routes
 
What is a static route and when do I have to configure one on my ESXi host ? 
 
In the context of vSphere Replication traffic, a static route is a route that the ESXi host uses when it has no other route to the destination network. In other words, a static route is added to the ESXi host to route traffic to a target replication appliance that resides on a different IP network from the source.

If you are replicating on a flat Layer 2 network, within the same vCenter, or across two clusters within the same vCenter using the same network, a static route is not required. Static routes are specifically required when the host must send traffic to a destination network outside its own datacenter that has a different IP range.
 
Remember to configure static routes on the source and target ESXi hosts when using replication traffic isolation. 

For the procedure, see the KB article "Configuring static routes for vmkernel ports on an ESXi host".
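
To make the rule for when a static route is needed concrete, here is a minimal Python sketch. It reports whether a destination lies outside every network the host is directly attached to, and builds the corresponding `esxcli network ip route ipv4 add` command as a string. It deliberately ignores any default gateway that might already reach the destination, so treat the result as a hint only. The helper names and all addresses are hypothetical.

```python
import ipaddress


def needs_static_route(local_interfaces, destination):
    """True when destination is not on any network the host is directly attached to.

    local_interfaces: list of "address/prefix" strings for the host's VMK adapters.
    """
    dest = ipaddress.ip_address(destination)
    return not any(
        dest in ipaddress.ip_interface(cidr).network for cidr in local_interfaces
    )


def esxcli_route_command(remote_network, gateway):
    """Build the esxcli command string that adds an IPv4 static route."""
    network = ipaddress.ip_network(remote_network)  # raises ValueError on a bad CIDR
    ipaddress.ip_address(gateway)                   # raises ValueError on a bad IP
    return f"esxcli network ip route ipv4 add --network {network} --gateway {gateway}"
```

For example, a host with a replication adapter at 10.10.20.11/24 and a target appliance at 192.168.50.10 yields `True`, and `esxcli_route_command("192.168.50.0/24", "10.10.20.1")` produces the command to review before running it on the host.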


TROUBLESHOOTING TIPS: 
 
1. The VMkernel logs should provide good insight into where the connection is failing.
2. Run a PING test from the source ESXi host to the target replication appliance or add-on server.
3. Run a PING test from the source ESXi host to the target ESXi host that is connected to the replication datastore that is storing the replicas. 

A. From the source ESXi host, ping the target ESXi host's management interface (by default it's VMK0, but this can differ between vSphere environments).
B. From the source ESXi host, ping the target ESXi host's replication interface (the VMK adapter you have configured for replication traffic).
C. Perform tests A and B from the target ESXi hosts to the source ESXi hosts.

If the ping tests fail, check for static routes and add any that are missing. If the ping tests still fail after adding static routes, check with your internal networking team whether the router interfaces have been correctly configured to receive traffic from the respective hosts/clusters/datacenters.

NOTE: ICMP (ping) is disabled in some environments. It will have to be enabled temporarily to perform these tests. 
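
With several hosts on each side, tests A to C quickly become a matrix. The Python sketch below only enumerates which pings to run and from where; it does not run them. The dictionary keys (`name`, `mgmt_ip`, `repl_ip`) and the sample hosts are assumptions made for illustration.

```python
from itertools import product


def build_ping_tests(source_hosts, target_hosts):
    """List every ping to run, in both directions.

    Each host is a dict with "name", "mgmt_ip" and "repl_ip" keys.
    Returns (run_on, destination_ip, role) tuples.
    """
    tests = []
    # First pass covers source -> target (tests A and B),
    # second pass covers target -> source (test C).
    for senders, receivers in ((source_hosts, target_hosts), (target_hosts, source_hosts)):
        for sender, receiver in product(senders, receivers):
            tests.append((sender["name"], receiver["mgmt_ip"], "management"))
            tests.append((sender["name"], receiver["repl_ip"], "replication"))
    return tests
```

Each tuple names the host to run the ping on and the address to ping, which makes it easier to record which direction and which interface failed.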

VRMS

1. Check if the appliance is powered ON and that it's not hung while booting into the OS or powered OFF. 

2. Check if an additional VM network adapter has been created for receiving replication traffic. If not, follow the steps in this article

3. Check if the VM network adapter is attached to the correct network and is connected. 

4. Verify the network configuration by logging in to the VRMS Appliance Management Interface (VAMI).

NOTE: The IP configuration via the VRMS Appliance Management Interface supports only one default gateway on the vSphere Replication appliance. 


5. If you are unable to make changes to the IP configuration from VAMI, use the commands from the KBs below.

vSphere Replication Appliance and Site Recovery Manager displays the message : No Networking Detected (312781) 
  
Photon Network Manager Commands to update Hostname/IP Address/DNS in SRM & vSphere replication (312686)

6. Adding static routes in the appliance.

Multiple static routes, one for each cluster network in the target datacenter, can be added to the 10-eth<NIC_Number>.network file. Check that the routes are correct and that routes for all clusters have been added.

NOTE:
1. Source VRMS appliance must have routes for the target ESXi cluster
2. Target VRMS appliance must have routes for the source ESXi cluster

7. vSphere Replication uses SSL/TLS certificates for secure communication between the source and target sites. If the certificates used for replication have expired, are misconfigured, or do not match on both sides, the replication process can fail or not initiate properly.
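
One way to spot an expired or soon-to-expire certificate is to look at its notAfter date. The Python sketch below converts a notAfter string, in the format used by Python's `ssl` module, into the number of days remaining. How the string is obtained from the appliance is left open here; the function name is hypothetical.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate expires (negative once expired).

    not_after: the certificate's notAfter field, e.g. "Jan  5 09:34:43 2018 GMT".
    now: current time as a Unix timestamp; defaults to time.time().
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400
```

A negative result means the certificate has already expired. A small positive result is a prompt to plan a renewal on both sites before replication is affected.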


TROUBLESHOOTING TIPS: 

1. From the source VRMS, ping the target host cluster on its replication VMK adapter & vCenter
2. From the target VRMS, ping the source host cluster on its replication VMK adapter & vCenter

The replication VMK adapter is either VMK0 or a designated VMK adapter with vSphere Replication traffic & vSphere Replication NFC traffic services enabled on it.

NOTE: 
1. It's ideal to PING all hosts in the cluster to verify connectivity
2. Replication appliance upgrades can lead to the loss of static routes. Back up the routes before performing an upgrade, and re-add them to the 10-eth<NIC_Number>.network file after the upgrade completes.
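
Because the routes in 10-eth<NIC_Number>.network may need to be re-added after an upgrade, it can help to keep them in a form that can be regenerated. The Python sketch below renders `[Route]` sections in systemd-networkd syntax, which is assumed here to be the format the appliance's .network file uses; verify this against the existing file before pasting anything in. The function name and addresses are hypothetical.

```python
import ipaddress


def render_route_sections(routes):
    """Render [Route] sections in systemd-networkd syntax.

    routes: iterable of (destination_cidr, gateway) pairs.
    """
    sections = []
    for destination, gateway in routes:
        network = ipaddress.ip_network(destination)  # raises ValueError on a bad CIDR
        gw = ipaddress.ip_address(gateway)           # raises ValueError on a bad IP
        sections.append(f"[Route]\nGateway={gw}\nDestination={network}\n")
    return "\n".join(sections)
```

Keeping the list of `(destination, gateway)` pairs outside the appliance doubles as the backup recommended in the note above.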


NETWORK

vSphere Replication relies on the network to move traffic, so ongoing maintenance on the network switches, disconnected or loosely connected cables, and similar problems can directly impact replications. Fix any network issues in the environment before troubleshooting vSphere Replication.

1. Check if all the required ports for replication are open 

2. Check if any NSX firewall policies or other firewalls are blocking replication traffic at the source and target datacenters.

3. Check if the Intrusion Detection System (IDS) or Intrusion Prevention Systems (IPS) settings are interfering with replication traffic. 

4. vSphere Replication and Site Recovery Manager do not support network address translation (NAT).

NOTE: Site Recovery Manager does not support network address translation (NAT). If the network that you use to connect the Site Recovery Manager sites uses NAT, attempting to connect the sites results in an error. Use credential-based authentication and network routing without NAT when connecting the sites. 
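
For the port check in step 1, a plain TCP connection attempt is often enough to tell whether a port is reachable. The following Python sketch uses only the standard `socket` module. It is meant to run from a machine on the same network path as the sender, and the function names are hypothetical. Take the list of ports to test from the official port reference for your vSphere Replication version.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """True when a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True (reachable) or False (closed, blocked or filtered)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

A `False` result does not say where the traffic is dropped. It only narrows the search to the firewalls, IDS/IPS devices and routes between the two endpoints.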


OVERLAPPING SUBNET OR IP RANGE

Overlapping subnets or IP ranges can lead to unpredictable traffic flow and cause such errors. You can work with an SRM engineer or try to diagnose the problem yourself by following this KB article. We recommend working with your internal networking team and the vSphere Networking team to identify the problem.

Try to diagnose this problem first with the help of your internal networking team. If that effort does not produce any leads, consider logging a case with SRM support. SRM support will work with you to diagnose the problem and, if needed, collaborate with the vSphere networking team to find a resolution.
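
As a starting point for that diagnosis, overlapping ranges can be detected offline from a list of the networks in use at both sites. This is a minimal Python sketch using the standard `ipaddress` module; the labels and CIDR values are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(networks):
    """Return pairs of labels whose IP ranges overlap.

    networks: dict mapping a label to a CIDR string.
    """
    parsed = {label: ipaddress.ip_network(cidr) for label, cidr in networks.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(parsed.items(), 2)
        if net_a.overlaps(net_b)
    ]
```

For example, a source replication network of 10.10.0.0/16 and a target replication network of 10.10.20.0/24 would be reported as an overlapping pair, because the second range lies entirely inside the first.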