'No connection to VR Server for virtual machine : Not responding' while configuring replication

Article ID: 387131

Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  • While configuring replication for a virtual machine, the following error appears in the DR UI:

A replication error occurred at the vSphere Replication Server for replication 'SRM-Test02'. Details:'No connection to VR Server for virtual machine Test02 on host esx-7-xxx in cluster xxx in oci-xxx: Not responding'. 

  • Network connectivity between the source ESXi host and the target VR server is healthy.
  • All the required ports are open.
  • /var/log/hbr-agent.log shows the hbr-agent on the host attempting to connect to the destination VR server.
  • Although vSphere Replication traffic is tagged on a specific VMkernel port, the hbr-agent sends its traffic without binding to any VMkernel port (logged as "without specific vmk").

Steps to validate:

  • /var/log/hbr-agent.log:

2025-01-29T09:14:04.133Z In(166) hbr-agent-bin[163773699]: [0x000000c3e4c8c700] info: [ProxyConnection] Setting up secure tunnel to broker on 10.30.xx.xx:32032
2025-01-29T09:14:04.133Z In(166) hbr-agent-bin[163773699]: [0x000000c3e4c8c700] info: [Proxy [Group: ] -> [10.30.xx.xx:32032]] Connecting to 10.30.xx.xx:32032 without specific vmk
2025-01-29T09:14:40.092Z In(166) hbr-agent-bin[163773699]: [0x000000c3e4d0d700] error: [Http] Unexpected HTTP status code: 500
2025-01-29T09:15:02.340Z In(166) hbr-agent-bin[163773699]: [0x000000c3e4c8c700] error: [Proxy [Group: ] -> [10.30.xx.xx:32032]] Failed to connect to 10.30.xx.xx:32032. Error: Connection timed out
2025-01-29T09:15:02.340Z In(166) hbr-agent-bin[163773699]: [0x000000c3e4c8c700] error: [Proxy [Group: ] -> [10.30.xx.xx:32032]] Failed to connect to broker on 10.30.5.3:32032: Connection timed out
2025-01-29T09:15:02.340Z In(166) hbr-agent-bin[163773699]: [0x000000c3e4c8c700] error: [Proxy [Group: ] -> [10.30.xx.xx:32032]] Failed to connect to broker: Connection timed out
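
The two telltale entries in the excerpt above (the "without specific vmk" connection attempt and the broker timeout) can be picked out of a longer log with a small script. This is a minimal sketch, assuming the message wording matches the excerpt exactly; the function name `scan_hbr_agent_log` and the regular expressions are illustrative and not part of any VMware tooling.

```python
import re

# Patterns copied from the wording of the hbr-agent.log excerpt above.
NO_VMK = re.compile(r"Connecting to (\S+) without specific vmk")
BROKER_FAIL = re.compile(r"Failed to connect to broker on (\S+): (.+)$")


def scan_hbr_agent_log(lines):
    """Return (untagged_targets, broker_failures) found in hbr-agent.log lines.

    untagged_targets: addresses the agent connected to without a VMkernel port.
    broker_failures:  (address, error text) pairs for failed broker connections.
    """
    untagged, failures = [], []
    for line in lines:
        line = line.rstrip("\n")
        match = NO_VMK.search(line)
        if match:
            untagged.append(match.group(1))
            continue
        match = BROKER_FAIL.search(line)
        if match:
            failures.append((match.group(1), match.group(2).strip()))
    return untagged, failures


# Example use against a copy of the log (path is an assumption):
#   with open("hbr-agent.log", errors="replace") as fh:
#       untagged, failures = scan_hbr_agent_log(fh)
```

A non-empty `untagged` list together with broker timeouts is consistent with the symptom described in this article, but it does not by itself confirm the cause.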

  • Restart hbr-agent on the ESXi host.
  • After the restart, the following errors appear in /var/log/hbr-agent.log:

2025-01-29T09:46:16.546Z In(166) hbr-agent-bin[2117286]: [0x000000bc2132c3c0] info: [Server] TCP_CONGESTION set to: cubic
2025-01-29T09:46:16.546Z In(166) hbr-agent-bin[2117286]: [0x000000bc2132c3c0] info: [Server] Snd buf size set to: 16777216
2025-01-29T09:46:16.546Z In(166) hbr-agent-bin[2117286]: [0x000000bc2132c3c0] info: [Server] Rcv buf size set to: 16777216
2025-01-29T09:46:16.547Z In(166) hbr-agent-bin[2117286]: [0x000000bc2132c3c0] info: [main] REST Server enabled: true
2025-01-29T09:46:16.547Z In(166) hbr-agent-bin[2117286]: [0x000000bc2132c3c0] info: [RESTServer] REST service started.
2025-01-29T09:46:21.553Z In(166) hbr-agent-bin[2117286]: [0x000000bc23203700] error: [Http] Unexpected HTTP status code: 500
2025-01-29T09:47:20.554Z In(166) hbr-agent-bin[2117286]: [0x000000bc23284700] error: [Http] Unexpected HTTP status code: 500
2025-01-29T09:48:20.555Z In(166) hbr-agent-bin[2117286]: [0x000000bc23101700] error: [Http] Unexpected HTTP status code: 500
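
The HTTP 500 errors in the excerpt above recur roughly once a minute. The spacing can be measured directly from the timestamps, as in this minimal sketch. It assumes every matching line starts with a UTC timestamp in the format shown in the excerpt; `http_500_intervals` is a hypothetical helper name.

```python
from datetime import datetime

ERROR_TEXT = "Unexpected HTTP status code: 500"
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2025-01-29T09:46:21.553Z


def http_500_intervals(lines):
    """Return the seconds elapsed between consecutive HTTP 500 log entries."""
    stamps = [
        datetime.strptime(line.split()[0], STAMP_FORMAT)
        for line in lines
        if ERROR_TEXT in line
    ]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
```

Intervals close to 60 seconds suggest that the errors come from a periodic task rather than from individual replication operations.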

  • After the restart, /var/log/hostd.log also shows a "no permission" error when the hbr-agent tries to log in as the root user:

2025-01-29T09:47:20.181Z In(166) Hostd[2099956]: [Originator@6876 sub=Vimsvc.HaSessionManager opID=vim-cmd-8a-e190 sid=526776db] Accepted password for user root from 127.0.0.1 - session=526776db-6743-adca-6de9-f1db7a8b01c3
2025-01-29T09:47:20.181Z In(166) Hostd[2099956]: [Originator@6876 sub=Vimsvc opID=vim-cmd-8a-e190 sid=526776db] [Auth]: User root
2025-01-29T09:47:20.181Z Wa(164) Hostd[2099956]: [Originator@6876 sub=Vimsvc opID=vim-cmd-8a-e190 sid=526776db] Refresh function is not configured.User data can't be added to scheduler.User name: root
2025-01-29T09:47:20.182Z In(166) Hostd[2099956]: [Originator@6876 sub=Vimsvc.ha-eventmgr opID=vim-cmd-8a-e190 sid=526776db] Event 350 : Cannot login user root@127.0.0.1: no permission
2025-01-29T09:47:20.553Z In(166) Hostd[2099958]: [Originator@6876 sub=Solo.Vmomi] Activation finished; <<52597be2-7bdb-93ca-5ec6-3eb3ee8af516, <TCP '127.0.0.1 : 8307'>, <TCP '127.0.0.1 : 42458'>>, ha-sessionmgr, vim.SessionManager.login, <vim.version.version9, official, 5.5>, [N11HostdCommon18VmomiAdapterServer19ActivationResponderE:0x000000f12b011758]>
2025-01-29T09:47:20.554Z Db(167) Hostd[2099958]: [Originator@6876 sub=Solo.Vmomi] Arg userName:
2025-01-29T09:47:20.554Z Db(167) Hostd[2099576]: --> "local-root"
2025-01-29T09:47:20.554Z Db(167) Hostd[2099958]: [Originator@6876 sub=Solo.Vmomi] Arg password:
2025-01-29T09:47:20.554Z Db(167) Hostd[2099576]: --> (not shown)
2025-01-29T09:47:20.554Z Db(167) Hostd[2099576]: -->
2025-01-29T09:47:20.554Z Db(167) Hostd[2099958]: [Originator@6876 sub=Solo.Vmomi] Arg locale:
2025-01-29T09:47:20.554Z Db(167) Hostd[2099576]: --> (null)
2025-01-29T09:47:20.554Z In(166) Hostd[2099958]: [Originator@6876 sub=Solo.Vmomi] Throw vim.fault.NoPermission
2025-01-29T09:47:20.554Z In(166) Hostd[2099958]: [Originator@6876 sub=Solo.Vmomi] Result:
2025-01-29T09:47:20.554Z In(166) Hostd[2099576]: --> (vim.fault.NoPermission) {
2025-01-29T09:47:20.554Z In(166) Hostd[2099576]: -->    object = 'vim.Folder:ha-folder-root',
2025-01-29T09:47:20.554Z In(166) Hostd[2099576]: -->    privilegeId = "System.View",
2025-01-29T09:47:20.554Z In(166) Hostd[2099576]: -->    msg = "",
2025-01-29T09:47:20.554Z In(166) Hostd[2099576]: --> }
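
To check whether a host's hostd.log contains the same login denial, the relevant entries can be counted with a short script. This is a sketch under the assumption that the messages match the excerpt above; `scan_hostd_log` and its return keys are illustrative names.

```python
import re

# Patterns based on the hostd.log excerpt above.
DENIED_LOGIN = re.compile(r"Cannot login user (.+?): no permission")
PRIVILEGE = re.compile(r'privilegeId = "([^"]+)"')
FAULT_TEXT = "Throw vim.fault.NoPermission"


def scan_hostd_log(lines):
    """Summarize NoPermission login failures found in hostd.log lines."""
    summary = {"faults": 0, "denied_users": [], "privileges": []}
    for line in lines:
        if FAULT_TEXT in line:
            summary["faults"] += 1
        match = DENIED_LOGIN.search(line)
        if match:
            summary["denied_users"].append(match.group(1))
        match = PRIVILEGE.search(line)
        if match:
            summary["privileges"].append(match.group(1))
    return summary
```

In the excerpt above, the fault is raised for the `System.View` privilege on `ha-folder-root`, so that privilege would appear in `summary["privileges"]`.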

Environment

vSphere Replication 8.x
vSphere Replication 9.x
VMware Live Site Recovery 9.x

Network Isolation for vSphere Replication is configured

Cause

  • The issue occurs when Lockdown Mode is enabled on the ESXi host.
  • vSphere Replication requires the hbr-agent to query ESXi for information such as VM and network configuration every 60 seconds.
  • Each query logs in to and out of the host, so each host records 1440 login events and 1440 logout events per day.
  • The hbr-agent performs these logins as the 'root' user.
  • With Lockdown Mode enabled, the root login is rejected, so the hbr-agent cannot retrieve the network information. It then sends its traffic without binding to a specific VMkernel port, which causes the connection failure.
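
The figure of 1440 events per day follows directly from the 60-second polling interval. A worked check, assuming exactly one login and one logout per poll:

```python
POLL_INTERVAL_SECONDS = 60        # polling interval stated above
SECONDS_PER_DAY = 24 * 60 * 60    # 86400

# One login (and one logout) per poll, per host.
logins_per_day = SECONDS_PER_DAY // POLL_INTERVAL_SECONDS
print(logins_per_day)  # 1440
```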

Resolution

  • In the Lockdown Mode settings of the ESXi host, add root to Exception Users.
  • Restart hbr-agent on the host (for example, /etc/init.d/hbr-agent restart).
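
For environments with many hosts, the Exception Users change could be scripted instead of made in the UI. The following is only a sketch: it assumes `access_manager` is a pyVmomi `vim.host.HostAccessManager` object (reachable as `host.configManager.hostAccessManager`) exposing `QueryLockdownExceptions` and `UpdateLockdownExceptions`. The helper name `ensure_lockdown_exception` is hypothetical, and connecting to vCenter is left out.

```python
def ensure_lockdown_exception(access_manager, user="root"):
    """Add `user` to the host's lockdown exception users if it is missing.

    `access_manager` is assumed to behave like a pyVmomi
    vim.host.HostAccessManager. Returns True if the list was changed.
    """
    current = list(access_manager.QueryLockdownExceptions() or [])
    if user in current:
        return False  # already an exception user; nothing to do
    # The update replaces the whole list, so existing entries are kept.
    access_manager.UpdateLockdownExceptions(users=current + [user])
    return True
```

The helper preserves any existing exception users because the update call is assumed to overwrite the list rather than append to it. The hbr-agent restart from the last step is still required afterwards.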