HostInfo for a few hosts is not updated in the target site hbrsrv.xxx.db HostInfo table

Article ID: 440451

Products

VMware vSphere ESXi

Issue/Introduction

1. Replication status is Not Active.
2. Replication starts working when the VM is moved to a different host.
3. VMs under the replication tab display the following error:
A replication error occurred at the vSphere Replication Server for replication 'VMReplication'. Details: 'No connection to VR Server for virtual machine VMReplication on host Production.host.local in cluster Production in Datacenter: Unknown'.

4. Both existing and newly configured virtual machine replications are stuck in a "Not Active (RPO Violated)" status. The issue occurs only in enhanced replication mode; temporarily switching the replications to legacy mode allows them to function. In addition, newly added ESXi hosts are inaccessible during replication configuration.

5. Information for a few hosts in the HostInfo table in hbrsrv.xxx.db at the source site is not updated in the HostInfo table in hbrsrv.xxx.db on the target site VR appliance.

Environment

vSphere Replication 9.0.2, 9.0.2.2, 9.0.2.3

Cause

These errors can occur when a server's motherboard is replaced, when ESXi is reinstalled, or when the host is tagged with the DISALLOW tag. They can also occur when a certificate synchronization failure prevents the vSphere Replication Management Server (VRMS) from registering and trusting peer site ESXi hosts for enhanced replication traffic, which leaves the peers in a desynchronized state.

vmkwarning.log:

2026-04-28T17:14:29.860Z Wa(180) vmkwarning: cpu0:8847777)WARNING: Hbr: 5362: Failed to establish connection to [127.0.0.1]:32032 (groupID=GID-e36dbf9d-####-####-####-############): Broken pipe
2026-04-28T17:15:59.859Z Wa(180) vmkwarning: cpu1:8847777)WARNING: Hbr: 788: Failed to receive from 127.0.0.1 (groupID=GID-e36dbf9d-####-####-####-############): Broken pipe
2026-04-28T17:15:59.859Z Wa(180) vmkwarning: cpu1:8847777)WARNING: Hbr: 2542: Failed to receive extended handshake response
2026-04-28T17:15:59.859Z Wa(180) vmkwarning: cpu1:8847777)WARNING: Hbr: 5362: Failed to establish connection to [127.0.0.1]:32032 (groupID=GID-e36dbf9d-####-####-####-############): Broken pipe
2026-04-28T17:17:29.859Z Wa(180) vmkwarning: cpu54:8847777)WARNING: Hbr: 788: Failed to receive from 127.0.0.1 (groupID=GID-e36dbf9d-####-####-####-############): Broken pipe
2026-04-28T17:17:29.859Z Wa(180) vmkwarning: cpu54:8847777)WARNING: Hbr: 2542: Failed to receive extended handshake response

hbrsrv.log:

2026-04-28T17:21:02.224Z In(166) hbrsrv[8848852]: [Originator@6876 sub=Main opID=hs-init-2f01900d] HbrError stack:
2026-04-28T17:21:02.225Z In(166) hbrsrv[8848852]: [Originator@6876 sub=Main opID=hs-init-2f01900d] [0] Resource temporarily unavailable
2026-04-28T17:21:02.225Z In(166) hbrsrv[8848852]: [Originator@6876 sub=Main opID=hs-init-2f01900d] [1] NfsVfs: While trying to lock '/vmfs/volumes/9cf961b6-########/.hbrroot_506cf186f462794634aa1ae469eb965129f11e1720c2bae565c00819fc316b88/persistent-cleanup-index-4c4c4544-005a-5610-8044-############.db'
2026-04-28T17:21:02.225Z In(166) hbrsrv[8848852]: [Originator@6876 sub=Main opID=hs-init-2f01900d] [2] Ignored error.
2026-04-28T17:21:02.225Z In(166) hbrsrv[8848852]: [Originator@6876 sub=Main opID=hs-init-2f01900d] HbrError stack:
2026-04-28T17:21:02.225Z In(166) hbrsrv[8848852]: [Originator@6876 sub=Main opID=hs-init-2f01900d] [0] SQLite error 14: unable to open database file
2026-04-28T17:21:02.225Z In(166) hbrsrv[8848852]: [Originator@6876 sub=Main opID=hs-init-2f01900d] [1] Opening database /vmfs/volumes/9cf961b6-138ddad4/.hbrroot_506cf186f462794634aa1ae469eb965129f11e1720c2bae565c00819fc316b88/persistent-cleanup-index-4c4c4544-005a-5610-8044-############.db 'Retry #2 for operation 'Open database '/vmfs/volumes/9cf961b6-########/.hbrroot_506cf186f462794634aa1ae469eb965129f11e1720c2bae565c00819fc316b88/persistent-cleanup-index-4c4c4544-005a-5610-8044-############.db'', Total time retrying: 20.6158 secs.'

hbr-agent.log:

2026-04-28T17:17:29.859Z In(166)[+] hbr-agent-bin[8848104]: thumbprint: 30:##:85:##:1D:##:8B:##:0C:##:B2:##:20:##:91:##:68:##:C7:##:4C:##:EA:##:89:##:02:##:CE:##:C2:##
2026-04-28T17:17:29.859Z In(166)[+] hbr-agent-bin[8848104]: certificate:-----BEGIN CERTIFICATE-----
2026-04-28T17:17:29.859Z In(166)[+] hbr-agent-bin[8848104]: ####/zCCA2egAwIBAgIJAO6pBrvU/########
2026-04-28T17:18:59.840Z In(166) hbr-agent-bin[8848104]: [0x000000fd97e3a700] info: [ConfigManager] No user configuration for key=hbrsvc_target_info in ConfigStore.
2026-04-28T17:18:59.840Z In(166) hbr-agent-bin[8848104]: [0x000000fd97e3a700] error: [ConfigManager] Failed to get config store object. Comp: esx, Grp: services, Key: hbrsvc_target_info, Id: 10.#.#.#, Prop: certificate
2026-04-28T17:18:59.840Z In(166) hbr-agent-bin[8848104]: [0x000000fd97e3a700] info: [ProxyConnection] Setting up secure tunnel to broker on 10.#.#.#:32032
2026-04-28T17:18:59.840Z In(166) hbr-agent-bin[8848104]: [0x000000fd97e3a700] info: [Proxy [Group: ] -> [10.#.#.#:32032]] Bound to vmk: vmk11 for connection to 10.#.#.#:32032
2026-04-28T17:18:59.843Z In(166) hbr-agent-bin[8848104]: [0x000000fd97cb7700] info: [Proxy [Group: ] -> [10.#.#.#::32032]] TCP Connect latency was 2634µs
2026-04-28T17:18:59.859Z In(166) hbr-agent-bin[8848104]: [0x000000fd97db9700] error: [Proxy [Group: GID-e36dbf9d-####-####-####-############] -> [10.#.#.#:32032]] The find server request failed: (1) Failed
2026-04-28T17:18:59.859Z In(166) hbr-agent-bin[8848104]: [0x000000fd97db9700] error: [Proxy [Group: GID-e36dbf9d-####-####-####-############] -> [10.#.#.#:32032]] Failed find server request additional error info: Thumbprint and certificate is not allowed to send replication data

hms.log:

2026-04-28 16:37:43.757 INFO com.vmware.hms.hbrsrvuw.HbrsrvuwRegistrarService [hms-main-thread-5133] (..hms.hbrsrvuw.HbrsrvuwRegistrarService) [operationID=db86ccf6-####-####-####-2d557468b521-HMSINT-12418064] | handleHmsTaskFinishedEvent: [host-######][7da6ebf7-7c53-4e09-a6e9-############]: HmsTaskFinishedEvent[id: HTID-c655a092-ce43-4c56-928a-############; taskName: Register hbrsrvuw from host-######; taskManagedEntityId: null; taskTypeId: RegisterHbrTask; requestedExecutorId: hbr-management; taskTag: hbrsrvuw-reg-host-######--id--7da6ebf7-7c53-4e09-a6e9-############; autoAbort: true; queueTime: 1765395463672; startTime: 1765395463723; completeTime: null; error: Cannot start connection to VR Server https://10.#.#.#:443/hbr: null; success: false; result: null]
2026-04-28 16:37:43.758 INFO com.vmware.jvsl.util.SingleThumbprintVerifier [hms-ping-scheduled-thread-8] (..jvsl.util.SingleThumbprintVerifier) [operationID=db86ccf6-####-####-####-2d557468b521-HMSINT-12418064, operationID=a05f7fa6-####-####-####-############-HMS-PING] | Failed to validate certificate chain for 10.#.#.# against HMS truststore. Error message: The certificate was not issued for use with the given hostname: 10.#.#.#
2026-04-28 16:37:43.758 ERROR com.vmware.hms.hbrsrvuw.HbrsrvuwRegistrarService [hms-main-thread-5133] (..hms.hbrsrvuw.HbrsrvuwRegistrarService$HostTask) [operationID=db86ccf6-####-####-####-############-HMSINT-12418064] | Register hbrsrvuw from host-######...FAILED (HTID-c655a092-ce43-4c56-928a-############); will retry
com.vmware.vim.binding.hms.remote.fault.ConnectionFault: Cannot start connection to VR Server https://10.#.#.#:443/hbr: null

2026-04-28 16:22:35.881 DEBUG com.vmware.hms.monitor.host [hms-main-thread-31] (..monitor.host.HostInventoryMonitor) [operationID=1d0b4d15-####-######-HMSINT-###, operationID=1d0b4d15-###-###########-HMSINT-###] | host host-xxx is disallowed: true
2026-04-28 16:22:35.884 DEBUG com.vmware.hms.monitor.host [hms-main-thread-86] (..monitor.host.HostInventoryMonitor) [operationID=8b188d61-####-#######-HMSINT-### | host host-xxx is disallowed: true
2026-04-28 16:22:35.886 DEBUG com.vmware.hms.monitor.host [hms-main-thread-31] (..monitor.host.HostInventoryMonitor) [operationID=5156bf02-ec43-44aa-803d-b74a1fc52113-HMSINT-106441] | host host-xxx is disallowed: true
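
The signatures in the excerpts above can be searched for in collected logs. The following is a minimal sketch, not a supported tool: `count_signature` is a hypothetical helper, and the file names in the usage comments are assumptions that depend on where the logs were collected.

```shell
# Hypothetical helper: print how many lines of a log file contain a
# fixed string. grep -F matches the text literally, -c counts lines.
count_signature() {
    # $1 = log file, $2 = literal text to look for
    grep -F -c -- "$2" "$1"
}

# Usage (file locations are assumptions; point them at your own copies):
#   count_signature hms.log 'is disallowed: true'
#   count_signature hbr-agent.log 'Thumbprint and certificate is not allowed to send replication data'
#   count_signature vmkwarning.log 'Failed to receive extended handshake response'
```

A non-zero count for the first string points toward the DISALLOW tag cause, while the other two point toward the certificate trust cause.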

Resolution

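Validate the certificate currently used by the ESXi host. The leading echo closes standard input so that openssl s_client exits after the handshake instead of waiting for input:
echo | openssl s_client -connect <Host FQDN/IP Address>:443 | openssl x509 -noout -fingerprint -sha256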
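
To compare the fingerprint printed by the command above with the thumbprint recorded in hbr-agent.log, it can help to normalize both strings first. This is a minimal sketch: `normalize_fp` and `same_fp` are hypothetical helpers, not part of any VMware tool, and the sketch assumes the logged thumbprint is a SHA-256 value written as colon-separated hex.

```shell
# Hypothetical helper: reduce a fingerprint string to bare uppercase hex.
# Accepts openssl output of the form "<digest> Fingerprint=AA:BB:..." as
# well as a bare "aa:bb:..." thumbprint copied from a log.
normalize_fp() {
    printf '%s\n' "$1" | sed 's/^.*=//' | tr -d ': \r' | tr 'abcdef' 'ABCDEF'
}

# Hypothetical helper: exit status 0 when both fingerprints are equal.
same_fp() {
    [ "$(normalize_fp "$1")" = "$(normalize_fp "$2")" ]
}

# Usage (host name and thumbprint are placeholders):
#   live=$(echo | openssl s_client -connect esx01.example.com:443 2>/dev/null \
#          | openssl x509 -noout -fingerprint -sha256)
#   same_fp "$live" '30:..:..' && echo match || echo MISMATCH
```

A mismatch suggests that hbr-agent is still presenting an older certificate than the one the host currently serves on port 443.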

Step 1:

Ensure that no DISALLOW tag is assigned to the host, either in the vSphere UI or in the VRMS database.

a. Navigate to /opt/vmware/hms/conf and run cat embedded_db.cfg. Copy the VRMS DB password from the output for use in step b.
b. Connect to the VRMS database: /opt/vmware/vpostgres/current/bin/psql -U vrmsdb
c. Run the query: select vcmoid,aggregateversion,clustermoid,lastknownthubmprint,state from hostentity where state='5';
State 5 means the host is tagged DISALLOW.
d. If any DISALLOW hosts are listed, remove the tag from those hosts in vCenter and restart the HMS and HBRSRV services on the VR appliance.
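
The check in step c can also be run non-interactively. The sketch below reuses the psql path and vrmsdb user from step b with a reduced form of the query (only `vcmoid` and `state`); `-A`, `-t` and `-c` are standard psql options. The function name is hypothetical, and the sketch assumes the password copied in step a is entered at the prompt or supplied through the standard `PGPASSWORD` environment variable.

```shell
# Path to psql on the VR appliance, as given in step b.
PSQL=/opt/vmware/vpostgres/current/bin/psql

# Reduced form of the step c query: one row per host in state 5 (DISALLOW).
DISALLOWED_QUERY="select vcmoid,state from hostentity where state='5';"

# Hypothetical helper: print DISALLOW hosts as unaligned, header-less rows.
list_disallowed_hosts() {
    "$PSQL" -U vrmsdb -A -t -c "$DISALLOWED_QUERY"
}

# Usage:
#   list_disallowed_hosts
# Empty output means no host is currently in state 5.
```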

Step 2:

Follow this step if Step 1 does not resolve the issue or does not apply.

To ensure that hbr-agent uses the same certificate, restart the hbrsrv and hbr-agent services from Configuration > Services on the ESXi host.
Alternatively, restart the services from an SSH session on the ESXi host:
/etc/init.d/hbrsrv restart
/etc/init.d/hbr-agent restart

Step 3:

Follow this step if Step 2 does not resolve the issue.

Reconnect the site pair.

Step 4:

Follow this step if Step 3 does not resolve the issue.

Enhanced replication mode requires securely authenticated connections between the replication appliances and the ESXi hosts at both sites. When the VRMS appliance boots, or when new hosts are introduced, it must synchronize and trust the peer site host certificates. If this synchronization fails silently in the background, VRMS cannot build secure connections to the hosts. The hosts are then omitted from the replication database (`HostInfo` table), which blocks the enhanced replication data path and causes the replications to stall in an RPO violation state. Legacy mode bypasses this certificate trust enforcement, which is why traffic flows when replications are temporarily switched to it.

Schedule a maintenance window, then follow these steps:

1. Take a snapshot of the VR appliances at both sites.
2. In the hms-configuration.xml file on the VR appliances at both sites, change the value of <sync-peer-site-host-certificates-on-boot> to true. This forces a host sync between the sites.
3. Restart HMS at both sites and wait a few minutes.
4. Reconnect the site pair.
5. Optional, for enhanced replication only: if a VR add-on appliance is running, shut it down, because it is not required for enhanced replication.
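
The configuration change in item 2 can be scripted along the following lines. This is a sketch under stated assumptions, not a supported procedure: `enable_host_cert_sync` is a hypothetical helper, the file path in the usage comment is an assumption (the article names the file but not its directory), and the element is assumed to sit on a single line with its value as text content. Check the file by eye before and after.

```shell
# Hypothetical helper: set sync-peer-site-host-certificates-on-boot to true
# in the given hms-configuration.xml, keeping a .bak copy of the original.
enable_host_cert_sync() {
    file=$1
    tag=sync-peer-site-host-certificates-on-boot
    # Keep a copy of the original next to it; stop if the copy fails.
    cp -p "$file" "$file.bak" || return 1
    # Rewrite from the backup into the original file, so the original
    # file's ownership and permissions are left untouched.
    sed "s|<$tag>[[:space:]]*false[[:space:]]*</$tag>|<$tag>true</$tag>|" "$file.bak" > "$file"
    # Print the resulting line so the change can be confirmed.
    grep -F "<$tag>" "$file"
}

# Usage (path is an assumption; run on the VR appliance at each site):
#   enable_host_cert_sync /opt/vmware/hms/conf/hms-configuration.xml
```

If the helper prints nothing, the element was not found in the expected form and the file should be edited manually instead.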

The default value of "sync-peer-site-host-certificates-on-boot" is false.

The setting changes HMS startup behavior as follows.

When it is off (false, the default):

After HMS starts, it only does a broad “catch-up” with the other sites for which it does not yet have any host mirror. If it already has host information stored from before the restart, it does not force a full refresh of every site on every boot. Day-to-day changes are still handled through the normal, ongoing mechanisms while the system is running.

When it is on (true):

After HMS starts, it always schedules a full refresh of host information from every paired site, even when it already has data. In other words, on every boot it double-checks and rebuilds the full picture from each remote site, not only where the picture was empty.

In short:

Off = lighter startup: only fill in missing relationships.

On = heavier startup: reconcile everyone after every boot.

With the default (off), normal restarts stay faster and quieter, and the product does not depend on every other site being fully reachable at boot just to complete a full re-pull. On is for situations where you want the extra “re-read everything from all partners once after startup” behavior (often as a troubleshooting or recovery measure), accepting more work and a dependency on connectivity right after boot.

Additional Information

Isolating the Network Traffic of vSphere Replication

https://techdocs.broadcom.com/us/en/vmware-cis/live-recovery/vsphere-replication/9-0/vr-help-plug-in-9-0/isolating-the-vr-traffic-from-the-data-center-network.html

https://techdocs.broadcom.com/us/en/vmware-cis/live-recovery/vsphere-replication/9-0/vr-help-plug-in-9-0/replicating-virtual-machines/enhanced-vsphere-replication.html

Before initiating a migration from Legacy to Enhanced replication, ensure that the vSphere Replication Management Server has direct network connectivity to the management networks of all ESXi hosts that will participate in vSphere Replication protection workflows. This requirement applies both to the source hosts running the protected production VMs and to the target hosts running the service instances that receive replication traffic.
