You may see one of the errors below, or both:
ERROR
Operation Failed
Cannot establish a TCP connection to server at '10.X.X.X:8123'.
ERROR
Operation Failed
Cannot establish a TCP connection to server at '194.X.X.X:8123'. Details: 'https://194.X.X.X:8123/ invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to 194.X.X.X:8123 [/194.X.X.X] failed: Connection refused (Connection refused)"'.
hbrsrv.log:
2021-12-18T20:38:02.415Z error hbrsrv[01092] [Originator@6876 sub=AgentConnection] Connection failed to agent host-###1/hostd (10.X.X.X): Fault cause: vim.fault.InvalidLogin
2021-12-18T20:38:02.995Z error hbrsrv[01125] [Originator@6876 sub=AgentConnection] Connection failed to agent host-###2/hostd (10.X.X.X): Fault cause: vim.fault.InvalidLogin
2021-12-18T20:38:03.001Z error hbrsrv[01373] [Originator@6876 sub=AgentConnection] Connection failed to agent host-###3/hostd (10.X.X.X): Fault cause: vim.fault.InvalidLogin
2021-12-18T20:38:03.307Z error hbrsrv[01360] [Originator@6876 sub=AgentConnection] Connection failed to agent host-###4/hostd (10.X.X.X): Fault cause: vim.fault.InvalidLogin
2021-12-18T20:38:03.468Z error hbrsrv[01094] [Originator@6876 sub=AgentConnection] Connection failed to agent host-###5/hostd (10.X.X.X): Fault cause: vim.fault.InvalidLogin
NOTE: Please consider taking snapshots wherever necessary.
1. Each vSphere Replication Server (VRS) should have the same guestinfo.hbr.hms-thumbprint to talk to the vSphere Replication Appliance.
2. Each vSphere Replication Server (VRS) and vSphere Replication Appliance will have a unique guestinfo.hbr.hbrsrv-thumbprint output.
3. Only the vSphere Replication Appliance will have matching hbrsrv-thumbprint and hms-thumbprint.
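For a quick check of the points above, the same helper script used in FIX # 1 below can be run on the vSphere Replication Appliance (and, assuming it is present there, on each VRS add-on server) to list the cached thumbprint values:
/usr/bin/hbrsrv-replicainfo.sh | grep -i thumb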
The following steps must be carried out on the vSphere Replication appliance that is failing with the errors listed in the Symptoms section, is disconnected, or is intermittently connecting and disconnecting.
FIX # 1
First, find out whether the hms and hbrsrv thumbprints on the VRMS are the same or different. If they are the same, move on to FIX # 2; if they are different, continue with the steps below.
NOTE: Only the vSphere Replication Appliance will have matching hbrsrv-thumbprint and hms-thumbprint. If they match, there is no need to follow the steps below and the issue is not related to an hms/hbrsrv thumbprint mismatch. A mismatched thumbprint means that the hbrsrv sqlite database is holding the incorrect thumbprint for the HMS management service. As a result, when the HMS service tries to communicate with its embedded replication server, the trust is broken and InvalidLogin errors appear in hbrsrv.log.
In the example below, the two thumbprints are different.
root@vsphereprkrep [ ~ ]# /usr/bin/hbrsrv-replicainfo.sh | grep -i thumb
guestinfo-cache/guestinfo.hbr.hms-thumbprint 81:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:3F
guestinfo-cache/guestinfo.hbr.hbrsrv-thumbprint E2:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:28
/usr/bin/hbrsrv-guestinfo.sh set guestinfo.hbr.hms-thumbprint "E2:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:28"
The correct thumbprint value to use in the command above is the guestinfo.hbr.hbrsrv-thumbprint (which you obtained previously, but it is best to verify it). It can be confirmed using the following steps:
root@VRMS [ ~ ]# grep -i hms-keystore-password /opt/vmware/hms/conf/hms-configuration.xml
<hms-keystore-password>6##########U</hms-keystore-password>
root@HOST[ /usr/java/default/bin ]# ./keytool -list -v -keystore /opt/vmware/hms/security/hms-keystore.jks -storepass 6##########U
Command for VR 8.8: keytool -list -v -keystore /opt/vmware/hms/security/hms-keystore.jks -storepass 6##########U
Keystore type: jks
Keystore provider: SUN
Your keystore contains 1 entry
Alias name: jetty
Creation date: Nov 12, 2019
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=192.X.X.12, OU=Unknown, O=Unknown
Issuer: CN=192.X.X.12, OU=Unknown, O=Unknown
Serial number: 30dec3c
Valid from: Tue Nov 12 22:31:28 IST 2019 until: Sun Nov 10 22:31:28 IST 2024
Certificate fingerprints:
MD5: C7:##:##:##:##:##:##:##:##:##:##:##:##:##:##:5E
SHA1: 4F:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:69
SHA256: E2:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:##:28
Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key
Version: 3
Extensions:
#1: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: DB ## ## ## ## ## ## ## ## ## ## ## ## ## ## ##  ..[...O..LMsP.v.
0010: AF A0 B2 73                                      ...s
]
]
**************************************************************************************
Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /opt/vmware/hms/security/hms-keystore.jks -destkeystore /opt/vmware/hms/security/hms-keystore.jks -deststoretype pkcs12".
NOTE: If the SHA256 thumbprint matches the hbrsrv-thumbprint, update the hms-thumbprint. Bear in mind that this may not always be the case, especially in VR upgrade scenarios, where the current keystore certificate thumbprint may differ from the value reported by hbrsrv-replicainfo.sh.
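As a quick way to make this comparison, the sketch below reads the keystore password from the configuration file and prints the keystore SHA256 fingerprint next to the hbrsrv thumbprint. It reuses only the paths and commands shown above; the sed expression and the HMS_PASS variable name are illustrative:
# Read the keystore password from the HMS configuration (illustrative one-liner)
HMS_PASS=$(sed -n 's|.*<hms-keystore-password>\(.*\)</hms-keystore-password>.*|\1|p' /opt/vmware/hms/conf/hms-configuration.xml)
# Show the keystore certificate's SHA256 fingerprint
/usr/java/default/bin/keytool -list -v -keystore /opt/vmware/hms/security/hms-keystore.jks -storepass "$HMS_PASS" | grep 'SHA256:'
# Show the replication server thumbprint cached in guestinfo
/usr/bin/hbrsrv-replicainfo.sh | grep -i hbrsrv-thumbprint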
After updating the thumbprint, run the commands below -
systemctl restart hms
systemctl restart hbrsrv
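After restarting, you can confirm both services are running and list the thumbprints again, using the same commands as earlier:
systemctl status hms hbrsrv
/usr/bin/hbrsrv-replicainfo.sh | grep -i thumb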
Now, check whether the thumbprints match. If they still don't, power the VM off and on and check again. In most cases the thumbprint is updated once the services are restarted; if it is not, proceed to FIX # 2.
FIX # 2
If the vSphere Replication Server is still in a disconnected state and you continue to see the following errors:
vSphere Replication Management Server could not establish connection to vSphere Replication Server at '10.X.X.X:8123'.
When you try to RECONNECT the VRMS or VR add-on servers, the following error is displayed -
ERROR
Operation Failed
Cannot establish a TCP connection to server at '10.X.X.X:8123'. Details: 'https://10.X.X.X:8123/ invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to 10.X.X.X:8123 [/10.X.X.X] failed: Connection refused (Connection refused)"'.
hms.log:
ERROR hms.net.hbr.ping.svr.520f5e62-0f0a-9825-9c06-aaaba5524495 [Ping Thread for session key: N/A and vmomi session: 40073F295437D8075D4403FC2F8659E9DF2A717E and server: 10.X.X.X:8123] (..net.impl.VmomiPingConnectionHandler) [] | Ping session: N/A on server 10.X.X.X:8123 failed: com.vmware.vim.vmomi.client.exception.ConnectionException: https://10.X.X.X:8123/ invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to 10.X.X.238:8123 [/10.X.X.238] failed: Connection refused (Connection refused)"
com.vmware.vim.vmomi.client.exception.ConnectionException: https://10.X.X.X:8123/ invocation failed with "org.apache.http.conn.HttpHostConnectException: Connect to 10.X.X.238:8123 [/10.X.X.238] failed: Connection refused (Connection refused)"
The HMS log from the VRMS indicates that it is unable to connect to its add-on servers or to itself.
Run the following command to check which IP address the VR appliance is listening on - root@vrms [ ~ ]# netstat -anupt | grep 8123
tcp 0 0 0.0.0.0:8123 0.0.0.0:* LISTEN 3381/hbrsrv-bin
tcp 0 0 127.0.0.1:8123 127.0.0.1:51900 ESTABLISHED 3381/hbrsrv-bin
tcp6 0 0 :::8123 :::* LISTEN 3381/hbrsrv-bin
tcp6 0 0 127.0.0.1:51900 127.0.0.1:8123 ESTABLISHED 30124/java
This example output is from an internal lab environment.
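If netstat happens to be unavailable on a particular appliance build, ss should provide equivalent information (this is an alternative check, not part of the original procedure):
ss -lntp | grep 8123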
Do a curl test by running the command below - curl -v telnet://127.0.0.1:8123 (replace the IP with the address found in the output above)
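For reference, a refused connection from the curl test typically looks similar to the lines below (the exact wording varies by curl version), whereas a working listener prints a 'Connected to 127.0.0.1 ... port 8123' line:
* connect to 127.0.0.1 port 8123 failed: Connection refused
* Failed to connect to 127.0.0.1 port 8123: Connection refused
curl: (7) Failed to connect to 127.0.0.1 port 8123: Connection refused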
If the output displays 'connection refused', follow the steps below:
1. Take a snapshot of the VR appliance
2. vi /etc/vmware/hbrsrv-nic.xml
3. Leave the value of the <ipForHMS> tag empty.
4. Save the file using :wq.
5. Restart hbrsrv service using the command : 'systemctl restart hbrsrv'
Example: root@vrms [ / ]# cat /etc/vmware/hbrsrv-nic.xml
<config><ipForHMS /><ipForFilter>192.X.X.11</ipForFilter><ipForNFC /></config>
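After restarting hbrsrv, the earlier checks can be repeated to confirm the server is listening and accepting connections again, using the same commands as above:
netstat -anupt | grep 8123
curl -v telnet://127.0.0.1:8123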
If the issue still persists, apply the fix below -
1. Take a snapshot of the VM
2. SSH to the appliance
3. Edit /usr/lib/systemd/system/hbrsrv.service and add the "TasksMax=infinity" line as shown below.
[Service]
Type=forking
EnvironmentFile=/etc/hbrsrv-init.config
User=hbrsrv
Group=vmware
PermissionsStartOnly=true
TasksMax=infinity
4. Run the commands below:
systemctl daemon-reload
systemctl restart hbrsrv
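To confirm that the new limit was picked up after the daemon-reload, you can query the unit directly (TasksMax is a standard systemd unit property):
systemctl show -p TasksMax hbrsrv
systemctl status hbrsrv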
Final recourse
1. Check if you are hitting the problem mentioned in vSphere Replication server is disconnected - Cannot establish a TCP connection to server (312710)
2. Update the IP of eth0 & eth1 by following vSphere Replication Appliance and Site Recovery Manager displays the message: No Networking Detected (312781).
This is especially helpful if you don't know the full history of troubleshooting performed by the customer and previous TSEs (for example, if they only changed these ethernet adapters via the CLI or VAMI).
3. Check for overlapping IP subnets between the networks of eth0 & eth1, and ensure that the IP addresses of the ESXi hosts and the VR appliances are using the correct gateway and subnet mask (see the example commands after this list). There is only a remote chance of such an issue occurring; if it does, it is best to advise the customer to work with their network architects to verify and re-IP their setup. An overlap of this kind negatively impacts VR traffic because it creates faulty or unpredictable traffic flow in the networking environment.
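As a starting point for that check, the commands below show the address, netmask, gateway, and routes in use. These are standard Linux and ESXi networking commands and are not part of the original procedure:
# On the VR appliance:
ip addr show eth0
ip addr show eth1
ip route
# On the ESXi host:
esxcli network ip interface ipv4 get
esxcli network ip route ipv4 list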
If the above steps do not resolve the issue, please contact Broadcom Support for further investigation.