HA configuration is stuck at the "Election" state and doesn't proceed further for a set of hosts.


Article ID: 345414


Products: VMware vCenter Server, VMware vSphere ESXi

Issue/Introduction

Symptoms:

vSphere HA configuration is stuck in the "Election" state and does not progress for a set of hosts.

The FDM log on the master host (/var/log/fdm.log) reports "Untrusted thumbprint" errors for the affected hosts:

YYYY-MM-DDTHH:MM info fdm[180268] [Originator@6876 sub=Cluster opID=SWI-71792bc4] Untrusted thumbprint (11:22:33) for host  (xx.xx.xx.xx)- failing verify 
YYYY-MM-DDTHH:MM verbose fdm[180268] [Originator@6876 sub=Cluster opID=SWI-71792bc4] Blacklisting ip address xx.xx.xx.xx for 60 seconds
YYYY-MM-DDTHH:MM verbose fdm[180268] [Originator@6876 sub=Cluster opID=SWI-71792bc4] IP xx.xx.xx.xx marked bad for reason Invalid Credentials
YYYY-MM-DDTHH:MM warning fdm[180268] [Originator@6876 sub=Cluster opID=SWI-71792bc4] Failed to verify host  (xx.xx.xx.xx) - closing connection
YYYY-MM-DDTHH:MM verbose fdm[180268] [Originator@6876 sub=Message opID=SWI-71792bc4] Accept completion callback error N5Vmomi5Fault13SecurityError9ExceptionE(Fault cause: vmodl.fault.SecurityError --> ) --> [context]zKq7AVECAQAAAP/bJAESZmRtAACoS+ZmZG0AAKMj3gDlB9IA5pTUABmSawCaEW4AybpvAGLtbwB47m8Azn5+AGOEfgBb/HsA9fx7AA0Q2wBdYdsAvPrYATt9AGxpYnB0aHJlYWQuc28uMAACvacObGliYy5zby42AA==[/context]
YYYY-MM-DDTHH:MM info fdm[180268] [Originator@6876 sub=Message opID=SWI-71792bc4] Destroying connection    

YYYY-MM-DDTHH:MM info fdm[180371] [Originator@6876 sub=Cluster opID=SWI-598dc61d] Untrusted thumbprint (77:88:99) for host  (xx.xx.xx.xx)- failing verify
YYYY-MM-DDTHH:MM verbose fdm[180371] [Originator@6876 sub=Cluster opID=SWI-598dc61d] Blacklisting ip address xx.xx.xx.xx for 60 seconds
YYYY-MM-DDTHH:MM verbose fdm[180371] [Originator@6876 sub=Cluster opID=SWI-598dc61d] IP xx.xx.xx.xx marked bad for reason Invalid Credentials
YYYY-MM-DDTHH:MM warning fdm[180371] [Originator@6876 sub=Cluster opID=SWI-598dc61d] Failed to verify host  (xx.xx.xx.xx) - closing connection
YYYY-MM-DDTHH:MM verbose fdm[180371] [Originator@6876 sub=Message opID=SWI-598dc61d] Accept completion callback error N5Vmomi5Fault13SecurityError9ExceptionE(Fault cause: vmodl.fault.SecurityError --> ) --> [context]zKq7AVECAQAAAP/bJAESZmRtAACoS+ZmZG0AAKMj3gDlB9IA5pTUABmSawCaEW4AybpvAGLtbwB47m8Azn5+AGOEfgBb/HsA9fx7AA0Q2wBdYdsAvPrYATt9AGxpYnB0aHJlYWQuc28uMAACvacObGliYy5zby42AA==[/context]
YYYY-MM-DDTHH:MM info fdm[180371] [Originator@6876 sub=Message opID=SWI-598dc61d] Destroying connection


SSL thumbprints stored in VCDB (example1 and example2 are impacted; example3 is working):
 

root@vc1 [ ~ ]# psql -U postgres -d VCDB -c "select id,dns_name,ip_address,host_ssl_thumbprint,expected_ssl_thumbprint from vpx_host;"
 id |   dns_name   | ip_address  |                     host_ssl_thumbprint                     |                   expected_ssl_thumbprint
----+--------------+-------------+-------------------------------------------------------------+-------------------------------------------------------------
 30 | example1.com | xx.xx.xx.xx | 77:88:99:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 | 77:88:99:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
 24 | example2.com | xx.xx.xx.xx | 77:88:99:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 | 77:88:99:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
 27 | example3.com | xx.xx.xx.xx | 11:11:11:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 | 11:11:11:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
(3 rows)
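To spot rows where the two VCDB columns disagree (the case handled by Step 7 of the workaround below) without reading the aligned table, the same query can be run in psql's unaligned, tuples-only mode (`-A -t`) and filtered. The following is a minimal POSIX shell sketch; the function name `find_expected_mismatches` is hypothetical and the column order is assumed to match the query above. For the three rows shown above it would print nothing, because each row's two thumbprint columns already agree; the mismatch in this issue is between VCDB and the host certificate.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool). Reads the unaligned, tuples-only
# output of the vpx_host query (psql -A -t), whose fields are separated by "|":
#   id|dns_name|ip_address|host_ssl_thumbprint|expected_ssl_thumbprint
# and prints the dns_name of each row whose two thumbprint columns differ
# (compared case-insensitively).
find_expected_mismatches() {
  awk -F'|' 'NF >= 5 && toupper($4) != toupper($5) { print $2 }'
}
```

Example usage on the vCenter Server Appliance: `psql -U postgres -d VCDB -A -t -c "select id,dns_name,ip_address,host_ssl_thumbprint,expected_ssl_thumbprint from vpx_host;" | find_expected_mismatches`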



SHA1 fingerprint of the certificate currently installed on each host:

Impacted hosts:
[root@example1:~] openssl x509 -in /etc/vmware/ssl/rui.crt -text -fingerprint | grep -i fingerprint
SHA1 Fingerprint=55:66:77:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
[root@example2:~] openssl x509 -in /etc/vmware/ssl/rui.crt -text -fingerprint | grep -i fingerprint
SHA1 Fingerprint=33:44:55:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00

Working host:
[root@example3:~] openssl x509 -in /etc/vmware/ssl/rui.crt -text -fingerprint | grep -i fingerprint
SHA1 Fingerprint=11:11:11:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
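Comparing 59-character thumbprints by eye is error-prone. A minimal POSIX shell sketch of the comparison follows, assuming the VCDB value is copied from the host_ssl_thumbprint column and the host value from the openssl output; the helper names are hypothetical, not VMware tools.

```shell
#!/bin/sh
# Hypothetical helpers (not VMware tools) for comparing a thumbprint stored
# in VCDB with the SHA1 fingerprint reported by openssl on the host.

# Print the value after the first "=" in a line such as
# "SHA1 Fingerprint=AA:BB:...".
fp_from_openssl_line() {
  printf '%s\n' "${1#*=}"
}

# Remove whitespace and uppercase the hex digits, so that formatting
# differences (for example a trailing space copied from a query result)
# do not cause a false mismatch.
normalize_fp() {
  printf '%s' "$1" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]'
}

# Exit status 0 when both thumbprints are equal after normalization.
thumbprints_match() {
  [ "$(normalize_fp "$1")" = "$(normalize_fp "$2")" ]
}
```

For example, on a host `fp_from_openssl_line "$(openssl x509 -in /etc/vmware/ssl/rui.crt -noout -fingerprint -sha1)"` prints only the fingerprint, and `thumbprints_match "$host_fp" "$vcdb_fp" || echo "thumbprint mismatch"` would report the mismatch seen for example1 and example2 (`host_fp` and `vcdb_fp` are placeholder variable names).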
 



Environment

VMware vCenter Server 7.0.x
VMware vCenter Server 8.0.x
VMware vSphere ESXi 7.x

Cause

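This occurs when the SSL thumbprint stored for a host in the vCenter Server database (VCDB) does not match the thumbprint of the SSL certificate actually installed on the host.
Such a mismatch can occur when the host's SSL certificate is replaced with a custom certificate but the new thumbprint is not synchronized to VCDB. As the log excerpts above show, the FDM master then fails to verify the host ("Untrusted thumbprint ... failing verify") and closes the connection, so the election cannot complete.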

Resolution

Disconnect and reconnect the impacted hosts in vCenter Server so that each host's current SSL thumbprint is written to VCDB.

Note: Rebooting the host or restarting its services does not update VCDB.


Additional Information

Impact/Risks: HA cluster will not be formed.

+++++++++++++++++++++++
 
Entries observed in /var/log/fdm.log:

Error: Untrusted thumbprint.
Action: Blacklisting IP address xx.xx.xx.xx for 60 seconds.
Reason: IP xx.xx.xx.xx marked as bad due to invalid credentials.
A mismatch was observed between the host's SHA1 fingerprint and both the host_ssl_thumbprint and expected_ssl_thumbprint values in VCDB.
 
Additional Observations:
Disconnecting and reconnecting the impacted hosts did not update the thumbprint in VCDB.
 
Workaround: Validate and Update the SSL Thumbprint in VCDB
 
1. Validate the Thumbprint Mismatch:
Compare the SSL thumbprint stored in the database with the host's actual thumbprint.
 
2. Stop the vpxd Service:
 
service-control --stop vmware-vpxd
 
3. Access the vCenter Database:
 
psql -U postgres -d VCDB
 
4. Query SSL Thumbprints in VCDB for the Impacted Hosts: 
 
select id,dns_name,ip_address,host_ssl_thumbprint,expected_ssl_thumbprint from vpx_host;
 
5. Retrieve the Host's Current SSL Thumbprint:
On the host, run:
 
openssl x509 -in /etc/vmware/ssl/rui.crt -text -fingerprint |grep -i fingerprint
 
6. Update the Host's SSL Thumbprint in VCDB:
Replace <host SHA1 fingerprint> with the fingerprint retrieved in Step 5 (the value after "SHA1 Fingerprint=") and <dnsname> with the impacted host's name exactly as it appears in the dns_name column:

UPDATE vpx_host SET host_ssl_thumbprint = '<host SHA1 fingerprint>' WHERE dns_name = '<dnsname>';
 
7. Resolve Mismatches Between host_ssl_thumbprint and expected_ssl_thumbprint:
 
UPDATE vpx_host SET expected_ssl_thumbprint = host_ssl_thumbprint WHERE dns_name = '<dnsname>';
 
8. Exit the Database:
\q
 
9. Start the vpxd Service:
 
service-control --start vmware-vpxd
 
Validate High Availability (HA) Status:
 
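Confirm that the impacted hosts no longer remain in the "Election" state and that HA is functioning as expected.
If needed, run "Reconfigure for vSphere HA" on the impacted hosts, or disable and re-enable vSphere HA on the cluster.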
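Because the UPDATE statements in Steps 6 and 7 are typed by hand against the database, a typo in the fingerprint or host name is easy to make. The sketch below, in POSIX shell, prints both statements for one host after checking that the fingerprint has the SHA1 form (20 hex bytes separated by colons) and doubling any single quotes in the host name. The function names are hypothetical; the table and column names are the ones used in Steps 4, 6, and 7.

```shell
#!/bin/sh
# Hypothetical helpers (not VMware tools) that print the UPDATE statements
# from Steps 6 and 7 for one host, so they can be reviewed before running.
# Bodies written as ( ) run in a subshell, which keeps their variables local.

# Succeed only if $1 is 20 uppercase hex bytes joined by colons.
is_sha1_fp() (
  byte='[0-9A-F][0-9A-F]'
  pattern=$byte
  i=1
  while [ "$i" -lt 20 ]; do
    pattern="$pattern:$byte"
    i=$((i + 1))
  done
  case $1 in
    $pattern) exit 0 ;;
    *) exit 1 ;;
  esac
)

# Wrap $1 in single quotes for SQL, doubling any embedded single quote.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Print both UPDATE statements for fingerprint $1 and dns_name $2.
thumbprint_update_sql() (
  # Uppercase the fingerprint to match the openssl output format.
  fp=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  if ! is_sha1_fp "$fp"; then
    echo "not a SHA1 thumbprint: $1" >&2
    exit 1
  fi
  name=$(sql_quote "$2")
  printf "UPDATE vpx_host SET host_ssl_thumbprint = '%s' WHERE dns_name = %s;\n" "$fp" "$name"
  printf 'UPDATE vpx_host SET expected_ssl_thumbprint = host_ssl_thumbprint WHERE dns_name = %s;\n' "$name"
)
```

After reviewing the output, it could be piped to psql while vpxd is stopped (Step 2), for example `thumbprint_update_sql "<host SHA1 fingerprint>" "<dnsname>" | psql -U postgres -d VCDB -v ON_ERROR_STOP=1`, so that psql stops at the first failing statement.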