Symptoms:
HA configuration is stuck at the "Election" state and doesn't proceed further for a set of hosts.
The master host's FDM log (/var/log/fdm.log) reports untrusted thumbprint errors:
YYYY-MM-DDTHH:MM info fdm[180268] [Originator@6876 sub=Cluster opID=SWI-71792bc4] Untrusted thumbprint (11:22:33) for host (xx.xx.xx.xx)- failing verify
YYYY-MM-DDTHH:MM verbose fdm[180268] [Originator@6876 sub=Cluster opID=SWI-71792bc4] Blacklisting ip address xx.xx.xx.xx for 60 seconds
YYYY-MM-DDTHH:MM verbose fdm[180268] [Originator@6876 sub=Cluster opID=SWI-71792bc4] IP xx.xx.xx.xx marked bad for reason Invalid Credentials
YYYY-MM-DDTHH:MM warning fdm[180268] [Originator@6876 sub=Cluster opID=SWI-71792bc4] Failed to verify host (xx.xx.xx.xx) - closing connection
YYYY-MM-DDTHH:MM verbose fdm[180268] [Originator@6876 sub=Message opID=SWI-71792bc4] Accept completion callback error N5Vmomi5Fault13SecurityError9ExceptionE(Fault cause: vmodl.fault.SecurityError --> ) --> [context]zKq7AVECAQAAAP/bJAESZmRtAACoS+ZmZG0AAKMj3gDlB9IA5pTUABmSawCaEW4AybpvAGLtbwB47m8Azn5+AGOEfgBb/HsA9fx7AA0Q2wBdYdsAvPrYATt9AGxpYnB0aHJlYWQuc28uMAACvacObGliYy5zby42AA==[/context]
YYYY-MM-DDTHH:MM info fdm[180268] [Originator@6876 sub=Message opID=SWI-71792bc4] Destroying connection
YYYY-MM-DDTHH:MM info fdm[180371] [Originator@6876 sub=Cluster opID=SWI-598dc61d] Untrusted thumbprint (77:88:99) for host (xx.xx.xx.xx)- failing verify
YYYY-MM-DDTHH:MM verbose fdm[180371] [Originator@6876 sub=Cluster opID=SWI-598dc61d] Blacklisting ip address xx.xx.xx.xx for 60 seconds
YYYY-MM-DDTHH:MM verbose fdm[180371] [Originator@6876 sub=Cluster opID=SWI-598dc61d] IP xx.xx.xx.xx marked bad for reason Invalid Credentials
YYYY-MM-DDTHH:MM warning fdm[180371] [Originator@6876 sub=Cluster opID=SWI-598dc61d] Failed to verify host (xx.xx.xx.xx) - closing connection
YYYY-MM-DDTHH:MM verbose fdm[180371] [Originator@6876 sub=Message opID=SWI-598dc61d] Accept completion callback error N5Vmomi5Fault13SecurityError9ExceptionE(Fault cause: vmodl.fault.SecurityError --> ) --> [context]zKq7AVECAQAAAP/bJAESZmRtAACoS+ZmZG0AAKMj3gDlB9IA5pTUABmSawCaEW4AybpvAGLtbwB47m8Azn5+AGOEfgBb/HsA9fx7AA0Q2wBdYdsAvPrYATt9AGxpYnB0aHJlYWQuc28uMAACvacObGliYy5zby42AA==[/context]
YYYY-MM-DDTHH:MM info fdm[180371] [Originator@6876 sub=Message opID=SWI-598dc61d] Destroying connection
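To see quickly which hosts and thumbprints the master is rejecting, the "Untrusted thumbprint" lines can be pulled out of the log. The following is a minimal sketch, not a supported tool; the function name and regular expression are illustrative and assume the message format shown in the excerpt above.

```python
import re

# Assumed message format, taken from the excerpt above:
#   ... Untrusted thumbprint (11:22:33) for host (xx.xx.xx.xx)- failing verify
UNTRUSTED_RE = re.compile(
    r"Untrusted thumbprint \((?P<thumbprint>[^)]*)\) for host \((?P<host>[^)]*)\)"
)


def find_untrusted_thumbprints(lines):
    """Return the unique (host, thumbprint) pairs reported as untrusted, sorted."""
    pairs = set()
    for line in lines:
        match = UNTRUSTED_RE.search(line)
        if match:
            pairs.add((match.group("host"), match.group("thumbprint")))
    return sorted(pairs)
```

For example, calling `find_untrusted_thumbprints(open("/var/log/fdm.log", errors="replace"))` on a copy of the master host's log lists each rejected host once per distinct thumbprint.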
SSL Thumbprint in VCDB for the impacted hosts:
root@vc1 [ ~ ]# psql -U postgres -d VCDB -c "select id,dns_name,ip_address,host_ssl_thumbprint,expected_ssl_thumbprint from vpx_host;"
 id | dns_name     | ip_address  | host_ssl_thumbprint                                         | expected_ssl_thumbprint
----+--------------+-------------+-------------------------------------------------------------+-------------------------------------------------------------
 30 | example1.com | xx.xx.xx.xx | 77:88:99:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 | 77:88:99:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
 24 | example2.com | xx.xx.xx.xx | 77:88:99:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 | 77:88:99:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
 27 | example3.com | xx.xx.xx.xx | 11:11:11:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 | 11:11:11:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
(3 rows)
SSL Thumbprint of the current certificate installed in the hosts:
Impacted hosts:
[root@example1:~] openssl x509 -in /etc/vmware/ssl/rui.crt -text -fingerprint |grep -i fingerprint
SHA1 Fingerprint=55:66:77:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
[root@example2:~] openssl x509 -in /etc/vmware/ssl/rui.crt -text -fingerprint |grep -i fingerprint
SHA1 Fingerprint=33:44:55:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
Working host:
[root@example3:~] openssl x509 -in /etc/vmware/ssl/rui.crt -text -fingerprint |grep -i fingerprint
SHA1 Fingerprint=11:11:11:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
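The comparison between the VCDB values and the installed certificates can also be scripted. The sketch below is illustrative only: `sha1_fingerprint` and `find_mismatched_hosts` are hypothetical helper names, and it assumes the certificate file holds a single PEM certificate. The fingerprint is the SHA-1 digest of the DER-encoded certificate, which is the value `openssl x509 -fingerprint` prints by default in the output above.

```python
import hashlib
import ssl


def sha1_fingerprint(pem_text):
    """Return the SHA-1 fingerprint of one PEM certificate as colon-separated hex."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    digest = hashlib.sha1(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def find_mismatched_hosts(vcdb_thumbprints, host_fingerprints):
    """Return sorted host names whose VCDB thumbprint differs from the installed one.

    Both arguments map a host name to a thumbprint string; a host missing
    from vcdb_thumbprints is reported as mismatched.
    """
    return sorted(
        name
        for name, installed in host_fingerprints.items()
        if vcdb_thumbprints.get(name, "").strip().upper() != installed.strip().upper()
    )
```

With the values shown above, `find_mismatched_hosts` reports example1.com and example2.com, while example3.com matches and is omitted.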
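Resolution: The thumbprints stored in VCDB for the impacted hosts do not match the certificates currently installed on those hosts. Disconnect and reconnect the impacted hosts in vCenter so that VCDB is updated with each host's current SSL thumbprint.
(Rebooting the host or restarting its services does not update VCDB.)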
Impact/Risks: HA cluster will not be formed.
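Alternatively, update the thumbprints directly in VCDB:
Stop the vCenter Server service (vpxd):
service-control --stop vmware-vpxd
Connect to VCDB:
psql -U postgres -d VCDB
List the thumbprints currently stored for each host:
select id,dns_name,ip_address,host_ssl_thumbprint,expected_ssl_thumbprint from vpx_host;
On each impacted ESXi host (not on vCenter), read the SHA1 fingerprint of the installed certificate:
openssl x509 -in /etc/vmware/ssl/rui.crt -text -fingerprint |grep -i fingerprint
Back in the psql session, set the stored thumbprints of each impacted host to that fingerprint:
UPDATE vpx_host SET host_ssl_thumbprint = '<host SHA1 fingerprint>' WHERE dns_name = '<host DNS name>';
UPDATE vpx_host SET expected_ssl_thumbprint = host_ssl_thumbprint WHERE dns_name = '<host DNS name>';
Exit psql and start vpxd again:
\q
service-control --start vmware-vpxd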
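Because the UPDATE statements are typed by hand, a mistyped fingerprint is easy to miss. The following sketch shows one way to generate the two statements for a host after checking that the fingerprint has the expected SHA1 shape (20 colon-separated hex pairs). The helper name `build_thumbprint_updates` and the validation rules are assumptions made for illustration; the script only produces text and does not touch the database.

```python
import re

# 20 colon-separated hex pairs, as printed by openssl for a SHA1 fingerprint.
THUMBPRINT_RE = re.compile(r"[0-9A-F]{2}(:[0-9A-F]{2}){19}")
# Conservative host name check (assumption) so the name is safe to quote in SQL.
DNS_NAME_RE = re.compile(r"[A-Za-z0-9.-]+")


def build_thumbprint_updates(dns_name, fingerprint):
    """Return the two UPDATE statements for one host, or raise ValueError."""
    # Accept either the bare value or the full "SHA1 Fingerprint=..." line.
    fingerprint = fingerprint.split("=", 1)[-1].strip().upper()
    if not THUMBPRINT_RE.fullmatch(fingerprint):
        raise ValueError(f"not a SHA1 fingerprint: {fingerprint!r}")
    if not DNS_NAME_RE.fullmatch(dns_name):
        raise ValueError(f"unexpected characters in host name: {dns_name!r}")
    return [
        f"UPDATE vpx_host SET host_ssl_thumbprint = '{fingerprint}' "
        f"WHERE dns_name = '{dns_name}';",
        "UPDATE vpx_host SET expected_ssl_thumbprint = host_ssl_thumbprint "
        f"WHERE dns_name = '{dns_name}';",
    ]
```

The returned strings can be pasted into the psql session while vpxd is stopped. Rejecting anything that does not look like a full fingerprint keeps a truncated copy and paste from being written to VCDB.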