SDDC Manager known_hosts files contain multiple duplicate entries causing workflow failures


Article ID: 314627


Products

VMware Cloud Foundation

Issue/Introduction

Symptoms:
Adding a VxRail cluster fails at the stage "Import SSH thumbprints of ESXi hosts and VxRail Manager".
During VxRail cluster expansion, the SDDC Manager UI hangs on the add host validation screen.


You see a high number of "Adding host key" entries in vcf-commonsvcs.log.

grep -i "Adding host key" vcf-commonsvcs.log | wc -l
15864

You see the same add host key operation repeating in vcf-commonsvcs.log, similar to the following:

2023-09-19T17:52:47.538+0000 INFO  [common,ebde4d0eebca929d,3f2e] [c.v.e.s.c.u.SshKeyManagementService,http-nio-127.0.0.1-7100-exec-720] Adding host key esxi-1.vrack.vsphere.local,10.55.16.85:AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBKUyzKuZSGNoBch1Hj+zlsVVPloVa5tQp3A7j+x38XdvD66RAlVN5rfmMVvzUK/5LG9ekgP/bEYWlNw1d5mVjwA= to /etc/vmware/vcf/commonsvcs/known_hosts
2023-09-19T17:52:47.559+0000 INFO  [common,ebde4d0eebca929d,3f2e] [c.v.e.s.c.u.SshKeyManagementService,http-nio-127.0.0.1-7100-exec-720] Adding host key esxi-2.vrack.vsphere.local,10.55.16.84:AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBKUyzKuZSGNoBch1Hj+zlsVVPloVa5tQp3A7j+x38XdvD66RAlVN5rfmMVvzUK/5LG9ekgP/bEYWlNw1d5mVjwA= to /etc/vmware/vcf/commonsvcs/known_hosts
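
To see which hosts account for the repeated additions, the total count can be broken down per host entry. The following is a minimal sketch, assuming the host list in each message ends at the first colon, as in the sample log above. The function name count_added_keys is hypothetical and is not part of SDDC Manager.

```shell
# Hypothetical helper: count "Adding host key" messages per host entry.
# Assumption: the host list (FQDN[,IP]) ends at the first colon after the message text.
count_added_keys() {
    grep -o 'Adding host key [^:]*' "$1" | sort | uniq -c | sort -rn
}

# Example, run from the directory that contains the log file:
# count_added_keys vcf-commonsvcs.log
```

Each output line shows a count followed by the host entry. Entries with very high counts are likely the ones being re-added in a loop.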
 

You see an error similar to the following in domainmanager.log:

/var/log/vmware/vcf/domainmanager/domainmanager.log
2023-09-19T18:07:49.650+0000 DEBUG [vcf_dm,dc2c9b83942611d5,3d91] [c.v.v.secure.http.HttpClientService,dm-exec-9]  Starting GET request from host: 127.0.0.1, port: 80, isSecure: false, path: /appliancemanager/ssh/knownHosts, queryParamMap: null, headers: {Accept=application/json,text/plain, Content-Type=application/json}
2023-09-19T18:07:49.651+0000 DEBUG [vcf_dm,dc2c9b83942611d5,3d91] [c.v.v.secure.http.HttpClientService,dm-exec-9]  Making request: GET http://127.0.0.1:80/appliancemanager/ssh/knownHosts
2023-09-19T18:09:19.741+0000 ERROR [vcf_dm,dc2c9b83942611d5,3d91] [c.v.e.s.c.s.a.a.ApplianceAdapterImpl,dm-exec-9]  Failed to get known host config
2023-09-19T18:09:19.742+0000 ERROR [vcf_dm,dc2c9b83942611d5,3d91] [c.v.e.s.o.model.error.ErrorFactory,dm-exec-9]  [VVSUIJ] FAILED_TO_UPDATE_SSH_HOST_CONFIG Failed to update ssh known host config
com.vmware.evo.sddc.common.services.error.SddcManagerServicesIsException: Failed to update ssh known host config
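
To check whether this failure is present on a given SDDC Manager, the log can be searched for the error code shown above. This is a small sketch. The function name find_known_hosts_errors is hypothetical, and the path in the example is the one given in this article.

```shell
# Hypothetical helper: print matching error lines, prefixed with their line numbers.
find_known_hosts_errors() {
    grep -n 'FAILED_TO_UPDATE_SSH_HOST_CONFIG' "$1"
}

# Example:
# find_known_hosts_errors /var/log/vmware/vcf/domainmanager/domainmanager.log
```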


Environment

VMware Cloud Foundation 4.x

Cause

A known_hosts entry is added in a format that is currently unsupported.

Unsupported format (FQDN and IP address combined on the same line):
esxi-2.vrack.vsphere.local,10.55.16.84:AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAI

Supported format (FQDN and IP address on separate lines):
esxi-2.vrack.vsphere.local:AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAI
10.55.16.84:AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAI
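
To confirm that a known_hosts file is affected, it can be inspected for combined entries and for repeated lines. The sketch below only reads the file. It assumes OpenSSH-style lines in which the first space-separated field is the host list, so adjust it if the file on your system is laid out differently. Both function names are hypothetical.

```shell
# Hypothetical read-only helpers.
# Assumption: OpenSSH-style known_hosts lines, "hosts keytype key", space-separated.

# Print lines whose host field combines several names, for example "fqdn,ip".
list_combined_entries() {
    awk '$1 ~ /,/' "$1"
}

# Print each repeated line, prefixed with the number of times it occurs.
list_duplicate_entries() {
    sort "$1" | uniq -c | awk '$1 > 1'
}

# Examples:
# list_combined_entries /etc/vmware/vcf/commonsvcs/known_hosts
# list_duplicate_entries /etc/vmware/vcf/commonsvcs/known_hosts
```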

Resolution

Engineering is working on fixing this issue in a future release of VCF.

 


Workaround:

0. Back up the affected known_hosts file.
In this case, the file to back up is /etc/vmware/vcf/commonsvcs/known_hosts.

cp -rf /etc/vmware/vcf/commonsvcs/known_hosts /etc/vmware/vcf/commonsvcs/known_hosts.BACKUP
1. Restart the commonsvcs service on the SDDC Manager.
systemctl restart commonsvcs
2. Remove the entries for the affected hosts from the known_hosts file, replacing the example FQDNs with those of your hosts.
sed -i "/esxi-1.vrack.vsphere.local/d" /etc/vmware/vcf/commonsvcs/known_hosts
sed -i "/esxi-2.vrack.vsphere.local/d" /etc/vmware/vcf/commonsvcs/known_hosts
3. Add the SSH keys back into the known_hosts file.
  • Manually add the rsa and ecdsa-sha2-nistp256 keys for the FQDN and IP address of each affected host, replacing COMPONENT_FQDN and COMPONENT_IP with your values.
ssh-keyscan -4 -t rsa COMPONENT_IP >> /etc/vmware/vcf/commonsvcs/known_hosts
ssh-keyscan -4 -t rsa COMPONENT_FQDN >> /etc/vmware/vcf/commonsvcs/known_hosts
ssh-keyscan -4 -t ecdsa-sha2-nistp256 COMPONENT_FQDN >> /etc/vmware/vcf/commonsvcs/known_hosts
ssh-keyscan -4 -t ecdsa-sha2-nistp256 COMPONENT_IP >> /etc/vmware/vcf/commonsvcs/known_hosts

4. Retry the operation from the SDDC Manager UI.
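
Before retrying, it may help to confirm that each FQDN and IP address now has only the expected entries. The sketch below counts the lines whose host field names a given host exactly. It assumes OpenSSH-style lines as written by ssh-keyscan (host list, key type, key), and count_host_entries is a hypothetical helper name.

```shell
# Hypothetical read-only helper: count known_hosts lines whose host field
# contains the given name (FQDN or IP) as an exact, comma-separated element.
count_host_entries() {
    awk -v host="$2" '{
        n = split($1, names, ",")
        for (i = 1; i <= n; i++) if (names[i] == host) { c++; break }
    } END { print c + 0 }' "$1"
}

# Examples:
# count_host_entries /etc/vmware/vcf/commonsvcs/known_hosts esxi-1.vrack.vsphere.local
# count_host_entries /etc/vmware/vcf/commonsvcs/known_hosts 10.55.16.85
```

If the host offers both key types, step 3 should leave two lines for the FQDN and two for the IP address. A higher count for an IP address may mean that older lines containing only the IP are still present, because the sed commands in step 2 match on the FQDN.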