vSAN file service does not redeploy vSANFS nodes

Article ID: 387008


Products

VMware vSAN

Issue/Introduction

vSAN file service does not redeploy its vSANFS nodes after they have been removed.

  • This can occur during an ESXi upgrade or a vSAN file service upgrade.
  • It can also occur after vSAN was enabled or disabled, when a redeploy of the nodes may have been necessary.


Skyline Health may show a health check indicating an issue with the vSAN file services infrastructure, such as:

  • File server not found
  • VDFS daemon and sockrelay are not running
  • Sockrelay is not running

 

When you attempt to start sockrelay on the host through SSH with the command /etc/init.d/fsvmsockrelay start, you will get the following error:

  • vSAN File Service Node cannot be found on this host
    • sockrelay is not running

 

 

If vSAN file services was disabled during the upgrade, the vSphere UI can show errors when you try to edit vSAN file services. These UI errors resolve themselves once all hosts are upgraded to the same version. The EAM certificate trust errors will still appear in the logs.

  • In vCenter, under the vSAN cluster configuration settings, the vSAN file services tab does not load and shows the error:
    • Unable to extract requested data. Check vSphere Client logs for details.

Environment

vSAN 7.x
vSAN 8.x 

Cause

This issue occurs when an EAM (ESX Agent Manager) API call fails with CertificateNotTrustedFault, or when the EAM agent has a CertificateNotTrusted issue.


You can confirm this issue by reviewing the EAM log at /var/log/vmware/eam/eam.log, where you will see errors similar to the following:

  • URLConnectionSpecFactory.java | 88 | Created URLConnectionSpec(urlLocation:https://<Vcenter>:443/vsanHealth/fileService/ovf/x.x.x.xxxx-xxxxxxxx/VMware-vSAN-File-Services-Appliance-x.x.x.xxxx-xxxxxxxx_OVF10.ovf, certificateVerification:true, certificateConfigured:false, headers: {} using default system VECS/system CAs trust
  • Agent OVF URL is not trusted.
  • com.vmware.eam.security.trust.NotTrusted: Suitable trust, not found!
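
One quick way to check for these signatures is to search the log directly. The following is a minimal sketch, assuming a shell session on the system that holds the log at the path given above. The function name find_eam_trust_errors is illustrative and not part of any VMware tooling, and the search patterns are taken only from the sample messages quoted in this article.

```shell
# Hypothetical helper: print eam.log lines that match the trust-failure
# signatures quoted in this article. An optional first argument overrides
# the default log path (useful when working on a copy of the log).
find_eam_trust_errors() {
    log_file="${1:-/var/log/vmware/eam/eam.log}"
    grep -E 'Agent OVF URL is not trusted|security\.trust\.NotTrusted|CertificateNotTrusted' "$log_file"
}
```

Calling find_eam_trust_errors with no argument searches the default path; if it prints nothing, the log does not contain any of the quoted signatures.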

 

Resolution

Resolve the EAM trust issue by installing a trusted leaf certificate for the OVF URL, disabling trust verification for that URL, or refreshing the machine certificate in VECS.

 

Option 1: Configure the trust via EAM API. 

Choose one of the following: create a leaf trust for the OVF URL, or disable trust verification for it.

 

To keep certificate verification and download securely from the OVF URL:

  • Configure a leaf SSL certificate to be trusted for the OVF (whether the OVF is stored in vCenter or at an external URL).
    •  /usr/lib/vmware-eam/bin/eam-utility.py install-cert <OVF_URL>

To bypass certificate verification (this allows an unsecured download):

  • Disable SSL certificate verification for the specific OVF URL.
    •  /usr/lib/vmware-eam/bin/eam-utility.py disable-trust <OVF_URL>
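
The <OVF_URL> argument in both commands is the URL that EAM is trying to download. As a sketch, assuming the urlLocation: field appears in eam.log in the same format as the sample in the Cause section, the URLs can be pulled from the log and the two candidate commands printed for review rather than executed. The function names list_ovf_urls and print_trust_commands are hypothetical, and the list includes every OVF URL found in the log, so confirm which one is followed by the "not trusted" error before running anything.

```shell
# Hypothetical helper: list the distinct OVF URLs found in "urlLocation:"
# fields of eam.log (field format assumed from the sample log line).
list_ovf_urls() {
    log_file="${1:-/var/log/vmware/eam/eam.log}"
    grep -o 'urlLocation:https://[^,]*\.ovf' "$log_file" \
        | sed 's/^urlLocation://' \
        | sort -u
}

# Print (do not run) both Option 1 commands for each URL, so the choice
# between keeping and bypassing verification stays a manual decision.
print_trust_commands() {
    list_ovf_urls "$1" | while IFS= read -r url; do
        echo "/usr/lib/vmware-eam/bin/eam-utility.py install-cert $url"
        echo "/usr/lib/vmware-eam/bin/eam-utility.py disable-trust $url"
    done
}
```

Printing the commands instead of running them is deliberate: only one of the two should be applied per URL, and disable-trust removes certificate verification for that download.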

 

Reference KB: https://knowledge.broadcom.com/external/article/313402/

 

Option 2: Refresh the machine certificate in VECS (VMware Endpoint Certificate Store) for the vCenter Server.

Outdated or mismatched certificate data between the machine certificate and VECS can cause the trust mismatch. To resolve this, refresh the certificate in VECS with the current machine certificate for vCenter, which updates the endpoints to the current certificate chain.

 

To learn more about VECS and how to review the certificates stored in it, see the KB listed under "Manually reviewing VECS" in Additional Information.

 

 

Additional Information

EAM API call fails with CertificateNotTrustedFault:

https://knowledge.broadcom.com/external/article/313402/

 

Upgrade Pre-check states "Source ESX Agent Manager Configuration contains URLs that are not trusted by the System":

https://knowledge.broadcom.com/external/article/313026/

 

Manually reviewing VECS:

https://knowledge.broadcom.com/external/article/321380/