Symptoms:
'Cannot complete the operation. See the event log for details. File server creation failed due to unknown reason. Contact Broadcom Support for more information.'
Environment:
VMware vSAN 7.x
VMware vSAN 8.x
/scratch/log/vdfs_support/containers/fsvm_logs/journal:Jun 12 17:16:08 localhost vsfs-xxxxxxxxxxxxxx[1401]: [MainThread] Changing container state: container_init_succeeded -> container_start_succeeded
/scratch/log/vdfs_support/containers/fsvm_logs/journal:Jun 12 17:16:08 localhost vsfs-xxxxxxxxxxxxxx[1406]: [MainThread] Changing container state: container_init_succeeded -> container_start_succeeded
/scratch/log/vdfs_support/containers/fsvm_logs/journal:Jun 12 17:16:08 localhost vsfs-xxxxxxxxxxxxxx[1407]: [MainThread] Changing container state: container_init_succeeded -> container_start_succeeded
2024-06-12T17:15:49.560Z info vsand[6219358] [opID=facd59b4-W99-7a2c VsanFileServiceSystemImpl::CreateFileServiceDomain] Creating the domain on the RootFS ...
2024-06-11T18:08:45.440Z error vsand[2104920] [opID=W3301037-W3301038 VsanVimHelpers::GetVsanVersionNamespace] Failed to test vsan vmodl version with error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131) on 10.xx.xx.xx
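The certificate error shown above can be searched for directly in a collected log file. The following is a minimal sketch: `count_cert_failures` is a hypothetical helper name, and the log file path is passed in as an argument because it depends on where the vsand log was collected.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that contain the SSL verification
# failure shown in the vsand log excerpt above.
# $1 = path to a log file (an assumption; supply the collected vsand log).
count_cert_failures() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's exit status clean.
    grep -c 'CERTIFICATE_VERIFY_FAILED' "$1" || true
}

# Example call (the path is a placeholder, not a documented location):
# count_cert_failures /path/to/collected/vsand.log
```

A non-zero count suggests that the host certificates should be verified as described next.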
On each host in the cluster, verify the host certificate against the CA store:
openssl verify -purpose sslclient -CAfile /etc/vmware/ssl/castore.pem /etc/vmware/ssl/rui.crt
Example output for a healthy certificate:
[root@vsan-host:~] openssl verify -purpose sslclient -CAfile /etc/vmware/ssl/castore.pem /etc/vmware/ssl/rui.crt
/etc/vmware/ssl/rui.crt: OK
Example output for an unhealthy certificate:
[root@vsan-host:~] openssl verify -purpose sslclient -CAfile /etc/vmware/ssl/castore.pem /etc/vmware/ssl/rui.crt
/etc/vmware/ssl/rui.crt: C = country, ST = state, L = location, O = O, OU = OU, CN = vsan-host
error 20 at 0 depth lookup:unable to get local issuer certificate
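When several hosts need to be checked, the two outcomes above can be told apart with a small wrapper. This is a sketch only: `classify_verify_output` and `check_host_cert` are hypothetical helper names, not VMware tools, and the check assumes that a healthy result ends with `: OK` as in the example output above.

```shell
#!/bin/sh
# Certificate and CA store paths, taken from the openssl command above.
CA_FILE=/etc/vmware/ssl/castore.pem
CERT_FILE=/etc/vmware/ssl/rui.crt

# Hypothetical helper: classify the text printed by `openssl verify`.
# Output ending in ": OK" is treated as healthy; anything else is unhealthy.
classify_verify_output() {
    case "$1" in
        *": OK") echo "healthy" ;;
        *)       echo "unhealthy" ;;
    esac
}

# Hypothetical helper: run the verification on the local host and print a
# one-line summary. stderr is captured too, since the error detail may be
# written there.
check_host_cert() {
    out=$(openssl verify -purpose sslclient -CAfile "$CA_FILE" "$CERT_FILE" 2>&1)
    echo "$(hostname): $(classify_verify_output "$out")"
}

# Example call on a host:
# check_host_cert
```

Any host reported as unhealthy is a candidate for the certificate steps that follow.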
Resolution:
If the hosts use VMCA-signed certificates, regenerate them by following: Regenerate vSphere 6.x, 7.x, and 8.0 certificates using self-signed VMCA
If the hosts use custom certificates, reissue the certificates for every host that did not return OK from the openssl command above.
Note: If the hosts use custom certificates and are attached to distributed switches, all triggered alerts in Skyline Health may point to a thumbprint issue on the host. In this case, place the vSAN node into maintenance mode and remove the host from the inventory. Then add the host back to the vSAN cluster and re-add it to the distributed switch by following: Add Hosts to the vSphere Distributed Switch
If the issue persists after the steps above, the certificates require further investigation.