vSAN Infrastructure health reports 'File server not found' or 'vSAN File service not enabled'

Article ID: 372631

Updated On:

Products

VMware vSAN

Issue/Introduction

  • The ESXi hosts are a major version behind vCenter.
    • For example, vCenter was upgraded to 8.x but the hosts are still on 7.x.

  • Skyline Health shows the following error: 'File server not found' or 'vSAN File service not enabled'.




  • The vSAN infrastructure health report indicates that the file server is not detected on a specific node within the cluster. 

  • Prior to this, there was a maintenance activity on the impacted node.

  • Remediating the vSAN file service fails with the error shown under Recent Tasks below.

  • In some cases, the vSAN File Service VM is reported as not enabled on a host, rather than the file server not being found.

    Recent Tasks reports:

    Task Name :
    Remediate vSAN file service

    Status : 
    "Cannot complete the operation. See the event log for details. Unable to enable the vSAN file service."


  • Attempting to restart the fsvmsockrelay service fails with the following error:

/etc/init.d/fsvmsockrelay restart
sockrelay is not running
No fsvm-sockrelay resource pool found
vSAN File Service Node cannot be found on this host
sockrelay is not running

  • Additionally, "Install agent" tasks initiated by EAM may appear in vCenter and fail with the error 'Unable to access agent OVF package file'.

  • The "/var/log/vmware/eam/eam.log" file on vCenter shows entries similar to the following:

2024-07-15T09:59:50.564Z |  WARN | vlsi | Workflow.java | 156 | [OvfValidator->Validate:http://localhost:1080/external-tp/http1/hostname.domain.com/443/e6714aaec7d3ffef1e34cd0c8e2621fe67410cff/vsanHealth/fileService/ovf/7.0.3.1000-20036589/VMware-vSAN-File-Services-Appliance-7.0.3.1000-20036589_OVF10.ovf:d6568d69a9a44e0b] NEXT WORK ITEM : Failed to instantiate
com.vmware.eam.exception.CannotAccessOVF: Cannot access OVF at http://localhost:1080/external-tp/http1/hostname.domain.com/443/e6714aaec7d3ffef1e34cd0c8e2621fe67410cff/vsanHealth/fileService/ovf/7.0.3.1000-20036589/VMware-vSAN-File-Services-Appliance-7.0.3.1000-20036589_OVF10.ovf
      at com.vmware.eam.agency.impl.OvfDownloader.downloadInternal(OvfDownloader.java:88) ~[eam-server.jar:?]
      at com.vmware.eam.agency.impl.OvfDownloader.download(OvfDownloader.java:65) ~[eam-server.jar:?]
      at com.vmware.eam.agency.impl.OVFs.toOvfInfo(OVFs.java:122) ~[eam-server.jar:?]
      at com.vmware.eam.agency.impl.OVFs.getInternal(OVFs.java:77) ~[eam-server.jar:?]
      at com.vmware.eam.agency.impl.OVFs.get(OVFs.java:66) ~[eam-server.jar:?]
      at com.vmware.eam.agency.impl.OvfValidator.lambda$validate$0(OvfValidator.java:81) ~[eam-server.jar:?]
      at com.vmware.eam.async.workflow.impl.CancellableWorkItemProvider.provide(CancellableWorkItemProvider.java:101) ~[eam-server.jar:?]

  • The log at "/var/log/vmware/vsan-health/vmware-vsan-health-service.log" shows that the directory that previously held the OVF files no longer exists:

2024-07-15T09:59:50.548Z ERROR vsan-mgmt[12761] [VsanHttpProvider::doGet opID=noOpId] Looking for non-existing path /storage/vsan-health/../updatemgr/vsan/fileService/ovf-7.0.3.1000-20036589/VMware-vSAN-File-Services-Appliance-7.0.3.1000-20036589_OVF10.ovf, return 404

2024-07-15T09:59:50.549Z INFO vsan-mgmt[12761] [VsanMgmtServer::log_message opID=noOpId] ('127.0.0.1', 52490) - - "GET /vsanHealth/fileService/ovf/7.0.3.1000-20036589/VMware-vSAN-File-Services-Appliance-7.0.3.1000-20036589_OVF10.ovf HTTP/1.1" 404 -
2024-07-15T09:59:50.556Z INFO vsan-mgmt[12761] [VsanMgmtServer::log_message opID=noOpId] ('127.0.0.1', 52490) - - "HEAD /vsanHealth/fileService/ovf/7.0.3.1000-20036589/VMware-vSAN-File-Services-Appliance-7.0.3.1000-20036589_OVF10.ovf HTTP/1.1" 200 -
2024-07-15T09:59:50.563Z ERROR vsan-mgmt[12761] [VsanHttpProvider::doGet opID=noOpId] Looking for non-existing path /storage/vsan-health/../updatemgr/vsan/fileService/ovf-7.0.3.1000-20036589/VMware-vSAN-File-Services-Appliance-7.0.3.1000-20036589_OVF10.ovf, return 404
2024-07-15T09:59:50.563Z INFO vsan-mgmt[12761] [VsanMgmtServer::log_message opID=noOpId] ('127.0.0.1', 52490) - - "GET /vsanHealth/fileService/ovf/7.0.3.1000-20036589/VMware-vSAN-File-Services-Appliance-7.0.3.1000-20036589_OVF10.ovf HTTP/1.1" 404 -
2024-07-15T09:59:50.572Z ERROR vsan-mgmt[09194] [VsanClusterFileServiceSystemImpl::_RemediateClusterFileServiceTask opID=77504a72-W3314] Exception happened in deploying OVF in cluster 'vim.ClusterComputeResource:domain-c8'
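
The missing directory name can be pulled out of the health log rather than read by eye. The following is a minimal sketch, assuming the log lines have the same "Looking for non-existing path" format as the excerpt above; the function name missing_ovf_dirs is hypothetical and not part of any VMware tooling.

```shell
# Hypothetical helper: print each unique ovf-<version> directory name that the
# vSAN health service reported as a non-existing path in the given log file.
missing_ovf_dirs() {
    log_file="$1"
    grep 'Looking for non-existing path' "$log_file" \
        | grep -o 'ovf-[0-9][0-9.]*-[0-9][0-9]*' \
        | sort -u
}

# Example call on vCenter (path taken from the article):
# missing_ovf_dirs /var/log/vmware/vsan-health/vmware-vsan-health-service.log
```

The printed name is the directory that the Resolution steps recreate under /storage/updatemgr/vsan/fileService.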

 

Environment

VMware vSAN 7.x
VMware vSAN 8.x
VMware vSAN 9.x

Cause

The OVF and related files for the vSAN File Services VMs are missing from the expected path on vCenter.

Note: This usually occurs after a major vCenter upgrade (for example, from vCenter 7.x to 8.x), because the OVF files from the older version are not retained.

 

Resolution

    1. Download the OVF and the other files for the required vSAN File Service VM (FSVM) version again from the Broadcom portal.
      (My Downloads > Enter 'vSphere' in the 'Search Product Name' field > Click on vSAN > Click on the appropriate vSphere/ESXi version > Go to the 'Drivers & Tools' tab > Select the required vSAN File Service VM OVF)
      There should be a total of 6 files:
      VMware-vSAN-File-Services-Appliance-x.x.x.x-x_OVF10.mf
      VMware-vSAN-File-Services-Appliance-x.x.x.x-x_OVF10.cert
      VMware-vSAN-File-Services-Appliance-x.x.x.x-x-system.vmdk
      VMware-vSAN-File-Services-Appliance-x.x.x.x-x-cloud-components.vmdk
      VMware-vSAN-File-Services-Appliance-x.x.x.x-x-log.vmdk
      VMware-vSAN-File-Services-Appliance-x.x.x.x-x_OVF10.ovf


    2. Create the following path on vCenter:

      /storage/updatemgr/vsan/fileService/ovf-x.y.z.aaaa-bbbbbbb

      (where x.y.z.aaaa-bbbbbbb is the version shown in /var/log/vmware/vsan-health/vmware-vsan-health-service.log.
      Alternatively, find the version in the vSphere Client under vSAN Cluster > Configure > Services > File Service.)

    3. Transfer the downloaded files to the newly created path using WinSCP or any file transfer utility.

    4. Return to the vCenter SSH session, change to /storage/updatemgr/vsan, and set the user and group owner of the 'fileService' directory and everything under it:
      Command to run:

      chown -R vsan-health:users fileService

    5. Run the following commands to set the permissions on these directories:

      chmod 755 /storage/updatemgr/vsan/fileService
      chmod 644 /storage/updatemgr/vsan/fileService/ovf-x.y.z.aaaa-bbbbbbb

    6. Remediate the file service from Skyline Health under 'Infrastructure Health'. This allows the EAM agent to deploy the FSVM.
      (vSAN Cluster > Monitor > vSAN Skyline Health > Infrastructure Health > Troubleshoot)

    7. Click 'Retest' in vSAN Skyline Health in the vSphere Client.
      (vSAN Cluster > Monitor > vSAN Skyline Health)
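
Steps 2, 4 and 5 can be sketched as a single shell function. This is an illustration under stated assumptions, not a supported script: the function name prepare_ovf_dir and its parameters are hypothetical, the owner argument is optional so the function can be tried outside vCenter, and it sets 755 on both directories, following the note under Additional Information rather than the 644 shown in step 5.

```shell
# Hypothetical helper covering steps 2, 4 and 5.
#   $1  base directory, e.g. /storage/updatemgr/vsan
#   $2  OVF directory name, e.g. ovf-x.y.z.aaaa-bbbbbbb
#   $3  optional owner, e.g. vsan-health:users (chown is skipped when empty)
prepare_ovf_dir() {
    base="$1"
    ovf_dir="$2"
    owner="${3:-}"

    # Step 2: create the target path, including any missing parents.
    mkdir -p "$base/fileService/$ovf_dir" || return 1

    # Step 4: set user and group owner recursively, if an owner was given.
    if [ -n "$owner" ]; then
        chown -R "$owner" "$base/fileService" || return 1
    fi

    # Step 5: make both directories readable and traversable (assumption: 755 on both).
    chmod 755 "$base/fileService" "$base/fileService/$ovf_dir"
}

# Example call on vCenter, with the version placeholder left as in the article:
# prepare_ovf_dir /storage/updatemgr/vsan ovf-x.y.z.aaaa-bbbbbbb vsan-health:users
```

If the files are transferred (step 3) after this function has run, repeat the ownership change so the new files are also owned by vsan-health:users.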

Additional Information

In some cases, the permissions on the subdirectory also had to be set to 755. A directory needs the execute bit to be traversed, which 644 does not grant:

chmod 755 /storage/updatemgr/vsan/fileService/ovf-x.y.z.aaaa-bbbbbbb
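
Before remediating, it may help to confirm that all six files listed in step 1 reached the new directory. The sketch below assumes the file names end with the suffixes shown in that list; the function name check_fsvm_files is hypothetical.

```shell
# Hypothetical helper: report which of the six expected FSVM files are missing
# from the given directory. Returns 0 when all are present, 1 otherwise.
check_fsvm_files() {
    dir="$1"
    missing=0
    for suffix in _OVF10.ovf _OVF10.mf _OVF10.cert \
                  -system.vmdk -cloud-components.vmdk -log.vmdk; do
        # ls exits non-zero when the pattern matches no file.
        if ! ls "$dir"/VMware-vSAN-File-Services-Appliance-*"$suffix" >/dev/null 2>&1; then
            echo "missing: *$suffix"
            missing=1
        fi
    done
    return "$missing"
}

# Example call on vCenter, with the version placeholder left as in the article:
# check_fsvm_files /storage/updatemgr/vsan/fileService/ovf-x.y.z.aaaa-bbbbbbb
```

A non-zero return means at least one file still needs to be transferred before the remediation in step 6 is retried.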