Remediating vSAN file service failures after a major version upgrade of vCenter/ESXi

Article ID: 406665

Products

VMware vCenter Server

Issue/Introduction

The issue occurs after performing a major version upgrade (e.g., from 7.0 to 8.0), starting with the vCenter Server, followed by a manual upgrade of the first ESXi host. At this point, all other hosts in the vSAN cluster are still running the older version, while only one host has been upgraded to the new major version.

Immediately after this first ESXi host is upgraded, the 'Recent Tasks' pane in the vSphere Client displays the following error:

FSVM does not exist from the beginning

Further investigation confirms that the File Service Virtual Machine (FSVM) is missing on the upgraded host.

Since this problem affects only the FSVM on a single upgraded host, while the remaining FSVMs on the other (non-upgraded) hosts continue to function normally, vSAN file services remain operational and there is no immediate business impact.

Environment

vSphere vCenter 7.x
vSphere vCenter 8.x
vSphere ESXi 7.x
vSphere ESXi 8.x

Cause

The following error messages appear in the log /var/log/vmware/vsan-health/vmware-vsan-health-service.log:

YYYY-MM-DDTHH:MM:ss.###Z ERROR vsan-mgmt[#####] [VsanHttpProvider::doGet opID=noOpId] Looking for non-existing path /storage/vsan-health/../updatemgr/vsan/fileService/ovf-#######################/VMware-vSAN-File-Services-Appliance-#######################_OVF10.ovf, return 404
YYYY-MM-DDTHH:MM:ss.###Z INFO vsan-mgmt[#####] [VsanMgmtServer::log_message opID=noOpId] ('127.0.0.1', #####) - - "GET /vsanHealth/fileService/ovf/#######################/VMware-vSAN-File-Services-Appliance-#######################_OVF10.ovf HTTP/1.1" 404 -

This indicates that the FSVM on the host was deleted during the ESXi host upgrade. After the upgrade, vCenter attempted to re-deploy the FSVM on the upgraded host. However, the upgraded vCenter no longer contains the FSVM OVF package for the previous version, so the request for the OVF file returned 404 and the re-deployment failed.

As a result, the FSVM on the upgraded host is missing and cannot be automatically re-deployed.
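To locate the relevant entries quickly, the log can be scanned for the "non-existing path" message quoted above. The following is a minimal sketch, assuming the message format matches the sample lines in this article; the function names and the regular expression are illustrative and not part of any VMware tooling.

```python
import re

# Log file named in this article (on the vCenter Server appliance).
LOG_PATH = "/var/log/vmware/vsan-health/vmware-vsan-health-service.log"

# Assumed message format, based only on the sample line quoted above.
MISSING_OVF = re.compile(
    r"Looking for non-existing path "
    r"(?P<path>\S+/fileService/(?P<dir>ovf-[^/]+)/(?P<file>[^/\s,]+\.ovf)), return 404"
)


def find_missing_ovf(lines):
    """Return (directory name, OVF file name, logged path) for each 404 entry."""
    hits = []
    for line in lines:
        match = MISSING_OVF.search(line)
        if match:
            hits.append((match.group("dir"), match.group("file"), match.group("path")))
    return hits


def scan_log(path=LOG_PATH):
    """Read the log file and return the de-duplicated missing OVF entries."""
    with open(path, errors="replace") as handle:
        return sorted(set(find_missing_ovf(handle)))
```

Calling `scan_log()` on the vCenter Server returns one tuple per distinct missing package. The directory name in the first element carries the version string needed to pick the matching OVF package.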

 

Resolution

To resolve the issue and allow the vCenter to re-deploy the missing FSVM, follow the steps below:

1. Identify the vSAN file service version from the error message above (e.g., 20036589, which corresponds to 7.0 U3f), and download the matching FSVM OVF package (usually six files) from the Broadcom support site. The version must exactly match the one shown in the log; a different version may cause the re-deployment to fail.

2. On the upgraded vCenter, create the following directory (the name is taken from the error message above):

/storage/vsan-health/../updatemgr/vsan/fileService/ovf-#######################/

NOTE: This path resolves to /storage/updatemgr/vsan/fileService/ovf-#######################/ since the .. moves one level up from vsan-health.

3. Use SCP or SFTP to upload all the downloaded FSVM OVF files (typically six files) into the newly created directory.

4. If everything is placed correctly, vCenter will automatically trigger the re-deployment of the missing FSVM on the upgraded ESXi host.

5. Once the FSVM is successfully re-deployed on the first upgraded host, you can proceed to upgrade the remaining ESXi hosts in the cluster.
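
Steps 2 and 3 can be checked with a short script before waiting for the automatic re-deployment. The following is a minimal sketch, assuming it runs on the vCenter Server with the full path copied from the 404 log entry; the helper names are hypothetical. Note that `posixpath.normpath` collapses `..` purely lexically, which matches the resolution described in the NOTE above but does not follow symbolic links.

```python
import posixpath
from pathlib import Path


def resolve_ovf_dir(logged_file_path):
    """Collapse '..' segments and return the directory that should hold the OVF files."""
    return posixpath.dirname(posixpath.normpath(logged_file_path))


def check_ovf_dir(directory, expected_ovf_name):
    """Return (ovf_present, sorted file names) for the given directory.

    A missing directory is reported as (False, []).
    """
    target = Path(directory)
    names = sorted(p.name for p in target.iterdir()) if target.is_dir() else []
    return expected_ovf_name in names, names
```

Pass the path from the log entry to `resolve_ovf_dir`, then call `check_ovf_dir` with the result and the `.ovf` file name from the same entry. If the first returned value is `False`, or the file list is shorter than the downloaded package, the upload is incomplete or went to the wrong directory.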