After a vCenter upgrade, vSphere HA fails with an error Device or resource busy



Article ID: 322298


Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
After a cluster upgrade, when vCenter restarts, most of the VMs might be migrated within a short time to a small group of ESXi hosts in the cluster, which degrades performance. In the FDM install error logs on the ESXi hosts, you see a warning such as:

rm: can't remove '/tardisks/vmware_f.v00': Device or resource busy
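
To check several hosts for this symptom, the warning can be matched in log text that has already been collected. The following is a minimal sketch, not a VMware tool: the function name `find_busy_warnings` and the sample lines are hypothetical, and the log file location is deliberately left to the caller because the article does not state it.

```python
import re
from typing import Iterable, List

# Matches the warning quoted in this article. The path is captured
# generically on the assumption that the tardisk name can vary.
BUSY_PATTERN = re.compile(
    r"rm: can't remove '(?P<path>[^']+)': Device or resource busy"
)


def find_busy_warnings(lines: Iterable[str]) -> List[str]:
    """Return the paths that rm reported as busy in the given log lines."""
    paths = []
    for line in lines:
        match = BUSY_PATTERN.search(line)
        if match:
            paths.append(match.group("path"))
    return paths


# Hypothetical sample input; real lines would come from the host's log.
sample = [
    "some unrelated installer output",
    "rm: can't remove '/tardisks/vmware_f.v00': Device or resource busy",
]
print(find_busy_warnings(sample))  # ['/tardisks/vmware_f.v00']
```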

Cause

After a cluster upgrade, while the ESXi hosts reconnect to the vpxd service, the Fault Domain Manager (FDM) agent needs some time to become fully operational. If, during this time, the vSphere DRS component that scans VM-to-host compatibility finds that the FDM agent is not working on a given ESXi host, DRS forces the migration of the VMs to another host for high availability purposes.

Resolution

This issue is resolved in vCenter Server 7.0 U3l, which adds a user-configurable option that makes DRS wait for a grace period of n seconds before initiating migrations. Configure it with the following advanced option:

CompatCheckTransientFailureTimeSeconds

This option accepts a value in seconds that specifies how long DRS waits before migrating VMs after a transient compatibility check failure. The recommended value is 600 (10 minutes).
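
The effect of the grace period can be illustrated with a small model. This is only a sketch of the behaviour described above, not DRS's actual implementation: the helper names `grace_option` and `should_migrate` are hypothetical, and the assumption that migration starts exactly when the failure age reaches the configured value is ours.

```python
OPTION_KEY = "CompatCheckTransientFailureTimeSeconds"  # name from this article


def grace_option(minutes: int) -> dict:
    """Build the option as a key/value pair, converting minutes to seconds."""
    if minutes < 0:
        raise ValueError("grace period cannot be negative")
    return {"key": OPTION_KEY, "value": minutes * 60}


def should_migrate(failing_for_seconds: float, grace_seconds: int) -> bool:
    """Toy model: migrate only once the failure has outlasted the grace period."""
    return failing_for_seconds >= grace_seconds


option = grace_option(10)                         # recommended 10 minutes
print(option["value"])                            # 600
print(should_migrate(120, option["value"]))       # False: FDM may still be starting
print(should_migrate(900, option["value"]))       # True: failure is no longer transient
```

With a value of 600, a compatibility check failure that clears within 10 minutes of the hosts reconnecting would not trigger any migration in this model.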

Workaround:
A workaround for versions prior to 7.0 U3l is to place DRS in Manual mode during the upgrade process and then set it back to Fully Automated once the upgrade activities have completed successfully.
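
If the upgrade is scripted, the workaround can be wrapped so that the DRS mode is changed around the upgrade steps. The sketch below is an assumption-laden illustration: `drs_manual_during` and the `set_mode` callback are hypothetical, and the caller must supply a real function that changes the cluster's DRS automation level. The mode strings are illustrative labels for Manual and Fully Automated.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def drs_manual_during(set_mode: Callable[[str], None]) -> Iterator[None]:
    """Switch DRS to Manual for the body; restore Fully Automated only on success."""
    set_mode("manual")
    yield
    # Reached only when the body finished without raising an exception.
    set_mode("fullyAutomated")


# Usage with a stand-in callback that simply records the requested modes.
history = []
with drs_manual_during(history.append):
    history.append("upgrade")  # placeholder for the real upgrade steps
print(history)  # ['manual', 'upgrade', 'fullyAutomated']
```

If the upgrade steps raise an exception, DRS is left in Manual mode, which mirrors the guidance to restore Fully Automated only after the upgrade has completed successfully.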