In a VMware vSphere 8.0U3 vSAN Stretched Cluster environment, virtual machines (VMs) may unexpectedly power off and restart during scheduled network maintenance affecting the Inter-Site Link (ISL).
Upon investigation, the following conditions are observed:
Affected products: VMware vSAN 8.x, VMware vSAN 9.x
During network maintenance that impacts the inter-site link (ISL), the primary High Availability (HA) agent loses vSAN network heartbeats to the secondary site. Concurrently, the remote datastore configured for HA heartbeating becomes unreachable over the network. Lacking both network and datastore heartbeats, vSphere HA registers a host failure rather than network isolation and executes the Host Isolation Response.
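The decision described above can be sketched as a small truth table. This is a deliberately simplified model for illustration, not the actual vSphere HA implementation: the `HostState` enum and `classify_host` function are hypothetical names, and real HA logic considers additional signals beyond these two heartbeats.

```python
from enum import Enum


class HostState(Enum):
    """Simplified view of how the primary HA agent may see a remote host."""
    LIVE = "live"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """Classify a host from the two heartbeat channels (illustrative only)."""
    if network_heartbeat:
        # Network heartbeats are arriving, so the host is considered healthy.
        return HostState.LIVE
    if datastore_heartbeat:
        # No network heartbeat, but the host still updates its heartbeat
        # datastore: it is running but cut off from the primary agent.
        return HostState.ISOLATED_OR_PARTITIONED
    # Neither channel is reachable: the host is indistinguishable from a dead one.
    return HostState.FAILED
```

In the scenario above, an ISL outage removes both channels at once, so the secondary-site hosts fall into the last branch even though they are still running.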
To prevent vSphere HA from triggering isolation responses on virtual machines during scheduled network maintenance, perform the following steps:
Navigate to the vSphere HA cluster settings and disable Host Monitoring prior to executing maintenance that impacts the vSAN network or ISL.
Place the affected ESXi hosts into Maintenance Mode using the "Ensure accessibility" or "Full data migration" evacuation mode if the maintenance requires taking individual hosts or site infrastructure offline.
Verify that network routing across the ISL is restored upon completion of the maintenance.
Re-enable Host Monitoring in the vSphere HA cluster settings.
To permanently resolve false-positive isolation events caused by remote heartbeat datastores, disable datastore heartbeating: in the vSphere HA settings, select "Use datastores only from the specified list" and deselect all datastores.
Set the advanced HA parameter das.ignoreInsufficientHbDatastore to true to suppress the insufficient heartbeat datastore warnings in vCenter Server.
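The ordering of the maintenance steps (disable Host Monitoring, perform the maintenance, verify the ISL, re-enable Host Monitoring) could be enforced in an automation wrapper along the following lines. This is a minimal sketch, not a vCenter Server integration: `host_monitoring_paused`, `set_host_monitoring`, and `isl_is_healthy` are hypothetical names, and the two callables are assumed to be supplied by whatever tooling actually talks to vCenter Server and the network.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def host_monitoring_paused(
    set_host_monitoring: Callable[[bool], None],
    isl_is_healthy: Callable[[], bool],
) -> Iterator[None]:
    """Keep HA Host Monitoring disabled for the duration of a maintenance block.

    Both callables are hypothetical hooks provided by the caller; this
    sketch does not contact vCenter Server itself.
    """
    set_host_monitoring(False)  # disable before the ISL is touched
    yield                       # the maintenance work runs here
    # Reached only if the maintenance block completed without raising.
    if not isl_is_healthy():
        raise RuntimeError("ISL not restored; Host Monitoring left disabled")
    set_host_monitoring(True)   # re-enable only after the ISL is verified
```

If the maintenance block raises, or the ISL check fails, Host Monitoring is intentionally left disabled so that an operator can decide when it is safe to re-enable it.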