vSphere HA Datastore Heartbeat Best Practices for vSAN Stretched Clusters during network maintenance

Article ID: 435184

Products

VMware vSAN

Issue/Introduction

In a VMware vSphere 8.0U3 vSAN Stretched Cluster environment, virtual machines (VMs) may unexpectedly power off and restart during scheduled network maintenance affecting the Inter-Site Link (ISL).

Upon investigation, the following conditions are observed:

  • vSphere HA registers a complete host failure rather than network isolation.
  • The configured external heartbeat datastore is located at the remote site and becomes unreachable during the ISL maintenance.
  • The cluster executes the default isolation response, resulting in VM downtime.

Environment

VMware vSAN 8.x
VMware vSAN 9.x

Cause

During network maintenance that affects the inter-site link (ISL), the vSphere High Availability (HA) primary agent stops receiving network heartbeats from the hosts at the other site; in a vSAN cluster, these heartbeats travel over the vSAN network. At the same time, the datastore configured for HA heartbeating, which is located at the remote site, also becomes unreachable. With neither network nor datastore heartbeats available, vSphere HA cannot recognize the condition as network isolation. It registers a host failure and executes the Host Isolation Response, and the affected VMs are powered off and restarted.
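The role that datastore heartbeats play in this decision can be illustrated with a deliberately simplified model. The sketch below is not the HA agent's actual implementation: the function and enum names are hypothetical, and it omits details such as the ICMP ping check that HA also performs before declaring a host failed. It shows only why losing the remote heartbeat datastore together with the ISL leaves HA with no signal that the hosts at the other site are still alive.

```python
from enum import Enum


class HostState(Enum):
    """Hypothetical labels for how the HA primary agent views a secondary host."""
    HEALTHY = "healthy"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """Simplified model of the primary agent's view of one secondary host.

    Network heartbeats (carried on the vSAN network in a vSAN cluster) are
    checked first; datastore heartbeats are the fallback that distinguishes a
    live but unreachable host from a dead one. Ping checks are not modeled.
    """
    if network_heartbeat:
        return HostState.HEALTHY
    if datastore_heartbeat:
        # The host is still updating its heartbeat on the datastore, so it is
        # alive but cut off from the primary: isolation or partition.
        return HostState.ISOLATED_OR_PARTITIONED
    # No heartbeat of either kind: the host is treated as failed.
    return HostState.FAILED


if __name__ == "__main__":
    # ISL maintenance with the heartbeat datastore at the remote site:
    # both signals are lost at once.
    print(classify_host(network_heartbeat=False, datastore_heartbeat=False))
```

In this model, the ISL maintenance scenario described above corresponds to `classify_host(False, False)`, which yields `HostState.FAILED`.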

Resolution

To prevent unwanted vSphere HA isolation responses during scheduled network maintenance, complete the following steps:

  1. Before starting maintenance that affects the vSAN network or the ISL, disable Host Monitoring in the vSphere HA cluster settings.

  2. If the maintenance requires taking individual hosts or site infrastructure offline, place the affected ESXi hosts into Maintenance Mode using the "Ensure accessibility" or "Full data migration" vSAN data migration option.

  3. After the maintenance is complete, verify that network routing across the ISL has been restored.

  4. Re-enable Host Monitoring in the vSphere HA cluster settings.

  5. To permanently prevent false-positive isolation events caused by remote heartbeat datastores, disable datastore heartbeating: in the vSphere HA heartbeat datastore settings, select "Use datastores only from the specified list" and deselect all datastores.

  6. Set the advanced HA parameter das.ignoreInsufficientHbDatastore to true to suppress the insufficient heartbeat datastore warnings in vCenter Server.
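For teams that script these changes, the sketch below captures the steps as data. It is a minimal illustration under stated assumptions: the dictionary keys `hostMonitoring`, `hBDatastoreCandidatePolicy`, `heartbeatDatastore`, and `option` are believed to match property names of the vSphere API `ClusterDasConfigInfo` object, and `userSelectedDs` is believed to correspond to "Use datastores only from the specified list". Verify both against the vSphere API reference before relying on them. The helper names are hypothetical, and the code only builds plain dictionaries; it does not connect to vCenter Server or apply anything.

```python
"""Hypothetical sketch: vSphere HA changes for planned ISL maintenance.

Keys are assumed to mirror vSphere API ClusterDasConfigInfo properties.
Nothing here contacts vCenter Server; the dictionaries are data only.
"""

IGNORE_HB_OPTION = "das.ignoreInsufficientHbDatastore"


def host_monitoring_spec(enabled: bool) -> dict:
    """HA change that turns Host Monitoring on or off (steps 1 and 4)."""
    return {"hostMonitoring": "enabled" if enabled else "disabled"}


def no_heartbeat_datastore_spec() -> dict:
    """Permanent HA change from steps 5 and 6.

    The assumed "userSelectedDs" policy plus an empty datastore list means
    no datastore is used for heartbeating; the advanced option suppresses
    the insufficient heartbeat datastore warning.
    """
    return {
        "hBDatastoreCandidatePolicy": "userSelectedDs",
        "heartbeatDatastore": [],
        "option": [{"key": IGNORE_HB_OPTION, "value": "true"}],
    }


def maintenance_plan(take_hosts_offline: bool) -> list:
    """Ordered steps as (description, ha_change_or_None) tuples."""
    plan = [("Disable Host Monitoring", host_monitoring_spec(False))]
    if take_hosts_offline:
        # Step 2 applies only when hosts or site infrastructure go offline.
        plan.append(
            ("Enter Maintenance Mode (Ensure accessibility or Full data migration)", None)
        )
    plan.append(("Verify network routing across the ISL is restored", None))
    plan.append(("Re-enable Host Monitoring", host_monitoring_spec(True)))
    plan.append(("Stop using heartbeat datastores", no_heartbeat_datastore_spec()))
    return plan


if __name__ == "__main__":
    for step, change in maintenance_plan(take_hosts_offline=True):
        print(step, change)
```

Keeping the permanent heartbeat-datastore change separate from the temporary Host Monitoring toggle mirrors the article's distinction between steps 1 through 4, which bracket each maintenance window, and steps 5 and 6, which are applied once.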

Additional Information

Using vSphere HA with vSAN

Advanced configuration options for VMware High Availability in vSphere

vSphere HA heartbeat datastores, the isolation address and vSAN