ESXi hosts disconnect from vCenter due to Resource Contention Stun during Snapshot Consolidation and vSphere Replication (HBR)

Article ID: 434327

Products

VMware vSphere ESXi

Issue/Introduction

 

  • All ESXi hosts in a cluster simultaneously show as Disconnected or Not Responding in vCenter.
  • vCenter Server Appliance (VCSA) becomes momentarily unresponsive to management traffic.
  • Virtual machine workloads continue to run without interruption.
  • vCenter vpxd.log shows massive thread delays exceeding the 60-second heartbeat threshold:
      InvokeWithOpId [TotalTime] took 122148 ms
  • ESXi vmkernel.log shows simultaneous storage-intensive operations on the vCenter VM's disks:
      FDS: 642: Enabling IO coalescing on driver 'deltadisks'
      SVM: 5389: SVM_MakeDev.5389: Creating device ...-consolidate: Success
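The vpxd.log symptom above can be checked with a short script. The following is a minimal sketch, assuming the duration always appears in the form shown in this article (InvokeWithOpId [TotalTime] took N ms); the function name find_slow_calls and the default threshold constant are illustrative and not part of any VMware tooling.

```python
import re

# Matches the duration in lines such as:
#   InvokeWithOpId [TotalTime] took 122148 ms
SLOW_CALL = re.compile(r"InvokeWithOpId \[TotalTime\] took (\d+) ms")

# 60-second heartbeat threshold described in this article, in milliseconds.
HEARTBEAT_THRESHOLD_MS = 60_000


def find_slow_calls(lines, threshold_ms=HEARTBEAT_THRESHOLD_MS):
    """Return (line_number, duration_ms) for each call slower than threshold_ms."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = SLOW_CALL.search(line)
        if match:
            duration_ms = int(match.group(1))
            if duration_ms > threshold_ms:
                hits.append((number, duration_ms))
    return hits
```

To run it against a log bundle, pass an open file object for vpxd.log as the lines argument; any iterable of strings works.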

Environment

ESXi 8.0.x

Cause

The root cause is a Resource Contention Stun on the vCenter Server VM's virtual disks. This occurs when high-overhead storage tasks overlap, such as a vSphere Replication (HBR) cycle attempting I/O coalescing while a Snapshot Consolidation task is already active.

The simultaneous metadata reconciliation requirements on the storage subsystem force the hypervisor to pause (stun) the vCenter Server VM to maintain data integrity. If this stun exceeds 60 seconds, the vpxd service cannot respond to host heartbeats, and every host in the cluster is reported as disconnected (see "Virtual Machine Unresponsive").
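One rough way to confirm that the two operations overlapped is to look for both vmkernel.log markers close together. The sketch below is an assumption-laden heuristic: it uses the two message fragments quoted in this article as markers and treats proximity in line count (the hypothetical max_gap parameter) as a stand-in for proximity in time, since the exact timestamp format of a given log bundle is not assumed here.

```python
# Message fragments quoted in this article's vmkernel.log excerpt.
COALESCING_MARKER = "Enabling IO coalescing on driver 'deltadisks'"
CONSOLIDATE_MARKER = "-consolidate"


def find_overlaps(lines, max_gap=50):
    """Return (coalescing_line, consolidate_line) pairs at most max_gap lines apart.

    Line distance is only a crude proxy for time distance; confirm any hit
    against the timestamps in the actual log.
    """
    lines = list(lines)
    coalescing = [n for n, line in enumerate(lines, 1) if COALESCING_MARKER in line]
    consolidate = [n for n, line in enumerate(lines, 1) if CONSOLIDATE_MARKER in line]
    return [(a, b) for a in coalescing for b in consolidate if abs(a - b) <= max_gap]
```

An empty result does not rule out contention; it only means the two markers were not found near each other in the lines supplied.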

Resolution

To stabilize the management plane and prevent future stuns, execute the following remediation steps:

1. Manual Snapshot Remediation

Perform a Delete All or Consolidate task on the vCenter Server VM during a scheduled maintenance window to clear orphaned delta disks and large storage backlogs (see "Snapshot Consolidation Guide").
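Before the maintenance window, it can help to confirm how much snapshot backlog exists. The following is a minimal sketch, assuming vm is a pyVmomi vim.VirtualMachine object (or anything exposing the same runtime.consolidationNeeded and snapshot.rootSnapshotList properties); the helper name snapshot_backlog is hypothetical, and connecting to vCenter is left out.

```python
def snapshot_backlog(vm):
    """Summarize snapshot state for a vim.VirtualMachine-like object."""
    # runtime.consolidationNeeded is True when the VM's disks need consolidation.
    needs_consolidation = bool(vm.runtime.consolidationNeeded)

    # vm.snapshot is unset (None) when the VM has no snapshots.
    snapshot_count = 0
    if vm.snapshot is not None:
        # Walk the snapshot tree iteratively, counting every node.
        stack = list(vm.snapshot.rootSnapshotList)
        while stack:
            node = stack.pop()
            snapshot_count += 1
            stack.extend(node.childSnapshotList)

    return {
        "needs_consolidation": needs_consolidation,
        "snapshot_count": snapshot_count,
    }
```

In the vSphere API, the Consolidate and Delete All actions are exposed on the VirtualMachine object as ConsolidateVMDisks_Task and RemoveAllSnapshots_Task; the sketch deliberately only reads state and does not start either task.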

2. Stagger Operational Schedules

Reschedule image-based backups and high-frequency vSphere Replication (HBR) RPO cycles to ensure they do not overlap. Decoupling these windows eliminates the metadata conflict between snapshot management and I/O coalescing (see "Replication Slowness Troubleshooting").
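A planned schedule can be checked for collisions before it is applied. This is a minimal sketch, assuming each job is described by a same-day start and end time that does not cross midnight; the job names and times in the usage example are invented for illustration.

```python
from datetime import time
from itertools import combinations


def find_conflicts(windows):
    """Return pairs of job names whose [start, end) windows overlap.

    windows maps a job name to a (start, end) tuple of datetime.time values.
    Windows that cross midnight are not handled by this sketch.
    """
    conflicts = []
    for (name_a, (start_a, end_a)), (name_b, (start_b, end_b)) in combinations(
        windows.items(), 2
    ):
        # Two half-open intervals overlap when each starts before the other ends.
        if start_a < end_b and start_b < end_a:
            conflicts.append((name_a, name_b))
    return conflicts


# Hypothetical schedule: the backup and replication windows collide.
schedule = {
    "image_backup": (time(1, 0), time(3, 0)),
    "hbr_sync": (time(2, 30), time(2, 45)),
    "consolidation": (time(4, 0), time(5, 0)),
}
print(find_conflicts(schedule))
```

With very short RPO intervals, replication cycles recur throughout the day, so a single window per job is a simplification; the check is most useful for the fixed backup and consolidation windows.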

3. Tiered Storage Migration

Relocate the vCenter Server VM to a high-performance (SSD/NVMe) datastore. Higher IOPS and lower latency reduce the duration of VM stuns during future metadata reconciliations (see "Throughput Drops Backup").
