High CPU utilization in ESXi cluster after transitioning DRS to Manual mode

Article ID: 437878

Updated On:

Products

VMware vCenter Server 8.0

Issue/Introduction

  • Host CPU usage across the cluster spikes to 90–100% immediately after setting Distributed Resource Scheduler (DRS) to Manual mode.
  • The high CPU utilization persists even after returning the DRS setting to Fully Automated.
  • Adding additional hosts to the cluster does not normalize the CPU usage.
  • Specific virtual machines (VMs) are pegged at 100% CPU usage and exhibit high CPU ready and wait times.

Environment

vCenter 8.0*

Cause

In Manual mode, DRS only generates migration recommendations and does not apply them, so resource-intensive virtual machines stay on their current hosts instead of being migrated to balance the workload. In environments where the DRS migration threshold is set to Aggressive, the cluster has been relying on frequent migrations, and the sudden restriction of movement can lead to a resource backlog within individual VMs. This backlog causes sustained high utilization that may not resolve automatically when DRS is switched back to Fully Automated.

Resolution

  1. Identify Affected VMs: Use the vSphere Client or performance monitoring tools to identify specific virtual machines that are pegged at 100% CPU utilization and showing high wait or ready times.
  2. Reboot Guest OS: Reboot the affected virtual machines. This clears the resource backlog and any runaway processes in the Guest OS that built up while DRS was in Manual mode.
  3. Adjust DRS Threshold: Change the Cluster DRS migration threshold from Aggressive (Level 5) to Balanced (Level 3). This helps stabilize the cluster and prevents excessive migration overhead during the recovery process.
  4. Monitor Rebalancing: Once DRS is set to Fully Automated and the VMs are rebooted, allow the system time to rebalance the workload across all available hosts.
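
The identification in step 1 can be sketched as a small script. This is a minimal illustration, not a VMware tool: it assumes the per-VM CPU usage and CPU ready "summation" values have already been exported from the performance charts, and the names VmSample, ready_percent, and affected_vms, as well as the 95% usage and 5% ready thresholds, are hypothetical choices for the example. The conversion uses the 20-second sampling interval of the vSphere real-time charts; other chart intervals need a different interval value.

```python
from dataclasses import dataclass

# Sampling interval of the vSphere real-time performance charts, in seconds.
REALTIME_INTERVAL_S = 20


@dataclass
class VmSample:
    """One exported performance sample for a VM (hypothetical structure)."""
    name: str
    cpu_usage_pct: float  # average CPU usage over the interval, in percent
    cpu_ready_ms: float   # CPU ready summation over the interval, all vCPUs
    num_vcpus: int


def ready_percent(ready_ms, num_vcpus, interval_s=REALTIME_INTERVAL_S):
    """Convert a CPU ready summation (ms) into an average per-vCPU percentage."""
    return ready_ms * 100 / (interval_s * 1000 * num_vcpus)


def affected_vms(samples, usage_threshold=95.0, ready_threshold=5.0):
    """Return names of VMs that are both CPU-pegged and showing high ready time.

    The thresholds are assumptions for illustration; tune them to the
    environment.
    """
    return [
        s.name
        for s in samples
        if s.cpu_usage_pct >= usage_threshold
        and ready_percent(s.cpu_ready_ms, s.num_vcpus) >= ready_threshold
    ]


if __name__ == "__main__":
    # Made-up sample values, not measurements from a real cluster.
    samples = [
        VmSample("app-01", 100.0, 2000, 1),
        VmSample("db-01", 40.0, 100, 2),
    ]
    print(affected_vms(samples))
```

VMs returned by affected_vms are the candidates for the reboot in step 2. Requiring both conditions keeps busy but healthy VMs off the list; relax it to either condition if wait time is tracked separately.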