ESXi host becomes unresponsive due to large resource pool memory reservation

Article ID: 340296

Products

VMware vCenter Server VMware vSphere ESXi

Issue/Introduction

Symptoms:
On ESXi hosts managed by vCenter Server 6.7.x, a resource pool with a very large memory reservation can, in some cases, cause hosts to run out of reservable memory, leading to:
  • Single or multiple hosts becoming unresponsive
  • Various host services crashing with core dumps
  • A host becoming unmanageable from vCenter Server
  • Various log files containing “Out of memory” or “InsufficientMemoryResourcesFault” events


Environment

VMware vSphere ESXi 6.7
VMware vCenter Server 6.7.x

Cause

The old DRS algorithm, used prior to the vCenter 6.7.0 release, did not reserve more resources than the current VM demand in the resource pool, even if the resource pool was configured with a higher reservation. If VM resource demand spiked, DRS reacted only at the next algorithm run. Until that periodic run, VMs could suffer temporary performance issues even though the resource pool had a sufficient reservation configured.

In vCenter 6.7.x, DRS uses a two-pass algorithm to allocate a resource pool’s reservation to its child VMs. In the first pass, the resource pool reservation is distributed and capped at each VM’s demand, subject to each VM’s fair share. In the second pass, the excess reservation is distributed proportionally, capped at each VM’s configured size. As a result, the resource pool reservation is allocated to its children more aggressively, giving more buffer for sudden VM demand spikes. For more information on the DRS changes in vCenter 6.7.x, see: DRS Enhancements in vSphere 6.7.
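
The two passes described above can be sketched as follows. This is a simplified illustration and not VMware's implementation: the VM class, the function names, the share-weighted split in both passes, and the cap_at_demand flag (a stand-in for the demand-capped behavior) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    shares: int     # relative weight inside the resource pool (assumed > 0)
    demand: float   # current memory demand, in GB
    size: float     # configured memory size, in GB


def distribute(amount, vms, room):
    """Split `amount` among VMs in proportion to shares, never exceeding room[name].

    Returns (allocation per VM name, amount left undistributed).
    """
    alloc = {vm.name: 0.0 for vm in vms}
    active = [vm for vm in vms if room[vm.name] > 0]
    while amount > 0 and active:
        total_shares = sum(vm.shares for vm in active)
        # VMs whose share-weighted slice would meet or exceed their remaining room.
        full = [vm for vm in active
                if amount * vm.shares / total_shares >= room[vm.name] - alloc[vm.name]]
        if not full:
            # Nobody hits a cap: hand out everything by shares and stop.
            for vm in active:
                alloc[vm.name] += amount * vm.shares / total_shares
            return alloc, 0.0
        # Fill the capped VMs, then re-split what is left among the others.
        for vm in full:
            grant = room[vm.name] - alloc[vm.name]
            alloc[vm.name] += grant
            amount -= grant
        active = [vm for vm in active if vm not in full]
    return alloc, max(amount, 0.0)


def allocate_reservation(rp_reservation, vms, cap_at_demand=False):
    """Pass 1 caps at demand; pass 2 (skipped if cap_at_demand) caps at configured size."""
    first, left = distribute(rp_reservation, vms, {vm.name: vm.demand for vm in vms})
    if cap_at_demand:
        return first
    second, _ = distribute(left, vms, {vm.name: vm.size - first[vm.name] for vm in vms})
    return {vm.name: first[vm.name] + second[vm.name] for vm in vms}


# Hypothetical pool: 100 GB reservation, two VMs with equal shares.
vms = [VM("a", 1, 10, 40), VM("b", 1, 20, 40)]
print(allocate_reservation(100, vms))                      # {'a': 40.0, 'b': 40.0}
print(allocate_reservation(100, vms, cap_at_demand=True))  # {'a': 10.0, 'b': 20.0}
```

In this made-up pool, the two-pass allocation reserves 80 GB for the VMs, while the demand-capped allocation reserves only 30 GB.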

If a resource pool has a very high memory reservation, it may reserve most of the host memory, leaving the ESXi kernel or agents unable to allocate new memory. This causes the symptoms above on the ESXi host and is a known issue documented in KB 52500. Because the 2-pass DRS algorithm described above allocates reservation more aggressively, it adds more memory stress to ESXi when a resource pool uses a large memory reservation. The symptoms can therefore become more apparent after upgrading vCenter to a 6.7.x release.
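
As a rough illustration of the arithmetic, the sketch below compares the memory left unreserved on one host under a demand-capped allocation and under a more aggressive one. The function name and all figures are hypothetical and do not come from a real system.

```python
def unreserved_memory(host_reservable_gb, vm_reservations_gb):
    """Memory on the host that is still available to reserve, in GB."""
    return host_reservable_gb - sum(vm_reservations_gb)


host_reservable_gb = 90   # hypothetical reservable memory on one host
demand_capped = [10, 20]  # per-VM reservations capped at demand
aggressive = [40, 40]     # per-VM reservations grown up to configured size

print(unreserved_memory(host_reservable_gb, demand_capped))  # 60
print(unreserved_memory(host_reservable_gb, aggressive))     # 10
```

The smaller the remainder, the less memory is left for the ESXi kernel and host agents to allocate.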

Resolution

This is a known issue affecting vCenter Server 6.7.

Note: This is not an issue in vCenter 7.0.

Workaround:
To work around this issue:
  1. Reduce the resource pool reservation.
  2. Disable the vCenter 6.7.x 2-pass memory reservation algorithm through an advanced option in vCenter:
    1. Log in to vCenter with administrative privileges.
    2. Select the cluster object in the inventory.
    3. Select the Configure tab > vSphere DRS.
    4. Click Edit, expand Advanced Options, and click Add.
    5. Under Option, enter "CapRpMemReservationAtDemand" with a value of "1".
Note: Optionally, another advanced option, "CapRpCpuReservationAtDemand", can also be set to "1" to disable the new 2-pass algorithm for CPU resources. A service restart is not required; the added advanced options take effect automatically at the next DRS algorithm run.
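
The same advanced options could in principle be set from a script instead of the UI. The sketch below is untested and not part of the documented procedure. It assumes the third-party pyVmomi library and a cluster object (vim.ClusterComputeResource) obtained from an existing session. The helper names drs_advanced_options and apply_to_cluster are hypothetical.

```python
def drs_advanced_options(disable_memory=True, disable_cpu=False):
    """Build the DRS advanced options named in this article as key/value pairs."""
    options = {}
    if disable_memory:
        options["CapRpMemReservationAtDemand"] = "1"
    if disable_cpu:
        options["CapRpCpuReservationAtDemand"] = "1"
    return options


def apply_to_cluster(cluster, options):
    """Submit the options to a cluster; `cluster` is assumed to be a
    pyVmomi vim.ClusterComputeResource from an authenticated session."""
    # Imported lazily so the helper above works without pyVmomi installed.
    from pyVmomi import vim

    spec = vim.cluster.ConfigSpecEx(
        drsConfig=vim.cluster.DrsConfigInfo(
            option=[vim.option.OptionValue(key=k, value=v)
                    for k, v in options.items()]
        )
    )
    # modify=True applies the change incrementally to the existing configuration.
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)


print(drs_advanced_options(disable_cpu=True))
```

Validate any scripted change in a test cluster first, and confirm the result under Advanced Options in the cluster's vSphere DRS settings.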