vSAN Skyline Health Warning: ‘Memory pool (heaps)’ Reported for Physical Disk

Article ID: 416696


Updated On:

Products

VMware vSAN

Issue/Introduction

  • vSAN Skyline Health (vSAN Cluster > Monitor > vSAN > Skyline Health) reports the following error for one or more physical disks.

 

 

  • In vSAN, each disk group and its associated physical disks are managed by the vSAN Disk Management Layer (DOM and LSOM) within the ESXi host. These components allocate memory pools (also known as heaps) to handle I/O metadata, caching, and object management for each disk.

        The “Physical Disk Health – Memory pools (heaps)” check in vSAN Skyline Health monitors the health of these memory pools used by vSAN to manage physical disks.

        When this health check shows a warning or error, it typically means that:

    • vSAN failed to allocate or initialize one or more memory heaps associated with the physical disk, or

    • The memory pools might have become exhausted or are not reporting correctly due to a software or hardware issue.
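
        The current state of this check can also be read from the command line of any ESXi host in the cluster. A minimal sketch is shown below; the test name passed to "get -t" is assumed to match the UI label, so copy the exact name from the "list" output on your build first.

esxcli vsan health cluster list                            # all health checks and their current status
esxcli vsan health cluster get -t "Memory pools (heaps)"   # details for this specific check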

 

  • NVMe drives are in use as capacity devices in the vSAN OSA (Original Storage Architecture) disk group but are claimed as HDDs (vSAN cluster > Configure > Disk Management > Select ESXi host from Cluster > View disk).
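
To confirm from the host how the capacity devices are currently classified, the commands below can be used as a sketch; <device> is a placeholder for an identifier taken from the vSAN storage list output.

# "Is SSD" and "Is Capacity Tier" fields show how vSAN has claimed each device
esxcli vsan storage list

# How the storage stack itself classifies the device (replace <device> with the NVMe identifier)
esxcli storage core device list -d <device> | grep -i "Is SSD"

# vdq -q summarizes vSAN disk eligibility per device, including the IsSSD flag
vdq -q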

 

 

  • Increased congestion metrics. Log in to the ESXi host through SSH and run the script below to check memory congestion.

 

# Repeats every 30 seconds until interrupted with Ctrl+C.
while true; do
  echo "================================================"
  date
  # Congestion counters for each vSAN disk group UUID
  for ssd in $(localcli vsan storage list | grep "Group UUID" | awk '{print $5}' | sort -u); do
    echo $ssd
    vsish -e get /vmkModules/lsom/disks/$ssd/info | grep Congestion
  done
  # LLOG/PLOG log space consumed, converted from bytes to GiB
  for ssd in $(localcli vsan storage list | grep "Group UUID" | awk '{print $5}' | sort -u); do
    llogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info | grep "Log space consumed by LLOG" | awk -F \: '{print $2}')
    plogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info | grep "Log space consumed by PLOG" | awk -F \: '{print $2}')
    llogGib=$(echo $llogTotal | awk '{print $1 / 1073741824}')
    plogGib=$(echo $plogTotal | awk '{print $1 / 1073741824}')
    allGibTotal=$(expr $llogTotal \+ $plogTotal | awk '{print $1 / 1073741824}')
    echo $ssd
    echo " LLOG consumption: $llogGib"
    echo " PLOG consumption: $plogGib"
    echo " Total log consumption: $allGibTotal"
  done
  sleep 30
done
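
In the output of this loop, the memory-related counters (typically shown as memCongestion and slabCongestion in the LSOM disk info) are the ones most relevant to this health check; values that remain above zero across iterations while the warning is active indicate sustained memory pool pressure on that disk group.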

  • Performance degradation on the affected disk group: slower resyncs, component rebuilds, or object access delays.
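
Where this symptom is suspected, the resync backlog can be checked from the host as well. A minimal sketch, assuming an ESXi release where the esxcli vsan debug namespace is available:

# Summary of objects and bytes still left to resynchronize
esxcli vsan debug resync summary get

# Per-object resync detail (output can be long on a busy cluster)
esxcli vsan debug resync list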

Environment

VMware vSAN 8.x

Cause

  • Misclassifying an NVMe as an HDD causes vSAN to manage I/O inefficiently for a high-speed device, leading to memory and performance instability.
  • NVMe drives have far higher queue depths and IOPS than HDDs.
  • vSAN’s OSA was designed assuming capacity devices (HDDs/SSDs) have relatively lower performance.
  • When a very fast NVMe device is used in the capacity tier, vSAN must maintain a large number of metadata objects and outstanding I/O operations, which can lead to:
    1. Memory pool (heap) exhaustion
    2. Congestion thresholds being reached
    3. LSOM memory slab growth beyond its limits
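
As a quick illustration of the mismatch, the device queue depths can be compared on the host. In the sketch below, <nvme-device> and <hdd-device> are placeholders for identifiers taken from esxcli vsan storage list; NVMe devices commonly report a much higher "Device Max Queue Depth" than SAS/SATA HDDs.

esxcli storage core device list -d <nvme-device> | grep -i "Queue Depth"
esxcli storage core device list -d <hdd-device> | grep -i "Queue Depth"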

Resolution

1. Place the ESXi host in Maintenance Mode.

2. Remove the disk group.

3. Mark the capacity disk that is claimed as HDD as a flash device (vSAN cluster > Configure > Disk Management > Select ESXi host > View disk > Ineligible and unclaimed disks > select the HDD-marked disk > Mark as Flash).

4. Recreate the disk group (vSAN cluster > Configure > Disk Management > Select ESXi host > View disk > Create Disk Group).
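
The same procedure can also be performed from the ESXi shell. The sketch below is illustrative only: the device names are placeholders, the vSAN data evacuation behavior should be chosen to match availability requirements, and the HPP mark-as-SSD option is an assumption that should be verified against the build's esxcli help before use.

# 1. Enter maintenance mode (select the vSAN data evacuation mode appropriate for the cluster)
esxcli system maintenanceMode set -e true

# 2. Remove the disk group, identified by its cache-tier device
esxcli vsan storage remove -s <cache-tier-device>

# 3. Mark the NVMe capacity device as flash; for HPP-claimed NVMe devices the
#    --mark-device-ssd (-M) option is assumed here, verify with: esxcli storage hpp device set --help
esxcli storage hpp device set -d <capacity-nvme-device> -M true

# 4. Recreate the disk group from the cache and capacity devices
esxcli vsan storage add -s <cache-tier-device> -d <capacity-nvme-device>

# 5. Exit maintenance mode
esxcli system maintenanceMode set -e false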