vSAN -- Health Service - Cluster and host component utilization



Article ID: 382735


Updated On:

Products

VMware vSAN, VMware vSAN 8.x

Issue/Introduction

Symptoms 

  • Manual snapshot creation task fails with error: "The operation requires usable witness site. Witness site has reached max components limit of 21792. An error occurred while taking a snapshot: 22 (Invalid argument)." 

    Steps: VM > Monitor > Tasks



  • Powering on any VM fails with the error: "Module VMMon power on failed"


  • vSAN Skyline Health reports Component Utilization errors on the Witness Host:

Steps: vCenter Web Client > vSAN Cluster > Monitor > vSAN > Skyline Health



 

  • Physical Data Component Utilization on the Witness Host is reaching 100%.

 

  • In a 2-Node Stretched Cluster: The Witness Host might report "Some clusters have reached the witness components limit"

Steps: vCenter Web Client > Witness Node > Monitor > vSAN > Two Node Clusters

 

Environment

VMware vSphere vSAN 8.x

Cause

This is a known issue: Witness components leak and accumulate until the Witness Host reaches its component limit.

Resolution

Upgrade the vSphere environment to 8.0 P05 (ESXi 8.0 Update 3e - Build 24674464) or higher, which contains the fix for the Witness component issue.

Until the environment has been upgraded, use the following workaround to reduce the number of Witness components:

 

1.) Download the script "publishDCEntry.py" (attached to this KB).
2.) On all non-Witness Hosts in the affected Cluster: Upload "publishDCEntry.py" to the OSDATA partition (/vmfs/volumes/OSDATA-<UUID>/).
 
3.) On all non-Witness Hosts in the affected Cluster: Start the script by running:
nohup python /vmfs/volumes/OSDATA-<UUID>/publishDCEntry.py --interval 900 --maxComponentsPerIter 2000 >/dev/null 2>&1 &
Note: The script only runs from the CMMDS Master/Leader Node, even though it is started on all Hosts.
 
4.) On all non-Witness Hosts in the affected Cluster: Verify that the following two files have been created in the /tmp/ directory:
LogPath = "/tmp/DCpublisher.log"
StaleCompPath = "/tmp/stale_comp.txt"
 
 
5.) On all non-Witness Hosts in the affected Cluster: Confirm that the script is running in the background by checking that one of the files is being updated.
Example: tail -f /tmp/DCpublisher.log
 
 
6.) To ensure that the script still starts after any non-Witness Host is rebooted before the fix is installed (by upgrading to 8.0 P05):
Modify "/etc/rc.local.d/local.sh" so the script starts during boot of the non-Witness Hosts by adding the "nohup python ..." line above the "exit 0" line, as shown in the sample below:
 
[Host-01~] cat /etc/rc.local.d/local.sh
#!/bin/sh ++group=host/vim/vmvisor/boot
# local configuration options
# Note: modify at your own risk! If you do/use anything in this
# script that is not part of a stable API (relying on files to be in
# specific places, specific tools, specific output, etc) there is a
# possibility you will end up with a broken system after patching or
# upgrading. Changes are not supported unless under direction of
# VMware support.
# Note: This script will not be run when UEFI secure boot is enabled.
 
 nohup python /vmfs/volumes/OSDATA-<UUID>/publishDCEntry.py --interval 900 --maxComponentsPerIter 2000 >/dev/null 2>&1 &
 
 exit 0
[Host-01~] 
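
The edit in step 6 can also be scripted. The following is a minimal sketch, not part of the attached script: the function name add_start_line is hypothetical, and it assumes the host's shell provides grep and awk. It inserts a given line above the first "exit 0" line of a file and does nothing if the line is already present, so it is safe to run more than once.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this KB): insert LINE above the
# first "exit 0" line of FILE, unless FILE already contains LINE.
# Usage: add_start_line FILE "LINE"
add_start_line() {
    file=$1
    line=$2
    # Idempotence check: skip the edit if the line is already in the file.
    if grep -qF -- "$line" "$file"; then
        return 0
    fi
    tmp="${file}.tmp.$$"
    # Print LINE just before the first "exit 0"; append it if none is found.
    awk -v new="$line" '
        !found && /^[ \t]*exit 0[ \t]*$/ { print new; found = 1 }
        { print }
        END { if (!found) print new }
    ' "$file" > "$tmp" &&
        cat "$tmp" > "$file" &&   # write back in place, keeping file permissions
        rm -f "$tmp"
}

# Example call (keep the <UUID> placeholder until replaced with the real value):
# add_start_line /etc/rc.local.d/local.sh \
#   'nohup python /vmfs/volumes/OSDATA-<UUID>/publishDCEntry.py --interval 900 --maxComponentsPerIter 2000 >/dev/null 2>&1 &'
```

Back up local.sh before running anything like this, and compare the result against the sample above.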

 

Notes:
- local.sh only runs if Secure Boot is disabled on the non-Witness Hosts in the affected Cluster.
- If non-Witness Hosts with Secure Boot enabled are rebooted, Step 3.) must be executed again after the reboot. See KB 397125 for details.
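
The file check in step 4 can be repeated on each host after a reboot or a manual restart of the script. The following is a minimal sketch: the function name check_publisher_files is hypothetical, and only the two file names come from this article. It reports whether each file exists and returns a non-zero status if either is missing.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this KB): confirm that both files
# named in step 4 exist in DIR (default: /tmp).
# Usage: check_publisher_files [DIR]
check_publisher_files() {
    dir=${1:-/tmp}
    missing=0
    for name in DCpublisher.log stale_comp.txt; do
        if [ -f "$dir/$name" ]; then
            echo "found: $dir/$name"
        else
            echo "missing: $dir/$name"
            missing=1
        fi
    done
    # Exit status 0 only when both files are present.
    return "$missing"
}
```

Calling check_publisher_files with no argument checks /tmp. A "missing" result on a host suggests the script is not running there and that Step 3.) should be repeated.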

Additional Information

Attachments

publishDCEntry.py