vSAN -- Health Service - Cluster and host component utilization



Article ID: 382735



Products

VMware vSAN, VMware vSAN 8.x

Issue/Introduction

On a vSAN Stretched Cluster or Two-Node Stretched Cluster, the following symptoms are observed:

 
  • Attempting to power on any VM fails with the error: "Module VMMon power on failed"


 

  • Attempting to move a VM or to create a VM or vmdk fails with the following error: "Cannot complete file creation operation..... Maximum number of supported components reached...."

 

 

  • vSAN Skyline Health reports errors related to component utilization on the Witness Host:

( vCenter Web Client --> Cluster --> Monitor --> vSAN --> Skyline Health )

Sample Output: "Cluster and host component utilization"

 

Sample Output: Physical Disk Component Utilization on the Witness Host is reaching 100%

 

  • In a 2-Node Stretched Cluster: The Witness Host might report "Some clusters have reached the witness components limit"

( vCenter Web Client --> Witness Node --> Monitor --> vSAN --> Two Node Clusters )


 

Environment

VMware vSAN 8.x

Cause

This is caused by a known Witness component leak, which drives component utilization on the Witness Host toward its limit.

Resolution

Upgrade the vSphere environment to 8.0 P05 (ESXi 8.0 Update 3e - Build 24674464) or later, which contains the fix for the Witness component issue.

Until the environment has been upgraded, use the following workaround to reduce the number of Witness components:

 

1.) Download the script "publishDCEntry.py" (attached to this KB)
2.) On all non-Witness Hosts in the affected Cluster: Upload "publishDCEntry.py" to the OS-Data partition
 
3.) On all non-Witness Hosts in the affected Cluster: Start the script by running:
nohup python /vmfs/volumes/OSDATA-<UUID>/publishDCEntry.py --interval 900 --maxComponentsPerIter 2000 >/dev/null 2>&1 &
Note: The script only runs on the CMMDS Master/Leader node, even though it is started on all Hosts.
 
4.) On all non-Witness Hosts in the affected Cluster: Verify that the following two files have been created in the /tmp/ directory
LogPath = "/tmp/DCpublisher.log"
StaleCompPath = "/tmp/stale_comp.txt"
 
 
5.) On all non-Witness Hosts in the affected Cluster: Confirm that the script is running in the background by checking one of the files for updates
Example: tail -f /tmp/DCpublisher.log
 
 
6.) To ensure that the script still starts if any of the non-Witness Hosts is rebooted before the fix is installed (via upgrading to 8.0 P05):
Modify "/etc/rc.local.d/local.sh" so the script starts when a non-Witness Host boots: add the nohup line from Step 3.) above the "exit 0" line, as shown in the sample below:
 
[Host-01~] cat /etc/rc.local.d/local.sh
#!/bin/sh ++group=host/vim/vmvisor/boot
# local configuration options
# Note: modify at your own risk! If you do/use anything in this
# script that is not part of a stable API (relying on files to be in
# specific places, specific tools, specific output, etc) there is a
# possibility you will end up with a broken system after patching or
# upgrading. Changes are not supported unless under direction of
# VMware support.
# Note: This script will not be run when UEFI secure boot is enabled.
 nohup python /vmfs/volumes/OSDATA-<UUID>/publishDCEntry.py --interval 900 --maxComponentsPerIter 2000 >/dev/null 2>&1 &
 
 exit 0
[Host-01~] 
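
The edit in Step 6.) can also be scripted. The following is a minimal sketch, not part of the KB attachment: add_startup_line is a hypothetical helper name, and it assumes a POSIX shell with grep and awk available on the host. It inserts the given command above the first "exit 0" line and does nothing if the command is already in the file, so it is safe to run more than once.

```shell
# Sketch only: add_startup_line is a hypothetical helper, not an ESXi tool.
# Usage: add_startup_line <file> <command line>
add_startup_line() {
    file="$1"   # for example /etc/rc.local.d/local.sh
    line="$2"   # the full command from Step 3.), with <UUID> already replaced

    # Skip the edit if the command is already in the file.
    if grep -qF -- "$line" "$file"; then
        echo "already present"
        return 0
    fi

    # Build the new content in a temporary file, then write it back in place
    # so the permissions of the original file are kept.
    tmp="${TMPDIR:-/tmp}/local.sh.$$"
    awk -v new="$line" '
        # Print the new command once, above the first "exit 0" line.
        !added && /^[ \t]*exit 0[ \t]*$/ { print new; added = 1 }
        { print }
        # If the file has no "exit 0" line, append the command at the end.
        END { if (!added) print new }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    [ "$rc" -eq 0 ] && echo "added"
    return "$rc"
}
```

A call would pass "/etc/rc.local.d/local.sh" as the first argument and the complete nohup command from Step 3.) in single quotes as the second, with <UUID> replaced by the value for that Host. Review the file with cat afterwards, as in the sample above.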

 

Notes:
- local.sh will only work if Secure Boot is disabled on the non-Witness Hosts in the affected Cluster
- If non-Witness Hosts are rebooted and have Secure Boot enabled: Step 3.) needs to be executed again after reboot. See KB 397125 for details 
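
The checks in Steps 4.) and 5.) can be repeated on each Host, for example after a reboot, with a small helper. This is a sketch under assumptions: check_publisher is a hypothetical name, and the only facts it relies on are the two file names listed in Step 4.). It reports whether each file exists and returns a non-zero status if either is missing.

```shell
# Sketch only: check_publisher is a hypothetical helper, not an ESXi tool.
# Usage: check_publisher [directory]   (directory defaults to /tmp)
check_publisher() {
    dir="${1:-/tmp}"
    rc=0
    # File names as listed in Step 4.) of this article.
    for f in DCpublisher.log stale_comp.txt; do
        if [ -e "$dir/$f" ]; then
            echo "found: $dir/$f"
        else
            echo "missing: $dir/$f"
            rc=1
        fi
    done
    return "$rc"
}
```

Running check_publisher without arguments checks /tmp. If a file is reported missing on a Host, repeat Step 3.) there and check again.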

 

 

Additional Information

Attachments

publishDCEntry.py