Stats Primary Election alert triggered after vSAN cluster restart

Article ID: 387843

Products

VMware vSAN 8.x

Issue/Introduction

Symptoms:

  • After a successful vSAN cluster shutdown and restart, the Stats Primary Election alert is triggered.
  • In this health check, the Stats Primary is not shown, but the CMMDS primary is shown.
  • The stats object is reported as missing.
  • All objects are healthy and all VMs can be powered on.

Issue Validation:

The performance stats object ".vsan.stats" exists but is reported as missing, even though all of its components are active and the object is healthy:

Object UUID: xxxxxxxx-2ed5-xxxx-a4c8-xxxxxxafc016
Version: 20
Health: healthy
Owner: xxxxxxxxesx006.xxxx.xxxxxxx.com
Size: 512.00 GB
Used: 0.19 GB
Used 4K Blocks: 0.00 GB
Policy:
stripeWidth: 1
cacheReservation: 0
proportionalCapacity: [0, 100]
hostFailuresToTolerate: 2
forceProvisioning: 0
spbmProfileId: xxxxxxxx-73b1-xxxx-b072-xxxxxx272e5
spbmProfileGenerationNumber: 1
storageType: Allflash
replicaPreference: Capacity
iopsLimit: 0
checksumDisabled: 0
CSN: 35
SCSN: 38
spbmProfileName: Management - Optimal Datastore Default Policy - RAID6  -----> Note: RAID5 or RAID6
Configuration:
Concatenation
RAID_1
Component: xxxxxxxx-540f-xxxx-9cc7-xxxxxxafc016
Component State: ACTIVE,Address Space(B): 273804165120 (255.00GB), Disk UUID: xxxxxxxx-1366-xxxx-6e13-xxxxxx8e4008, Disk Name: t10.NVMe_xxxxxxxxKYDMV_xxxxxxxxxxxxxxxx:2
Votes: 2,Capacity Used(B): 13877248 (0.01GB), Physical Capacity Used(B): 13877248 (0.01GB), Total 4K Blocks Used(B): 0 (0.00GB), Host Name: xxxxxxxxesx006.xxxx.xxxxxxx.com
Component: xxxxxxxx-540f-xxxx-9cc7-xxxxxxafc016
Component State: ACTIVE, Address Space(B): 205353123840 (191.25GB), Disk UUID: xxxxxxxx-6880-xxxx-0cb7-xxxxxx4dc3f4, Disk Name: t10.NVMe_xxxxxxxxKYDMV_xxxxxxxxxxxxxxxx:2
Votes: 1,Capacity Used(B): 1547904 (0.00GB), Physical Capacity Used(B): 8650752 (0.01GB), Total 4K Blocks Used(B): 0 (0.00GB), Host Name: xxxxxxxxesx006.xxxx.xxxxxxx.com
Type: vmnamespace
Path: /vmfs/volumes/vsan:xxxxxxxxxxxx78139-xxxxxxxx477ba664/.vsan.stats (Missing)
Group UUID: xxxxxxxx-2ed5-xxxx-a4c8-xxxxxxafc016
Directory Name: .vsan.stats
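
The validation above comes down to three signals in the object report: `Health: healthy`, every `Component State` reading `ACTIVE`, and a `Path` line for `.vsan.stats` that ends in `(Missing)`. A minimal Python sketch of that check follows. It assumes the report has been saved as plain text in the layout shown above; the function name `stats_object_looks_affected` is hypothetical and is not part of any VMware tool.

```python
import re


def stats_object_looks_affected(report: str) -> bool:
    """Return True when a report matches the symptom pattern in this article.

    Assumption: the report uses the plain-text layout shown above, with a
    "Health:" line, one "Component State:" field per component, and a
    "Path:" line that may end in "(Missing)".
    """
    health = re.search(r"^\s*Health:\s*(\S+)", report, re.MULTILINE)
    states = re.findall(r"Component State:\s*([A-Za-z_]+)", report)
    path = re.search(r"^\s*Path:\s*(.+)$", report, re.MULTILINE)

    if not (health and states and path):
        return False  # incomplete report: do not guess

    path_text = path.group(1).strip()
    healthy = health.group(1).lower() == "healthy"
    all_active = all(state == "ACTIVE" for state in states)
    is_stats_object = ".vsan.stats" in path_text
    marked_missing = path_text.endswith("(Missing)")
    return healthy and all_active and is_stats_object and marked_missing
```

A report that is healthy and fully active but not marked `(Missing)`, or one with any non-active component, returns `False`, so the helper flags only the specific combination described in this article.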

Environment

VMware vSAN 8.x

Cause

This issue is caused by a race condition. The Shutdown Cluster Wizard sets DOMPauseAllCCPs to 1 as part of the shutdown process, and during cluster restart vSAN attempts to reinitialize the .vsan.stats object before DOMPauseAllCCPs has been set back to its default of 0. The issue occurs only when the object uses a RAID5 or RAID6 storage policy.
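
Because only RAID5 and RAID6 policies are affected, it can help to check the policy fields of the stats object. The sketch below is illustrative only. It assumes the common vSAN convention that `replicaPreference: Capacity` denotes RAID5/RAID6 erasure coding, while `Performance` denotes RAID1 mirroring; that mapping is an assumption here, not something this article states. The helper names `parse_policy` and `policy_is_erasure_coded` are hypothetical.

```python
def parse_policy(report: str) -> dict:
    """Collect simple "key: value" lines from a saved report into a dict."""
    policy = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            policy[key.strip()] = value.strip()
    return policy


def policy_is_erasure_coded(policy: dict) -> bool:
    """Guess whether the policy is RAID5/RAID6.

    Assumption: replicaPreference "Capacity" means erasure coding
    (RAID5/RAID6). Confirm the actual policy in vCenter before acting.
    """
    preference = str(policy.get("replicaPreference", ""))
    return preference.strip().lower() == "capacity"
```

In the example report above, `replicaPreference` is `Capacity` and `hostFailuresToTolerate` is `2`, which matches the RAID6 policy name shown in `spbmProfileName`.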

Resolution

Broadcom Engineering is aware of this issue and is working on a fix planned for a future release.

Workaround:

The current workaround is to delete and recreate the stats object via RVC:

  1. Log in to RVC through a vCenter SSH session.
    1. Open an SSH session to vCenter.
    2. Log in to the shell.
    3. Run the command: rvc
    4. Log in with administrator@<domain>@localhost
  2. cd to the vSAN cluster: localhost > datacenter > computers > vSAN Cluster
  3. Delete the vSAN performance object by running: vsan.perf.stats_object_delete .
  4. In the vCenter UI, browse the vSAN datastore and confirm that the .vsan.stats folder no longer exists.
  5. Once it is deleted, create the vSAN performance object by running: vsan.perf.stats_object_create .
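
The RVC portion of the workaround can be written down as an ordered plan for review before anything is run. The sketch below only builds strings; it does not connect to vCenter or execute commands. The helper name `rvc_recreate_plan` and the sample datacenter and cluster names are hypothetical, and the path layout `/localhost/<datacenter>/computers/<cluster>` is inferred from the navigation in step 2, so adjust it to the actual inventory.

```python
def rvc_recreate_plan(datacenter: str, cluster: str) -> list:
    """Return the workaround as ordered (kind, text) steps.

    kind is "rvc" for a command typed at the RVC prompt and "manual"
    for a check performed by the operator. Nothing is executed here.
    """
    if not datacenter or not cluster:
        raise ValueError("datacenter and cluster names are required")

    # Assumed inventory path, based on step 2 of the procedure.
    cluster_path = f"/localhost/{datacenter}/computers/{cluster}"
    return [
        ("rvc", f"cd {cluster_path}"),
        ("rvc", "vsan.perf.stats_object_delete ."),
        ("manual", "Confirm in the vCenter UI that .vsan.stats is gone"),
        ("rvc", "vsan.perf.stats_object_create ."),
    ]


if __name__ == "__main__":
    # "DC1" and "vSAN-Cluster" are placeholder names.
    for kind, text in rvc_recreate_plan("DC1", "vSAN-Cluster"):
        print(f"[{kind}] {text}")
```

Keeping the manual confirmation as its own step makes it harder to skip the check between the delete and the create.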

If the .vsan.stats folder still exists on the vSAN datastore, follow the steps below.

  1. Run the command vsan.perf.stats_object_info . in RVC.
  2. Run /usr/lib/vmware/osfs/bin/objtool delete -u <object_uuid> -f using the object UUID from step 1.
    Note: Verify carefully that the correct UUID is used in this command, because a deleted object cannot be recovered.
  3. Then proceed to step 5 of the RVC procedure above.
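
Because the objtool deletion cannot be undone, one way to enforce the double check is to build the command only after the UUID has been validated and retyped. The sketch below is a hedged illustration: the helper `objtool_delete_command` is hypothetical, it assumes the 8-4-4-4-12 hexadecimal UUID layout seen in the object report, and it returns the command string without executing it.

```python
import re

# Assumed UUID layout, based on the Object UUID shown in the report.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def objtool_delete_command(reported_uuid: str, retyped_uuid: str) -> str:
    """Build the objtool delete command after two safety checks.

    reported_uuid: UUID copied from vsan.perf.stats_object_info output.
    retyped_uuid:  the same UUID entered a second time by the operator.
    Raises ValueError if the UUID is malformed or the entries differ.
    """
    uuid = reported_uuid.strip()
    if not UUID_RE.fullmatch(uuid):
        raise ValueError(f"not a well-formed UUID: {uuid!r}")
    if retyped_uuid.strip() != uuid:
        raise ValueError("UUID entries do not match; aborting")
    return f"/usr/lib/vmware/osfs/bin/objtool delete -u {uuid} -f"
```

The function deliberately stops at producing a string, so the final decision to run the deletion stays with the operator.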

Another option is to set the vSAN datastore storage policy to the vSAN Default Policy, which is RAID1.

If you need assistance with this process, open a case with vSAN Support.