Frequent vSAN Storage Policy compliance fluctuations for VMDKs shared on Windows Failover Clustering Services (WFCS)

Article ID: 424117

Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms 

  • The vCenter Server UI reports the Storage Policy status as "Compliant".
  • However, vROps reports the status frequently fluctuating between "Compliant" and "Out of Date".
  • This behavior is observed only on VMDKs shared across multiple virtual machines in a Windows Failover Clustering Services (WFCS) environment.

Environment

  • VMware vSAN 8.x
  • VMware vSAN 9.x

Cause

This issue occurs when the Storage Policy of a shared VMDK is modified or updated from one VM, but the change is not synchronized or re-applied to the same VMDK from the other participating VMs in the WFCS cluster.

In a shared disk configuration, the Policy ID and Generation Number must be consistent across all VMs sharing that specific object. 
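To make the consistency requirement concrete, the following Python sketch groups the VMs that share one VMDK by the (Policy ID, Generation Number) pair each reports. It is illustrative only: the function name, the VM names, and the shortened policy IDs are hypothetical, and how the per-VM values are collected (vCenter UI, esxcli, or an API) is left open. A consistent shared disk yields a single group; two or more groups point to VMs that still reference a different or older policy.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (spbmProfileId, spbmProfileGenerationNumber) as reported for the shared VMDK.
PolicyView = Tuple[str, int]


def group_by_policy(views: Dict[str, PolicyView]) -> Dict[PolicyView, List[str]]:
    """Group the VMs sharing one VMDK by the policy ID and generation each reports.

    One group means the shared disk is consistent; more than one group means
    at least one VM still references a different or older policy.
    """
    groups: Dict[PolicyView, List[str]] = defaultdict(list)
    for vm_name, view in views.items():
        groups[view].append(vm_name)
    return dict(groups)


# Hypothetical WFCS nodes sharing one disk; values mirror this article's example.
views = {
    "wfcs-node-a": ("####-ABCD", 3),
    "wfcs-node-b": ("####-EFGH", 2),
}
for view, vms in group_by_policy(views).items():
    print(view, vms)
```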

Cause validation:

  • Identify the policy in vCenter: Navigate to the affected host > VM > Edit VM policy. Note the policy applied to the affected VM.
    Example:
    Policy shown in vCenter: "ABCD".


  • Verify the expected policy ID and Generation Number for the policy "ABCD" via CLI: Log in to the ESXi host via SSH and run the command "esxcli vsan debug object list":

    Example:
    esxcli vsan debug object list

    structtype:
      ObjectInfo
    Health:
      healthy
    Object UUID:
      ########-####-####-####-############
    Version:
      20
    Owner:
      esxi-####.####.####-####.####.####
    Policy:
      stripeWidth: 1
      cacheReservation: 0
      proportionalCapacity: 0
      hostFailuresToTolerate: 0
      affinity: ['########-####-####-####-############']
      forceProvisioning: 0
      affinityMandatory: 1
      spbmProfileId: ########-####-####-####-ABCD
      spbmProfileGenerationNumber: 3
      replicaPreference: Performance
      iopsLimit: 0
      checksumDisabled: 0
      subFailuresToTolerate: 1
      CSN: 461
      SCSN: 13
      spbmProfileName: ABCD
      locality: NonPreferred

    Used:
      25165824
    Used 4K Blocks:
      26173440
    Size:
      549755813888
    Type:
      vdisk
    Path:
      /vmfs/volumes/vsan:################-################/########-####-####-####-############/VM.vmdk (Exists)

  • Verify the policy ID and Generation Number actually applied to the affected VMDK via CLI: Log in to the ESXi host via SSH and run the command "esxcli vsan debug object list":

    Example:
    esxcli vsan debug object list

    structtype:
      ObjectInfo
    Health:
      healthy
    Object UUID:
      ########-####-####-####-############
    Version:
      20
    Owner:
      esxi-####.####.####-####.####.####
    Policy:
      stripeWidth: 1
      cacheReservation: 0
      proportionalCapacity: 0
      hostFailuresToTolerate: 0
      affinity: ['########-####-####-####-############']
      forceProvisioning: 0
      affinityMandatory: 1
      spbmProfileId: ########-####-####-####-EFGH
      spbmProfileGenerationNumber: 2
      replicaPreference: Performance
      iopsLimit: 0
      checksumDisabled: 0
      subFailuresToTolerate: 1
      CSN: 461
      SCSN: 13
      spbmProfileName: EFGH
      locality: NonPreferred

    Used:
      25165824
    Used 4K Blocks:
      26173440
    Size:
      549755813888
    Type:
      vdisk
    Path:
      /vmfs/volumes/vsan:################-################/########-####-####-####-############/VM.vmdk (Exists)

  • Identify the mismatch: Compare the spbmProfileId and spbmProfileGenerationNumber values (and the corresponding spbmProfileName) from the two outputs.

    Example of an inconsistent state:

    • Expected Policy ID: ########-####-####-####-ABCD | Generation: 3
    • Received Policy ID: ########-####-####-####-EFGH | Generation: 2

  • The /var/run/log/vsansystem.log file reports "Policy out of date" because of the Policy ID and Generation Number mismatch:
    YYYY-MM-DDTHH:MM.SSSZ Er(163) vsansystem[30614378]: [vSAN@6876 sub=Default opId=d38ac082-1d97-4c3e-ae11-38a667b66848-5d9f-c030] Policy out of date detected. Expected policy id: ########-####-####-####-ABCD, Expected generation id: 3, Received policy id: ########-####-####-####-EFGH: 2.
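The comparison in these validation steps can also be scripted. The Python sketch below is a hypothetical helper, not a VMware tool: it assumes you have copied the text of one object's listing from "esxcli vsan debug object list" (or the lines of vsansystem.log) into Python, and its log pattern is derived only from the sample message above, so other vSAN builds may word the message differently.

```python
import re

# Policy fields shown under "Policy:" by `esxcli vsan debug object list`.
POLICY_FIELDS = ("spbmProfileId", "spbmProfileGenerationNumber", "spbmProfileName")

# Assumption: built from the sample vsansystem.log message in this article only.
POLICY_OUT_OF_DATE = re.compile(
    r"Policy out of date detected\. "
    r"Expected policy id: (?P<expected_id>[^,]+), "
    r"Expected generation id: (?P<expected_gen>\d+), "
    r"Received policy id: (?P<received_id>[^:]+): (?P<received_gen>\d+)"
)


def parse_policy_fields(object_text: str) -> dict:
    """Return the SPBM policy fields found in the listing of ONE vSAN object.

    Assumes each field sits on its own "<name>: <value>" line, as in the
    sample output above; indentation is ignored.
    """
    fields = {}
    for line in object_text.splitlines():
        match = re.match(r"\s*(\w+):\s*(.*?)\s*$", line)
        if match and match.group(1) in POLICY_FIELDS:
            fields[match.group(1)] = match.group(2)
    return fields


def policy_mismatch(expected: dict, received: dict) -> list:
    """List the fields (policy ID, generation) whose values differ."""
    return [
        name
        for name in ("spbmProfileId", "spbmProfileGenerationNumber")
        if expected.get(name) != received.get(name)
    ]


def scan_vsansystem_log(lines):
    """Yield (expected_id, expected_gen, received_id, received_gen) per matching line."""
    for line in lines:
        match = POLICY_OUT_OF_DATE.search(line)
        if match:
            yield (
                match["expected_id"],
                int(match["expected_gen"]),
                match["received_id"],
                int(match["received_gen"]),
            )
```

Calling scan_vsansystem_log on an open copy of /var/run/log/vsansystem.log would yield one tuple per "Policy out of date" message; for the sample line above that is ("########-####-####-####-ABCD", 3, "########-####-####-####-EFGH", 2).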

Resolution

Re-apply the policy to the shared VMDKs from every VM that shares the disks, so that the Policy ID and Generation Number are consistent across all of them.

Note: Re-applying the policy can trigger a resync. It is therefore recommended to re-apply the policy outside business hours.