Stretch cluster operation failure with error "Failed to configure fault domains in cluster"

Article ID: 313315

Updated On:

Products

VMware Cloud Foundation

Issue/Introduction

Symptoms:

The witness VM is deployed with all virtual disks of type SSD.

Some or all of the following symptoms can occur:

The stretch cluster workflow fails with the following exception:


2023-01-06T06:49:24.390+0000 DEBUG [vcf_dm,a62ce57acb53eec5,e814] [c.v.e.s.c.c.v.vsan.VsanManagerBase,dm-exec-18]  VsanManagerBase: claimDisksForWitnessHost execution started.
2023-01-06T06:49:24.437+0000 DEBUG [vcf_dm,a62ce57acb53eec5,e814] [c.v.e.s.c.c.v.vsan.VsanManagerBase,dm-exec-18]  Is SSD disk : true
2023-01-06T06:49:24.437+0000 DEBUG [vcf_dm,a62ce57acb53eec5,e814] [c.v.e.s.c.c.v.vsan.VsanManagerBase,dm-exec-18]  Is SSD disk : true
2023-01-06T06:49:24.437+0000 DEBUG [vcf_dm,a62ce57acb53eec5,e814] [c.v.e.s.c.c.v.vsan.VsanManagerBase,dm-exec-18]  Is SSD disk : true
2023-01-06T06:49:24.437+0000 DEBUG [vcf_dm,a62ce57acb53eec5,e814] [c.v.e.s.c.c.v.vsan.VsanManagerBase,dm-exec-18]  Successfully claimed vSAN storage SSD disk Local VMware Disk (mpx.vmhba0:C0:T2:L0) and NonSSD disk Local VMware Disk (mpx.vmhba0:C0:T0:L0) for witness
2023-01-06T06:49:24.445+0000 DEBUG [vcf_dm,a62ce57acb53eec5,e814] [c.v.e.s.c.c.v.v.InventoryService,dm-exec-18]  No more results to retrieve
2023-01-06T06:49:24.531+0000 ERROR [vcf_dm,a62ce57acb53eec5,e814] [c.v.e.s.c.c.v.vsan.VsanManagerBase,dm-exec-18]  Failed to configure Vsan fault domains for cluster mgmt-cluster01
com.vmware.vim.binding.vmodl.fault.InvalidArgument: null
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at com.vmware.vim.vmomi.core.types.impl.ComplexTypeImpl.newInstance(ComplexTypeImpl.java:174)
        at com.vmware.vim.vmomi.core.types.impl.DefaultDataObjectFactory.newDataObject(DefaultDataObjectFactory.java:25)
        at com.vmware.vim.vmomi.core.soap.impl.unmarshaller.ComplexStackContext.<init>(ComplexStackContext.java:30)


Check the output of vdq -vq on the witness VM and confirm that all the virtual disks are of type SSD:

DiskResults:
DiskResult[0]:
Name: mpx.vmhba0:C0:T2:L0
VSANUUID:
State: Eligible for use by VSAN
Reason: None
IsSSD?: 1
IsCapacityFlash?: 0
IsPDL?: 0
Size(MB): 358400
FormatType: 512n
IsVsanDirectDisk?: 0


DiskResult[1]:
Name: mpx.vmhba0:C0:T1:L0
VSANUUID:
State: Eligible for use by VSAN
Reason: None
IsSSD?: 1
IsCapacityFlash?: 0
IsPDL?: 0
Size(MB): 10240
FormatType: 512n
IsVsanDirectDisk?: 0


DiskResult[2]:
Name: mpx.vmhba0:C0:T0:L0
VSANUUID:
State: Ineligible for use by VSAN
Reason: Has partitions
IsSSD?: 1
IsCapacityFlash?: 0
IsPDL?: 0
Size(MB): 12288
FormatType: 512n
IsVsanDirectDisk?: 0
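
The check described above can also be done programmatically. The following is a minimal Python sketch, assuming the vdq output has been saved as text in the "Key: value" layout shown above. The function names parse_vdq_output and summarize are hypothetical helpers for illustration and are not part of any VMware tool.

```python
# Sample text copied from the vdq output shown in this article (fields trimmed).
SAMPLE = """\
DiskResults:
DiskResult[0]:
Name: mpx.vmhba0:C0:T2:L0
State: Eligible for use by VSAN
IsSSD?: 1
Size(MB): 358400

DiskResult[1]:
Name: mpx.vmhba0:C0:T1:L0
State: Eligible for use by VSAN
IsSSD?: 1
Size(MB): 10240

DiskResult[2]:
Name: mpx.vmhba0:C0:T0:L0
State: Ineligible for use by VSAN
Reason: Has partitions
IsSSD?: 1
Size(MB): 12288
"""


def parse_vdq_output(text):
    """Return one dict per DiskResult block (hypothetical helper)."""
    disks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("DiskResult["):
            # A new disk block starts here.
            current = {}
            disks.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, because disk names contain colons.
            key, _, value = line.partition(":")
            current[key.strip().rstrip("?")] = value.strip()
    return disks


def summarize(disks):
    """Report whether every disk is SSD and which disks are not vSAN-eligible."""
    all_ssd = all(d.get("IsSSD") == "1" for d in disks)
    ineligible = [d["Name"] for d in disks
                  if not d.get("State", "").startswith("Eligible")]
    return all_ssd, ineligible


if __name__ == "__main__":
    all_ssd, ineligible = summarize(parse_vdq_output(SAMPLE))
    print("All disks SSD:", all_ssd)
    print("Ineligible disks:", ineligible)
```

If the first value is true and the second list is not empty, the witness matches the condition described in this article.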


Environment

VMware Cloud Foundation 4.4

Cause

While stretching the cluster, the vCenter API convertToStretchedCluster() is called, and in this case it threw an exception. The API threw the exception because the DiskMapping argument passed to convertToStretchedCluster() contained a disk that was ineligible for vSAN. There was no check for the vSAN eligibility of disks when the DiskMapping group was created. This issue occurs in particular when the witness is deployed on a local SSD with a minimal configuration, which results in a witness virtual machine whose virtual disks are all of type SSD.
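
To illustrate the kind of eligibility check that was missing, here is a minimal Python sketch. It is not the actual workflow code. The Disk class and the validate_disk_mapping function are hypothetical names, and the disk values are taken from the log and vdq output in this article.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    # Hypothetical record mirroring the vdq fields used in this article.
    name: str
    eligible: bool
    size_mb: int


def validate_disk_mapping(cache_disk, capacity_disks):
    """Raise ValueError if any disk in the proposed mapping is not vSAN-eligible."""
    bad = [d.name for d in [cache_disk, *capacity_disks] if not d.eligible]
    if bad:
        raise ValueError("Disks ineligible for vSAN: " + ", ".join(bad))


if __name__ == "__main__":
    # Mapping reported in the log: T2 claimed as SSD, T0 claimed as NonSSD.
    cache = Disk("mpx.vmhba0:C0:T2:L0", eligible=True, size_mb=358400)
    capacity = [Disk("mpx.vmhba0:C0:T0:L0", eligible=False, size_mb=12288)]
    try:
        validate_disk_mapping(cache, capacity)
    except ValueError as err:
        print(err)
```

A check of this shape, run before the API call, would reject the mapping early instead of letting the API fail with InvalidArgument.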

Resolution

To resolve the issue, convert all SSDs to HDD except one, which will be used as the cache tier (preferably the smallest of the disks).


Workaround:

To work around the issue, follow these steps:

  1. Log in to the deployed witness VM and run the vdq -vq command. Check whether all the virtual disks are SSD.

DiskResults:
DiskResult[0]:
Name:  mpx.vmhba0:C0:T2:L0
VSANUUID:  
State:  Eligible for use by VSAN
Reason:  None
IsSSD?:  1
IsCapacityFlash?:  0
IsPDL?:  0
Size(MB):  358400
FormatType:  512n
IsVsanDirectDisk?:  0


DiskResult[1]:
Name:  mpx.vmhba0:C0:T1:L0
VSANUUID:  
State:  Eligible for use by VSAN
Reason:  None
IsSSD?:  1
IsCapacityFlash?:  0
IsPDL?:  0
Size(MB):  10240
FormatType:  512n
IsVsanDirectDisk?:  0


DiskResult[2]:
Name:  mpx.vmhba0:C0:T0:L0
VSANUUID:  
State:  Ineligible for use by VSAN
Reason:  Has partitions
IsSSD?:  1
IsCapacityFlash?:  0
IsPDL?:  0
Size(MB):  12288
FormatType:  512n
IsVsanDirectDisk?:  0


In the above example, disk mpx.vmhba0:C0:T0:L0 (State: Ineligible for use by VSAN) was part of the DiskMapping argument to the convertToStretchedCluster API, which resulted in the exception.

To recover, go to vCenter and select the witness VM.
Go to Configure -> Storage Devices and select both of the following disks:
mpx.vmhba0:C0:T0:L0 [State: Ineligible for use by VSAN]
mpx.vmhba0:C0:T2:L0 [the larger disk, Size(MB): 358400]
Then mark them as HDD Disk.


This results in the witness having 1 SSD (which is eligible for vSAN) and 2 or more HDDs.
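
The choice of which disks to mark as HDD can be sketched as follows. This is a minimal Python illustration, assuming the smallest vSAN-eligible SSD is kept as the cache disk and every other SSD is marked as HDD. The function name plan_hdd_marking and the dictionary keys are hypothetical, and the marking itself is still done in vCenter as described above.

```python
def plan_hdd_marking(disks):
    """Return (cache_disk_name, names_to_mark_as_hdd) for a list of disk dicts.

    Assumption: the smallest eligible SSD stays as the cache tier disk.
    """
    eligible_ssds = [d for d in disks if d["is_ssd"] and d["eligible"]]
    if not eligible_ssds:
        raise ValueError("No vSAN-eligible SSD available for the cache tier")
    cache = min(eligible_ssds, key=lambda d: d["size_mb"])
    # Every other SSD, eligible or not, is marked as HDD.
    to_mark = [d["name"] for d in disks
               if d["is_ssd"] and d["name"] != cache["name"]]
    return cache["name"], to_mark


# Disk values taken from the vdq output in this article.
WITNESS_DISKS = [
    {"name": "mpx.vmhba0:C0:T2:L0", "is_ssd": True, "eligible": True, "size_mb": 358400},
    {"name": "mpx.vmhba0:C0:T1:L0", "is_ssd": True, "eligible": True, "size_mb": 10240},
    {"name": "mpx.vmhba0:C0:T0:L0", "is_ssd": True, "eligible": False, "size_mb": 12288},
]

if __name__ == "__main__":
    cache_name, hdd_names = plan_hdd_marking(WITNESS_DISKS)
    print("Keep as SSD (cache):", cache_name)
    print("Mark as HDD:", hdd_names)
```

For the example disks, this keeps mpx.vmhba0:C0:T1:L0 as the SSD and selects the same two disks for HDD marking as the steps above.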

  2. Restart the failed stretch workflow.