VCF Deployment Fails: Enable default vSAN Storage Policies (Network Partition)

Products

VMware SDDC Manager / VCF Installer, VMware vCenter Server, VMware vSAN, VMware vSphere ESXi, VMware Cloud Foundation

Issue/Introduction

When deploying VMware Cloud Foundation (VCF), the bring-up process may fail during the Configure the vSphere cluster stage. Specifically, the task Enable the default vSAN Storage Policies fails with an "Invalid virtual machine configuration" error.

VCF Installer displays errors such as Failed to apply default vSAN policy.

In /var/log/vmware/vcf/domainmanager/domainmanager.log, you see entries indicating vob.vsan.clomd.needMoreDisks2.

YYYY-MM-DDTHH:MM:SS+0000 ERROR [vcf_dm,697f932a56f3###########696315cf,f1dc] [c.v.e.s.c.c.v.vsphere.VcManagerBase,dm-exec-25]  Task information for future track{"key":"task-###","task":{"_type":"Task","_value":"task-###","_serverGuid":"UUID_of_Virtual_Machine"},"description":{"key":"com.vmware.vim.vpxd.vpx.vmprov.ReconfigureVm","message":"Reconfiguring Virtual Machine on destination host"},"name":{"_wsdlName":"ReconfigVM_Task"},"descriptionId":"VirtualMachine.reconfigure","entity":{"_type":"VirtualMachine","_value":"vm-20","_serverGuid":"UUID_of_Virtual_Machine"},"entityName":"FQDN_OF_VCENTER","state":"error","cancelled":false,"cancelable":false,"error":{"property":"config.vmProfile","_msg":"Invalid virtual machine configuration.","_faultMsg":[{"key":"vob.vsan.clomd.needMoreDisks2","arg":[{"key":"1","value":"0"},{"key":"2","value":"1"},{"key":"3","value":"12"},{"key":"4","value":"0"},{"key":"5","value":"0"},{"key":"6","value":"0"},{"key":"7","value":"0"},{"key":"8","value":"12"},{"key":"9","value":"0"}],"message":"There are currently 0 usable disks for the operation. This operation requires 1 more usable disks. \nRemaining ## disks unusable because: \n 0 - Insufficient space for data/cache reservation. \n 0 - Maintenance mode or unhealthy disks. \n 0 - Disk-version or storage-type mismatch. \n 0 - Max component count reached. \n ## - In unusable fault-domains due to policy constraints.

The log states that the remaining disks are unusable because they are "In unusable fault-domains due to policy constraints". The same information appears in the vCenter Server vpxd log (/var/log/vmware/vpxd/vpxd.log):

-->          message = "There are currently 0 usable disks for the operation. This operation requires 1 more usable disks.
--> Remaining ## disks unusable because:
-->  0 - Insufficient space for data/cache reservation.
-->  0 - Maintenance mode or unhealthy disks.
-->  0 - Disk-version or storage-type mismatch.
-->  0 - Max component count reached.
-->  ## - In unusable fault-domains due to policy constraints.
-->  0 - In witness node."
-->       }
-->    ],
-->    property = "config.vmProfile",
-->    msg = "Invalid virtual machine configuration."
--> }

Hosts appear healthy individually but are unable to communicate over the vSAN network.

Environment

VMware Cloud Foundation 9.x

Cause

This issue is caused by a Network Partition in the vSAN cluster.

The ESXi hosts are unable to communicate over the vSAN VMkernel interfaces, typically due to physical switch configuration issues, incorrect VLAN tagging, or MTU mismatches.

Because the hosts cannot "see" each other, vSAN cannot satisfy the storage policy requirements that mandate data redundancy across multiple nodes.

Resolution

To resolve this issue, fix the underlying network connectivity between the ESXi hosts:

Check vSAN Cluster Status:
- Log in to each ESXi host using an SSH client and run the command:
```
localcli vsan cluster get
```
- If the Sub-Cluster Member Count is lower than the total number of hosts in the vSAN cluster, the host is partitioned.
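The comparison in this step can be scripted. The following is a minimal sketch, assuming the command output contains a line of the form `Sub-Cluster Member Count: N`. `check_member_count` is a hypothetical helper name, not an ESXi command, and the expected host count is supplied by the operator.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): reads `localcli vsan cluster get`
# output on stdin and compares the Sub-Cluster Member Count with the
# expected number of hosts passed as the first argument.
check_member_count() {
    expected="$1"
    # Assumes a line such as "   Sub-Cluster Member Count: 3".
    actual=$(awk -F': *' '/Sub-Cluster Member Count/ { gsub(/[^0-9]/, "", $2); print $2; exit }')
    if [ -z "$actual" ]; then
        echo "Sub-Cluster Member Count not found in input"
        return 2
    fi
    if [ "$actual" -lt "$expected" ]; then
        echo "partitioned: host sees $actual of $expected members"
        return 1
    fi
    echo "ok: host sees $actual of $expected members"
    return 0
}

# Example usage on an ESXi host in a 4-node cluster:
#   localcli vsan cluster get | check_member_count 4
```

A non-zero return code on any host indicates that the host does not see the full cluster membership.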
Test vSAN Connectivity:
- To diagnose network connectivity, SSH into one of the affected ESXi hosts as root. Run the command below to automatically detect the vSAN interface and run a ping test against all other nodes in the cluster:
  
```
vsanvmk=$(localcli vsan network list | grep -A 1 "vsan" -B 12 | grep -ie "vmknic name:" | tail -1 | awk '{print $3}' |tail +1); vmkmtu=$(localcli network ip interface list |grep "$vsanvmk" -A 13 |tail -1 |awk '{print $2}') ; mxmtu=$(expr $vmkmtu - 28); switchport=$(net-stats -l | grep "$vsanvmk" | awk '{print $1}') ; vswitch_name=$(net-stats -l |grep "$vsanvmk" | awk '{print $4}'); vmnic=$(vsish -e get /net/portsets/"$vswitch_name"/ports/"$switchport"/teamUplink); localcli vsan cluster unicastagent list |tail +3 | awk '{print $4}' |while read unicip; do echo The vSAN vmknic is $vsanvmk on $vswitch_name $vmnic with MTU configured as $vmkmtu vSAN neighbor IP is $unicip; vmkping -I $vsanvmk -s $mxmtu -d $unicip | tail -3 ;done
```
- The above script automates the checks described in the KB article: vSAN Health Service - Network Health - Hosts small ping/large ping test.
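The script pings with a payload of the interface MTU minus 28 bytes. For IPv4 without options, that is the 20-byte IP header plus the 8-byte ICMP header, so the resulting packet exactly fills one frame and, with the don't-fragment flag (`-d`), only gets through if every hop supports that MTU. The arithmetic can be sketched as follows; `max_ping_payload` is a hypothetical helper name, and the interface name and IP address in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: largest ICMP payload that fits in a single
# unfragmented IPv4 packet for a given MTU
# (MTU - 20-byte IPv4 header - 8-byte ICMP header).
max_ping_payload() {
    echo $(( $1 - 28 ))
}

# max_ping_payload 9000   # prints 8972
# max_ping_payload 1500   # prints 1472
#
# Example usage with placeholder interface and neighbor IP:
#   vmkping -I vmk2 -s "$(max_ping_payload 9000)" -d 192.0.2.11
```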
If the ping fails, check physical network connectivity and ensure that:
- The vSAN VLAN is trunked to all relevant physical ports.
- The MTU is set to 9000 (if using Jumbo Frames) end-to-end on the physical switches.
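To decide which of the two items to check first, it can help to compare a small ping with the full-size ping. The sketch below assumes the ping summary contains a line such as `3 packets transmitted, 0 packets received, 100% packet loss`; both `loss_pct` and `suggest_check` are hypothetical helper names, and the mapping from results to causes is a rule of thumb rather than a definitive diagnosis.

```shell
#!/bin/sh
# Hypothetical helper: extract the packet-loss percentage from ping-style
# summary text on stdin. Assumes a line such as
# "3 packets transmitted, 0 packets received, 100% packet loss".
loss_pct() {
    sed -n 's/.*[^0-9]\([0-9][0-9]*\)% packet loss.*/\1/p' | head -n 1
}

# Hypothetical helper: suggest where to look first, given the packet loss
# (in percent) of a small ping and of a full-MTU ping sent with -d.
suggest_check() {
    small="$1"
    large="$2"
    if [ "$small" -ge 100 ]; then
        echo "no basic connectivity: check VLAN trunking and physical links"
    elif [ "$large" -ge 100 ]; then
        echo "small ping works, large ping fails: check end-to-end MTU"
    else
        echo "both ping sizes get replies"
    fi
}

# Example usage with placeholder interface and neighbor IP:
#   small=$(vmkping -I vmk2 192.0.2.11 | loss_pct)
#   large=$(vmkping -I vmk2 -s 8972 -d 192.0.2.11 | loss_pct)
#   suggest_check "$small" "$large"
```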
Once networking is restored, re-run localcli vsan cluster get and confirm that the Sub-Cluster Member Count matches the total number of hosts.
Return to the VCF Installer UI and click Retry to continue the deployment.