Following a planned power maintenance or cold start of a VMware Cloud Foundation (VCF) environment, the vSAN cluster in a Tanzu Workload Domain may fail to initialize.
Running python /usr/lib/vmware/vsan/bin/reboot_helper.py recover returns the error: Timeout, please try again later.
esxcli vsan cluster get shows Sub-Cluster Member Count: 1 on all nodes, despite network connectivity between the hosts.
localcli vsan network list returns no output, indicating that the vSAN traffic tag is missing from the VMkernel adapters.
VMware vSAN 8.x
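The member-count symptom can be checked programmatically. The following is a minimal sketch, assuming it runs on the ESXi host with Python 3.7 or later and that esxcli vsan cluster get prints a "Sub-Cluster Member Count: N" line as quoted above. The helper names (run_cli, member_count, looks_partitioned) and the expected host count are hypothetical.

```python
import re
import subprocess
from typing import Optional


def run_cli(*args: str) -> str:
    """Run an ESXi CLI command (esxcli or localcli) and return its stdout.

    Hypothetical helper; assumes it executes on the ESXi host itself.
    """
    result = subprocess.run(list(args), capture_output=True, text=True, check=True)
    return result.stdout


def member_count(cluster_get_output: str) -> Optional[int]:
    """Extract 'Sub-Cluster Member Count' from `esxcli vsan cluster get` output."""
    match = re.search(r"Sub-Cluster Member Count:\s*(\d+)", cluster_get_output)
    return int(match.group(1)) if match else None


def looks_partitioned(cluster_get_output: str, expected_hosts: int) -> bool:
    """True when this host sees fewer members than the cluster should have."""
    count = member_count(cluster_get_output)
    return count is not None and count < expected_hosts
```

On a host in a four-node cluster (the node count is illustrative), looks_partitioned(run_cli("esxcli", "vsan", "cluster", "get"), expected_hosts=4) returning True matches the symptom described above.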
The vSAN traffic tag was lost or failed to persist on the designated VMkernel adapter (e.g., vmk3) during the host reboot cycle. Without the active vSAN traffic type enabled on the interface, the ESXi hosts cannot participate in the vSAN transport layer, preventing the cluster from forming a single partition.
To resolve this issue, the vSAN network configuration must be manually re-asserted on each host to restore the traffic tags before re-running the recovery script.
1. Confirm which VMkernel adapter is intended for vSAN traffic (commonly vmk3 in VCF Tanzu domains) by checking the IP assignments: esxcfg-vmknic -l
2. Run the following command on each host. If the output is blank, the vSAN traffic tag is missing: localcli vsan network list
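To make this check repeatable, the list output can be parsed into a map of adapter to traffic type. This is a sketch under the assumption that the command prints one block per interface in "Key: value" form, with "VmkNic Name" and "Traffic Type" lines; the function names are hypothetical. Capture the output on the host (for example with subprocess.run) and pass it in.

```python
from typing import Dict, Optional


def parse_vsan_network_list(output: str) -> Dict[str, str]:
    """Map each VMkernel adapter in `localcli vsan network list` output to its traffic type.

    Assumes a 'Key: value' layout with one 'VmkNic Name' line per interface
    block. An adapter with no 'Traffic Type' line maps to an empty string.
    """
    interfaces: Dict[str, str] = {}
    current: Optional[str] = None
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # block headers such as 'Interface' carry no value
        key, value = key.strip(), value.strip()
        if key == "VmkNic Name":
            current = value
            interfaces[current] = ""
        elif key == "Traffic Type" and current is not None:
            interfaces[current] = value
    return interfaces


def vsan_tag_missing(output: str) -> bool:
    """Blank output (no tagged interfaces at all) is the symptom for this issue."""
    return not parse_vsan_network_list(output)
```

Splitting on the first colon only keeps values such as multicast addresses intact.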
3. On each host in the cluster, clear the vSAN network configuration and re-add the vSAN VMkernel adapter with the vSAN traffic tag:
esxcli vsan network clear
esxcli vsan network ipv4 add -i vmk#
(Note: Replace vmk# with the adapter identified from the esxcfg-vmknic -l output.)
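The two commands in this step can be wrapped so they always run in order and stop on the first failure. This is a minimal sketch, assuming Python 3 on the host; retag_commands and retag_host are hypothetical names, and the adapter-name check is only a basic guard against typos, not a validation that the adapter exists.

```python
import subprocess
from typing import List


def retag_commands(vmknic: str) -> List[List[str]]:
    """Return the clear and re-add esxcli invocations from the step above as argv lists."""
    if not vmknic.startswith("vmk") or not vmknic[3:].isdigit():
        raise ValueError(f"not a VMkernel adapter name: {vmknic!r}")
    return [
        ["esxcli", "vsan", "network", "clear"],
        ["esxcli", "vsan", "network", "ipv4", "add", "-i", vmknic],
    ]


def retag_host(vmknic: str) -> None:
    """Run the commands on the local host; check=True raises on the first non-zero exit."""
    for argv in retag_commands(vmknic):
        subprocess.run(argv, check=True)
```

Passing argv lists rather than a shell string avoids quoting problems; retag_host("vmk3") would be run once per host.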
4. Verify that the adapter is now tagged for vSAN: esxcli vsan network list
5. Re-run the recovery script: python /usr/lib/vmware/vsan/bin/reboot_helper.py recover
Example output:
Begin to recover the cluster ...
The cluster has been recovered successfully.
Successfully resumed the cluster.
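When scripting the recovery, the result can be judged from the messages shown in the example output and the timeout error quoted in the symptoms. A sketch, assuming those exact message strings; because it is not documented here whether the script writes them to stdout or stderr, or which exit codes it uses, the helper combines both streams and ignores the exit code. Function names are hypothetical.

```python
import subprocess

RECOVER_CMD = ["python", "/usr/lib/vmware/vsan/bin/reboot_helper.py", "recover"]


def run_recovery() -> str:
    """Run the recovery script on the host and return stdout and stderr combined."""
    result = subprocess.run(RECOVER_CMD, capture_output=True, text=True)
    return result.stdout + result.stderr


def recovery_succeeded(output: str) -> bool:
    """True when both success messages from the example output are present."""
    return ("recovered successfully" in output
            and "Successfully resumed the cluster" in output)


def recovery_timed_out(output: str) -> bool:
    """True when the timeout error from the symptoms appears."""
    return "Timeout, please try again later" in output
```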
6. Confirm that the vSAN objects are healthy: esxcli vsan debug object health summary get
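The health summary can be reduced to the statuses that still need attention. This sketch assumes the command prints a two-column table (health status name, then an integer object count) under a header and separator line; the function names are hypothetical.

```python
from typing import Dict


def parse_health_summary(output: str) -> Dict[str, int]:
    """Map health status to object count from `esxcli vsan debug object health summary get`.

    Assumes each data row ends in an integer count; header and separator
    lines are skipped because their last field is not a number.
    """
    counts: Dict[str, int] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[-1].isdigit():
            counts[" ".join(fields[:-1])] = int(fields[-1])
    return counts


def unhealthy_objects(output: str) -> Dict[str, int]:
    """Return only statuses other than 'healthy' that have a non-zero count."""
    return {status: n for status, n in parse_health_summary(output).items()
            if status != "healthy" and n > 0}
```

An empty result from unhealthy_objects suggests that all objects reported a healthy status after recovery.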