On a vSAN host, using the Python script to shut down the cluster manually fails with the following error:
[root@esxi:~] python /usr/lib/vmware/vsan/bin/reboot_helper.py prepare
Begin to prepare the cluster for gracefully rebooting...
Time among connected hosts are synchronized.
Scheduled vSAN reconfig task.
Waiting for the scheduled task...(27s left)
Checking network status...
Cluster preparation is not ready, retry after 10s...
Cluster preparation is not ready, retry after 10s...
Cluster preparation is not ready, retry after 10s...
Timeout, please try again later.
vSAN 8.x
This issue occurs when vSAN traffic was enabled on more than one VMkernel adapter and one of those adapters was later removed. The CMMDS database still holds stale details about the removed adapter.
Example: In this cluster, vSAN traffic was enabled on vmk3. However, the 'esxcli vsan network list' command reports vSAN traffic on two different VMkernel adapters. Note: In this scenario, vmk2 does not exist in the ESXi host network configuration.
Refer to the screenshot below, where vmk2 was deleted and no longer exists on the ESXi hosts.
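The mismatch can also be checked programmatically. The following is a minimal Python sketch, not a VMware tool: it parses the "VmkNic Name:" lines from 'esxcli vsan network list' output (in the format shown in this article) and reports any adapter that is not in a supplied set of VMkernel adapters that actually exist on the host. How that set is gathered (for example, from the vSphere Client or 'esxcli network ip interface list') is left to the reader, and the function names are hypothetical.

```python
import re
from typing import Iterable, List

# Matches lines such as "VmkNic Name: vmk3" in `esxcli vsan network list` output.
_VMKNIC_RE = re.compile(r"^[ \t]*VmkNic Name:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def vsan_vmknics(list_output: str) -> List[str]:
    """Return the VMkernel adapter names reported by `esxcli vsan network list`."""
    return _VMKNIC_RE.findall(list_output)


def stale_vsan_vmknics(list_output: str, existing: Iterable[str]) -> List[str]:
    """Hypothetical helper: vSAN-tagged adapters that do not exist on the host."""
    present = set(existing)
    return [nic for nic in vsan_vmknics(list_output) if nic not in present]


if __name__ == "__main__":
    # Illustrative input mirroring the example above: vSAN reports vmk2 and vmk3,
    # but only vmk0 and vmk3 exist in the host network configuration.
    sample = (
        "VmkNic Name: vmk2\n"
        "   IP Protocol: IP\n"
        "   Traffic Type: vsan\n"
        "\n"
        "VmkNic Name: vmk3\n"
        "   IP Protocol: IP\n"
        "   Traffic Type: vsan\n"
    )
    print(stale_vsan_vmknics(sample, existing=["vmk0", "vmk3"]))  # ['vmk2']
```

A non-empty result corresponds to the stale entry described above.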
As a result, the script keeps failing with the 'Cluster preparation is not ready' message until it times out.
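Because the stale VMkernel adapter does not show up in the UI or in command output, it is not possible to uncheck vSAN traffic on it. Instead, clear the vSAN network configuration on the affected host and re-add the correct adapter.
Warning: This command wipes the vSAN network configuration on the host and can cause the host to become network partitioned.
[root@esxi:~] esxcli vsan network clear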
Re-add the VMkernel adapter that should carry vSAN traffic (vmk1 in this example):
[root@esxi:~] esxcli vsan network ipv4 add -i vmk1
Verify that only the intended adapter is listed:
[root@esxi:~] esxcli vsan network list
VmkNic Name: vmk1
IP Protocol: IP
Interface UUID:
Agent Group Multicast Address:
Agent Group IPv6 Multicast Address: ff19::
Agent Group Multicast Port: 23451
Master Group Multicast Address:
Master Group IPv6 Multicast Address: ff19::
Master Group Multicast Port: 12345
Host Unicast Channel Bound Port: 12321
Data-in-Transit Encryption Key Exchange Port: 0
Multicast TTL: 5
Traffic Type: vsan
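The clear-and-re-add sequence above can be scripted for a single host. The sketch below is an illustration under stated assumptions, not a supported VMware procedure: it uses only the esxcli commands quoted in this article, takes the correct adapter name as a parameter (vmk1 in this example), prints the commands without running them unless dry_run=False, and afterwards checks that 'esxcli vsan network list' reports exactly that adapter. Keep the partition warning above in mind and run it on one host at a time. The function names are hypothetical.

```python
import re
import subprocess
from typing import List, Optional


def remediation_commands(vmknic: str) -> List[List[str]]:
    """The two esxcli commands from this article: clear, then re-add one adapter."""
    return [
        ["esxcli", "vsan", "network", "clear"],
        ["esxcli", "vsan", "network", "ipv4", "add", "-i", vmknic],
    ]


def only_expected_vmknic(list_output: str, vmknic: str) -> bool:
    """True if `esxcli vsan network list` output names exactly one adapter: vmknic."""
    names = re.findall(r"^[ \t]*VmkNic Name:[ \t]*(\S+)", list_output, re.MULTILINE)
    return names == [vmknic]


def remediate(vmknic: str, dry_run: bool = True) -> Optional[bool]:
    """Hypothetical wrapper. Returns None in dry-run mode, else the verification result."""
    for cmd in remediation_commands(vmknic):
        print(("would run: " if dry_run else "running: ") + " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if esxcli exits non-zero.
            subprocess.run(cmd, check=True)
    if dry_run:
        return None
    listing = subprocess.run(
        ["esxcli", "vsan", "network", "list"],
        check=True, capture_output=True, text=True,
    ).stdout
    return only_expected_vmknic(listing, vmknic)


if __name__ == "__main__":
    remediate("vmk1")  # dry run: only prints the commands
```

Defaulting to a dry run is a deliberate choice here, since clearing the vSAN network configuration can partition the host.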
Run the prepare step again. It now completes:
[root@esxi:~] python /usr/lib/vmware/vsan/bin/reboot_helper.py prepare
Begin to prepare the cluster for gracefully rebooting ...
Time among connected hosts are synchronized.
Scheduled vSAN reconfig task.
Waiting for the scheduled task ... (6s left)
Checking network status ...
Network checking done.
Cluster preparation is done!
Ready to proceed following steps:
1.Put all hosts in maintenance mode with 'No Action' mode.
2.Reboot/Shutdown all hosts.
3.When all hosts are brought back, exit maintenace mode.
4.Run this script again with 'recover' command.
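When automating this workflow, the text printed by 'reboot_helper.py prepare' distinguishes the two outcomes shown in this article. A minimal sketch, assuming the messages stay exactly as reproduced here (they may differ between builds): it classifies captured output as ready, timeout, or unknown. The function names are hypothetical, and run_prepare simply invokes the script the same way as the commands above.

```python
import subprocess


def prepare_status(output: str) -> str:
    """Classify `reboot_helper.py prepare` output using the messages quoted in this article."""
    if "Cluster preparation is done!" in output:
        return "ready"
    if "Timeout, please try again later." in output:
        return "timeout"
    return "unknown"


def run_prepare(script: str = "/usr/lib/vmware/vsan/bin/reboot_helper.py") -> str:
    """Hypothetical helper: run the prepare step and classify its combined output."""
    result = subprocess.run(["python", script, "prepare"], capture_output=True, text=True)
    return prepare_status(result.stdout + result.stderr)


if __name__ == "__main__":
    failed = (
        "Cluster preparation is not ready, retry after 10s...\n"
        "Timeout, please try again later.\n"
    )
    print(prepare_status(failed))  # timeout
```

A "timeout" result is the point at which checking for stale vSAN VMkernel adapters, as described above, is worthwhile.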
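After all hosts are back up and have exited maintenance mode, complete the process with:
[root@esxi:~] python /usr/lib/vmware/vsan/bin/reboot_helper.py recover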