Symptoms:
- The environment is a vSAN stretched cluster.
- Witness traffic separation is configured.
- The cluster is running either ESXi 7.0U3f - U3n or 8.0 - 8.0c.
- One host remains powered on after all other hosts have been shut down correctly.
- Eventually, the automated cluster shutdown process in vCenter fails with an error message "Wait other hosts disconnected timeout <IP address of the orchestration host>".
Note: If the user manually shuts down the orchestration host before the aforementioned error message occurs, the shutdown process will fail with "Operation timed out" after the orchestration host is powered off.
- The following commands show the network configuration of the orchestration host:
esxcli vsan network list
Interface:
VmkNic Name: vmk1
IP Protocol: IP
Interface UUID: 52e11b48-d3f7-37c9-ee3e-eb5e59e045b5
Agent Group Multicast Address: 224.2.3.4
Agent Group IPv6 Multicast Address: ff19::2:3:4
Agent Group Multicast Port: 23451
Master Group Multicast Address: 224.1.2.3
Master Group IPv6 Multicast Address: ff19::1:2:3
Master Group Multicast Port: 12345
Host Unicast Channel Bound Port: 12321
Data-in-Transit Encryption Key Exchange Port: 0
Multicast TTL: 5
Traffic Type: vsan
Interface:
VmkNic Name: vmk0
IP Protocol: IP
Interface UUID: 52d743e8-6f04-85e8-70eb-cc22df14da5f
Agent Group Multicast Address: 224.2.3.4
Agent Group IPv6 Multicast Address: ff19::2:3:4
Agent Group Multicast Port: 23451
Master Group Multicast Address: 224.1.2.3
Master Group IPv6 Multicast Address: ff19::1:2:3
Master Group Multicast Port: 12345
Host Unicast Channel Bound Port: 12321
Data-in-Transit Encryption Key Exchange Port: 0
Multicast TTL: 5
Traffic Type: witness
esxcfg-vmknic -l
Interface  Port Group/DVPort/Opaque Network  IP Family  IP Address     Netmask        Broadcast      MAC Address        MTU   TSO MSS  Enabled  Type    NetStack
vmk0       8                                 IPv4       192.xxx.x.181  255.255.255.0  192.xxx.x.255  b4:7a:f1:82:25:4c  1500  65535    true     STATIC  defaultTcpipStack
vmk1       16                                IPv4       19.xx.x.181    255.255.255.0  19.xx.x.255    00:50:56:60:20:03  1500  65535    true     STATIC  defaultTcpipStack
or
esxcli network ip interface ipv4 get
Name  IPv4 Address   IPv4 Netmask   IPv4 Broadcast  Address Type  Gateway      DHCP DNS
----  -------------  -------------  --------------  ------------  -----------  --------
vmk0  192.xxx.x.181  255.255.255.0  192.168.3.255   STATIC        192.168.3.1  false
vmk1  19.xx.x.181    255.255.255.0  19.16.3.255     STATIC        192.168.3.1  false
- Reviewing /var/run/log/vsanmgmt.log for the time of the automated shutdown process shows the following pattern:
2022-10-07T08:36:53.719Z info vsand[2104476] [opID=089825e4-09a5 VsanRebootUtil::GetLocalHostName] Get localHostname 19.xx.x.181
2022-10-07T08:40:24.223Z info vsand[2104476] [opID=089825e4-09a5 VsanClusterPowerSystemImpl::PerformOrchestrationClusterPowerAction] Waiting other host power off. Connected host found ['19.xx.x.182'], numLoop 1
2022-10-07T08:42:54.259Z info vsand[2104476] [opID=089825e4-09a5 VsanClusterPowerSystemImpl::PerformOrchestrationClusterPowerAction] Waiting other host power off. Connected host found ['192.xxx.x.181'], numLoop 2
2022-10-07T08:44:54.259Z info vsand[2104476] [opID=089825e4-09a5 VsanClusterPowerSystemImpl::PerformOrchestrationClusterPowerAction] Waiting other host power off. Connected host found ['192.xxx.x.181'], numLoop 3
2022-10-07T08:46:54.259Z info vsand[2104476] [opID=089825e4-09a5 VsanClusterPowerSystemImpl::PerformOrchestrationClusterPowerAction] Waiting other host power off. Connected host found ['192.xxx.x.181'], numLoop 4
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
For context:
1. 19.xx.x.181 is the vSAN network IP address of the orchestration host (vmk1, traffic type vsan).
2. 192.xxx.x.181 is the witness traffic IP address of the orchestration host (vmk0, traffic type witness).
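To confirm which VMkernel interface carries which traffic type without reading the full output by eye, the `esxcli vsan network list` output can be reduced to a small mapping. The following is a minimal sketch, not a VMware tool: the function name `parse_vsan_network_list` is hypothetical, and it assumes the output has the same "VmkNic Name:" and "Traffic Type:" fields as the excerpt above, saved to a text variable or file.

```python
def parse_vsan_network_list(text):
    """Return {vmknic_name: traffic_type} from `esxcli vsan network list` output.

    Assumption: each interface block contains a "VmkNic Name:" line followed
    later by a "Traffic Type:" line, as in the excerpt in this article.
    """
    mapping = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("VmkNic Name:"):
            # Split on the first colon only; other fields contain colons too.
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Traffic Type:") and current is not None:
            mapping[current] = line.split(":", 1)[1].strip()
            current = None
    return mapping


# Shortened sample taken from the output shown in this article.
SAMPLE_OUTPUT = """\
Interface:
VmkNic Name: vmk1
IP Protocol: IP
Agent Group IPv6 Multicast Address: ff19::2:3:4
Traffic Type: vsan
Interface:
VmkNic Name: vmk0
IP Protocol: IP
Agent Group IPv6 Multicast Address: ff19::2:3:4
Traffic Type: witness
"""

if __name__ == "__main__":
    print(parse_vsan_network_list(SAMPLE_OUTPUT))
```

A host that reports both a vsan and a witness traffic type on different interfaces, as in this sample, has witness traffic separation configured.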
The logs indicate that, after the other hosts were shut down, the IP address used to identify the orchestration host in the cluster information switched from the vSAN IP to the witness IP. The automated shutdown logic therefore treats the orchestration host as a different host that is still connected and keeps waiting for it to power off, so the cluster shutdown eventually fails with a timeout error.
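The same check can be scripted against a copy of vsanmgmt.log. The sketch below is illustrative only and assumes the log lines match the excerpt above ("Get localHostname ..." and "Connected host found [...], numLoop N"). The function name `find_identity_switch` and the `local_ips` argument are hypothetical; `local_ips` must be filled in by hand with the orchestration host's own VMkernel addresses taken from the network configuration output.

```python
import re

# Patterns based on the log excerpt in this article; other builds may differ.
LOCAL_RE = re.compile(r"Get localHostname (\S+)")
WAIT_RE = re.compile(r"Connected host found \[(.*?)\], numLoop (\d+)")


def find_identity_switch(log_text, local_ips):
    """Find loops where the orchestration host waits on one of its own IPs.

    Returns (local_name, hits), where local_name is the address reported by
    GetLocalHostName and hits is a list of (numLoop, ip) tuples for every
    "Connected host found" entry that is a local IP other than local_name.
    """
    local_name = None
    hits = []
    for line in log_text.splitlines():
        match = LOCAL_RE.search(line)
        if match:
            local_name = match.group(1)
            continue
        match = WAIT_RE.search(line)
        if match:
            hosts = re.findall(r"'([^']+)'", match.group(1))
            loop = int(match.group(2))
            for host in hosts:
                if host in local_ips and host != local_name:
                    hits.append((loop, host))
    return local_name, hits


if __name__ == "__main__":
    import sys

    # Usage (assumed): python check_log.py vsanmgmt.log <local IP> [<local IP> ...]
    with open(sys.argv[1]) as handle:
        name, found = find_identity_switch(handle.read(), set(sys.argv[2:]))
    print("local hostname:", name)
    for loop, ip in found:
        print(f"numLoop {loop}: waiting on local address {ip}")
```

Any reported hit means the orchestration host is waiting for one of its own addresses to disconnect, which matches the pattern described in this article.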