vSAN: "SHUTDOWN CLUSTER" fails with error "Disconnected host xx.xx.xx.xx found in orchestration host"

Article ID: 377650

Products

VMware vSAN

Issue/Introduction

In vSAN 8.0 U2, the "SHUTDOWN CLUSTER" operation fails with the error "Disconnected host xx.xx.xx.xx found in orchestration host".

In older versions, the error message is slightly different: "Disconnected host found from orch xx.xx.xx.xx".

The IP address xx.xx.xx.xx in the error is the vSAN IP address of one of the hosts. This is the IP address of that host's VMkernel adapter (vmk) tagged for vSAN traffic.
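
The two releases word the error differently. When collecting errors from several clusters, a small helper that extracts the vSAN IP from either form can be handy. The following is a minimal Python sketch. The function name disconnected_vsan_ip is hypothetical, and the sketch assumes IPv4 vSAN addresses.

```python
import re
from typing import Optional

# IPv4 only (assumption); vSAN can also use IPv6, which this sketch ignores.
_IP = r"(\d{1,3}(?:\.\d{1,3}){3})"

# Wording in vSAN 8.0 U2 and later, as quoted in this article.
_NEW_FORMAT = re.compile(r"Disconnected host " + _IP + r" found in orchestration host")
# Wording in older versions, as quoted in this article.
_OLD_FORMAT = re.compile(r"Disconnected host found from orch " + _IP)


def disconnected_vsan_ip(message: str) -> Optional[str]:
    """Return the vSAN IP named in a Shutdown Cluster error, or None if absent."""
    for pattern in (_NEW_FORMAT, _OLD_FORMAT):
        match = pattern.search(message)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    print(disconnected_vsan_ip("Disconnected host 60.0.0.12 found in orchestration host"))
    print(disconnected_vsan_ip("Disconnected host found from orch 60.0.0.12"))
```

Both calls in the usage block print 60.0.0.12.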

Environment

VMware vSAN 7.0 U3

VMware vSAN 8.x

Cause

The Shutdown Cluster Wizard is available in vSAN 7.0 Update 3 and later releases. During the shutdown process, the vSAN master host must communicate with the other hosts over the vSAN network. Starting with vSAN 8.0 U2, this communication uses port 443; older versions use port 80.
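
The version rule above can be written as a small lookup. This is a hypothetical helper, not part of any VMware tool. It assumes versions are expressed as (major, minor, update) tuples, such as (8, 0, 2) for vSAN 8.0 U2.

```python
from typing import Tuple

Version = Tuple[int, int, int]  # (major, minor, update), e.g. (8, 0, 2) = 8.0 U2

SHUTDOWN_WIZARD_MIN: Version = (7, 0, 3)  # wizard available from vSAN 7.0 U3
HTTPS_SINCE: Version = (8, 0, 2)          # port 443 used from vSAN 8.0 U2 onwards


def orchestration_port(version: Version) -> int:
    """Port the vSAN master host uses to reach the other hosts during shutdown."""
    if version < SHUTDOWN_WIZARD_MIN:
        raise ValueError("the Shutdown Cluster Wizard requires vSAN 7.0 U3 or later")
    # Tuples compare element by element, so (8, 0, 1) < (8, 0, 2) < (8, 0, 3).
    return 443 if version >= HTTPS_SINCE else 80


if __name__ == "__main__":
    for v in [(7, 0, 3), (8, 0, 1), (8, 0, 2), (8, 0, 3)]:
        print(v, orchestration_port(v))
```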

For vSAN 8.0 U2 and later, we need to check connectivity on port 443. The ESXi firewall rules for port 443 (from the output of esxcli network firewall ruleset rule list) are:

Ruleset                      Direction  Protocol  Port Type  Port Begin  Port End
vSphereClient                Inbound    TCP       Dst               443       443
httpClient                   Outbound   TCP       Dst               443       443
esxupdate                    Outbound   TCP       Dst               443       443
gstored                      Outbound   TCP       Dst               443       443
vltd                         Outbound   TCP       Dst               443       443
vsanmgmt-https-tunnel        Outbound   TCP       Dst               443       443

In older versions, we need to check port 80 instead:

Ruleset                      Direction  Protocol  Port Type  Port Begin  Port End
updateManager                Outbound   TCP       Dst                80        80
faultTolerance               Outbound   TCP       Dst                80        80
webAccess                    Inbound    TCP       Dst                80        80
httpClient                   Outbound   TCP       Dst                80        80
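
When the full rule listing is long, it can help to filter it by port and direction. The following minimal Python sketch assumes the six whitespace-separated columns shown in the listings above. The names parse_rules and rules_for_port are hypothetical helpers.

```python
from typing import List, NamedTuple, Optional


class FirewallRule(NamedTuple):
    ruleset: str
    direction: str
    protocol: str
    port_type: str
    port_begin: int
    port_end: int


def parse_rules(text: str) -> List[FirewallRule]:
    """Parse rows of six whitespace-separated columns, skipping anything else."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        # Header, separator and blank lines do not have six fields ending in two numbers.
        if len(fields) != 6 or not (fields[4].isdigit() and fields[5].isdigit()):
            continue
        name, direction, protocol, port_type, begin, end = fields
        rules.append(FirewallRule(name, direction, protocol, port_type, int(begin), int(end)))
    return rules


def rules_for_port(rules: List[FirewallRule], port: int,
                   direction: Optional[str] = None) -> List[str]:
    """Names of rulesets whose port range covers `port`, optionally for one direction."""
    return [
        r.ruleset
        for r in rules
        if r.port_begin <= port <= r.port_end
        and (direction is None or r.direction == direction)
    ]


if __name__ == "__main__":
    sample = """\
vSphereClient                Inbound    TCP       Dst               443       443
vsanmgmt-https-tunnel        Outbound   TCP       Dst               443       443
webAccess                    Inbound    TCP       Dst                80        80
"""
    print(rules_for_port(parse_rules(sample), 443, "Outbound"))
```

Applied to the rows above, rules_for_port(rules, 443, "Outbound") returns httpClient, esxupdate, gstored, vltd and vsanmgmt-https-tunnel.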

The problem occurs when the vSAN master host cannot communicate with one or more of the other hosts over the vSAN network on port 443 (vSAN 8.0 U2 and later) or port 80 (older versions).
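
To confirm whether the master host can reach the other hosts on the relevant port, a plain TCP connection attempt is often enough. It separates a blocked or filtered port (refusal or timeout) from a reachable one. The following is a hypothetical helper, not a VMware tool, and it assumes a Python 3 interpreter is available where you run it. The optional source_ip binding, for example to the master host's own vSAN IP, only approximates sending traffic out of the vSAN VMkernel adapter.

```python
import socket
from typing import Dict, Iterable, Optional


def can_connect(ip: str, port: int, timeout: float = 5.0,
                source_ip: Optional[str] = None) -> bool:
    """Return True if a TCP connection to ip:port completes within `timeout` seconds."""
    # Binding the local end to source_ip is an assumption meant to approximate
    # traffic leaving through the vSAN vmk; leave it as None to use the default route.
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((ip, port), timeout=timeout, source_address=source):
            return True
    except OSError:
        # Refused, unreachable and timed-out attempts all raise OSError subclasses.
        return False


def check_peers(peers: Iterable[str], port: int,
                source_ip: Optional[str] = None) -> Dict[str, bool]:
    """Map each peer vSAN IP to whether it accepted a TCP connection on `port`."""
    return {ip: can_connect(ip, port, source_ip=source_ip) for ip in peers}
```

For the 8.0 U2 example in this article, a call such as check_peers(["60.0.0.12", "60.0.0.13"], 443, source_ip="60.0.0.11") would show False for any peer the master cannot reach. A successful connection only shows that the port is open. It does not validate the certificates discussed in the Resolution section.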

Resolution

There can be multiple reasons for this failure. In most cases, the orchestration host is unable to communicate because of certificate issues.

This can be resolved by renewing the certificates on each host. See: Renew VMCA Certificates with New VMCA-Signed Certificates Using the vSphere Client

If that does not resolve the issue, troubleshoot connectivity over the vSAN network on port 443 or port 80. One possible cause is a misconfigured Allowed IP addresses setting in the ESXi firewall rules.

(The following example is from a vSAN 8.0 U2 cluster, so port 443 should be inspected.)

In this example, the cluster has three hosts:

host 20.0.0.11 with vSAN address 60.0.0.11

host 20.0.0.12 with vSAN address 60.0.0.12

host 20.0.0.13 with vSAN address 60.0.0.13

We need to add "60.0.0.11, 60.0.0.12, 60.0.0.13" to the Allowed IP addresses so that the hosts can communicate with each other over the vSAN network.
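
To check a host's Allowed IP addresses value against the vSAN addresses of its peers, a sketch using Python's ipaddress module is shown below. It assumes the value is a comma-separated list of single addresses or subnets, as in the example above. The function name not_allowed and the "20.0.0.0/24" value in the usage block are hypothetical; the original screenshot of the misconfigured setting is not available.

```python
import ipaddress
from typing import Iterable, List


def not_allowed(vsan_ips: Iterable[str], allowed_ips: str,
                allow_all: bool = False) -> List[str]:
    """Return the vSAN IPs not covered by a comma-separated Allowed IP addresses value."""
    if allow_all:
        # Assumption: "allow connections from any IP address" accepts every peer.
        return []
    # strict=False lets one parser accept both single addresses and subnets.
    networks = [
        ipaddress.ip_network(entry.strip(), strict=False)
        for entry in allowed_ips.split(",")
        if entry.strip()
    ]
    return [
        ip for ip in vsan_ips
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    ]


if __name__ == "__main__":
    peers = ["60.0.0.11", "60.0.0.12", "60.0.0.13"]
    print(not_allowed(peers, "20.0.0.0/24"))  # hypothetical misconfigured value
    print(not_allowed(peers, "20.0.0.0/24, 60.0.0.11, 60.0.0.12, 60.0.0.13"))
```

With only the hypothetical management subnet allowed, all three vSAN addresses are reported as missing. Adding "60.0.0.11, 60.0.0.12, 60.0.0.13", or a covering subnet such as 60.0.0.0/24, returns an empty list.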

Additional Information

In a vSAN 8.0 U2 cluster, we see errors such as "Not all tasks are finished with timeout 30" and "Disconnected host 60.0.0.12 found in orchestration host" in vsanmgmt.log on the vSAN master host:

2024-09-19T06:53:03.855Z Er(11) vsand[266217]: [opID=agw-0000330-25ee-10a0 VsanHealthThreadMgmt::join] Not all tasks are finished with timeout 30
2024-09-19T06:53:03.855Z Er(11)[+] vsand[266217]: Traceback (most recent call last):
2024-09-19T06:53:03.855Z Er(11)[+] vsand[266217]:   File "/usr/lib/vmware/vsan/perfsvc/VsanHealthThreadMgmt.py", line 408, in join
2024-09-19T06:53:03.855Z Er(11)[+] vsand[266217]:   File "/lib64/python3.8/concurrent/futures/_base.py", line 240, in as_completed
2024-09-19T06:53:03.855Z Er(11)[+] vsand[266217]: concurrent.futures._base.TimeoutError: 2 (of 3) futures unfinished
2024-09-19T06:53:03.855Z Er(11) vsand[266217]: [opID=agw-0000330-25ee-10a0 VsanHealthHelpers::LoginToMultipleHostsDirectly] Some host timed out getting hostInfo
2024-09-19T06:53:03.856Z Er(11) vsand[266217]: [opID=agw-0000330-25ee-10a0 VsanClusterPowerSystemImpl::PerformOrchestrationClusterPowerAction] PerformOrchestrationClusterPowerAction error: currentPowerStatus: vsanMemberShipUpdateDisabled
2024-09-19T06:53:03.856Z Er(11)[+] vsand[266217]: Traceback (most recent call last):
2024-09-19T06:53:03.856Z Er(11)[+] vsand[266217]:   File "/usr/lib/vmware/vsan/perfsvc/VsanClusterPowerSystemImpl.py", line 475, in PerformOrchestrationClusterPowerAction
2024-09-19T06:53:03.856Z Er(11)[+] vsand[266217]:   File "/usr/lib/vmware/vsan/perfsvc/VsanClusterPowerSystemImpl.py", line 412, in TryGetHostInfoWithCapAndConnectivity
2024-09-19T06:53:03.856Z Er(11)[+] vsand[266217]: Exception: Disconnected host 60.0.0.12 found in orchestration host

This means that the master host cannot connect to 60.0.0.12, which is the vSAN address of another host.

We also see errors such as "Failed to test vsan vmodl version with error <urlopen error timed out> on xx.xx.xx.xx" in vsanmgmt.log:

2024-09-19T06:53:03.894Z Er(11) vsand[266210]: [opID=agw-0000330-25ee-10a0 VsanVimHelpers::GetVsanVersionNamespace] Failed to test vsan vmodl version with error <urlopen error timed out> on 60.0.0.13
2024-09-19T06:53:03.895Z Wa(12) vsand[266210]: [opID=agw-0000330-25ee-10a0 VsanVimHelpers::GetVsanVersionNamespace] Retry retrieving vsan vmodl version, 5
2024-09-19T06:53:03.895Z Wa(12) vsand[266210]: [opID=agw-0000330-25ee-10a0 VsanVimHelpers::GetVsanVersionNamespace] execTime 5 is larger than retry delay 5
2024-09-19T06:53:03.915Z Er(11) vsand[267290]: [opID=agw-0000330-25ee-10a0 VsanVimHelpers::GetVsanVersionNamespace] Failed to test vsan vmodl version with error <urlopen error timed out> on 60.0.0.12
2024-09-19T06:53:03.916Z Wa(12) vsand[267290]: [opID=agw-0000330-25ee-10a0 VsanVimHelpers::GetVsanVersionNamespace] Retry retrieving vsan vmodl version, 5
2024-09-19T06:53:03.916Z Wa(12) vsand[267290]: [opID=agw-0000330-25ee-10a0 VsanVimHelpers::GetVsanVersionNamespace] execTime 5 is larger than retry delay 5

Again in vsanmgmt.log, there is an error that clearly indicates a connectivity issue:

2024-09-19T06:53:12.294Z Er(11) vsand[266216]: [opID=vsan-1e02f1fb-62272ded1d34d statscollector::RetrieveRemoteStats] VMK vmk1 can not connect to host 60.0.0.13.
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]: Traceback (most recent call last):
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/usr/lib/vmware/vsan/perfsvc/statscollector.py", line 1192, in RetrieveRemoteStats
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/lib64/python3.8/site-packages/pyVmomi/VmomiSupport.py", line 598, in <lambda>
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/lib64/python3.8/site-packages/pyVmomi/VmomiSupport.py", line 388, in _InvokeMethod
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/lib64/python3.8/site-packages/pyVmomi/SoapAdapter.py", line 1527, in InvokeMethod
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/lib64/python3.8/site-packages/pyVmomi/SoapAdapter.py", line 1611, in GetConnection
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/usr/lib/vmware/vsan/perfsvc/VsanHealthUtil.py", line 1770, in __call__
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/lib64/python3.8/http/client.py", line 1259, in request
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/lib64/python3.8/http/client.py", line 1305, in _send_request
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/lib64/python3.8/http/client.py", line 1254, in endheaders
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/lib64/python3.8/http/client.py", line 1014, in _send_output
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/lib64/python3.8/http/client.py", line 954, in send
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/usr/lib/vmware/vsan/perfsvc/VsanHealthUtil.py", line 1914, in connect
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/lib64/python3.8/http/client.py", line 1421, in connect
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/lib64/python3.8/http/client.py", line 925, in connect
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/usr/lib/vmware/vsan/perfsvc/VsanHealthUtil.py", line 1906, in vsanperf_create_connection
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/usr/lib/vmware/vsan/perfsvc/VsanHealthUtil.py", line 1869, in VsanPerfCreateConnection
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]:   File "/usr/lib/vmware/vsan/perfsvc/VsanHealthUtil.py", line 1860, in VsanPerfCreateConnection
2024-09-19T06:53:12.294Z Er(11)[+] vsand[266216]: socket.timeout: timed out
2024-09-19T06:53:12.295Z Wa(12) vsand[266216]: [opID=vsan-1e02f1fb-62272ded1d34d statscollector::RetrieveRemoteStats] No available vmknic to retrieve remote stats.
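
The three kinds of messages shown in this section can be searched for together. The following minimal Python sketch maps each vSAN IP that appears in them to the kinds of errors logged for it. The patterns are copied from the excerpts above, IPv4 addresses are assumed, and the function name scan_vsanmgmt_log is hypothetical. Pass the path of a copy of vsanmgmt.log as the first argument.

```python
import re
import sys
from collections import defaultdict
from typing import Dict, Iterable, Set

_IP = r"(\d{1,3}(?:\.\d{1,3}){3})"  # IPv4 only (assumption)

# Message fragments copied from the vsanmgmt.log excerpts in this article.
PATTERNS = {
    "disconnected": re.compile(r"Disconnected host " + _IP + r" found in orchestration host"),
    "vmodl_timeout": re.compile(
        r"Failed to test vsan vmodl version with error <urlopen error timed out> on " + _IP),
    "vmk_unreachable": re.compile(r"VMK \S+ can not connect to host " + _IP),
}


def scan_vsanmgmt_log(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each vSAN IP found in the known messages to the kinds of errors logged for it."""
    findings: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                findings[match.group(1)].add(kind)
    return dict(findings)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ip, kinds in sorted(scan_vsanmgmt_log(log).items()):
            print(ip, ", ".join(sorted(kinds)))
```

Run against the excerpts above, the scan reports 60.0.0.12 with disconnected and vmodl_timeout, and 60.0.0.13 with vmodl_timeout and vmk_unreachable.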