vCenter reported "Host cannot communicate with other hosts" although there was never a network partition
Article ID: 331497
Products
VMware vSAN
Issue/Introduction
Symptoms:
vCenter reported the alarm "One host cannot communicate with other hosts", and after a while the error disappeared from the vSAN health plugin.
On the vCenter Server, vmware-vsan-health-service-991.log may show that vCenter could not fetch the vmk information used for vSAN/unicast on Host:<HOSTNAME> during the unicast check:
2019-01-02T04:58:38.509Z WARNING vsan-health[100e4ede-0e4b-11e9] [VsanVcClusterHealthSystemImpl::_GetConsistentConfigTest] Vmknic not present on host <hostname>.cityofmarion.lan, skip testing unicast
In the vCenter health report file, the health plugin reported that the vmknic on host-1808 had issues; however, it also reported the "Cluster partition" state as green (good).
2019-01-02T04:58:40.55Z INFO vsan-health[healthThread-CloudHealthSender-22884] [VsanHealthSummaryLogUtil::PrintHealthResult] Cluster COM vxRail Overall Health : red Group network health : red Test hostdisconnected health : green Test hostconnectivity health : red HostsWithCommunicationIssues: Host (Host-88320), Test clusterpartition health : green Test vsanvmknic health : red HostsWithNoVsanVmknicPresent: Host (Host-88320), Test matchingsubnet health : yellow VsanIpSubnetConfigurations: Host IpSubnet(S) (Host-88316, 10.3.1.0/24), (Host-106228, 10.3.1.0/24), (Host-88318, 10.3.1.0/24), (Host-106233, 10.3.1.0/24), (Host-88312, 10.3.1.0/24), (Host-88320, ''), (Host-106239, 10.3.1.0/24), (Host-88314, 10.3.1.0/24), Test smallping health : green Test largeping health : green
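The combination to look for in the summary line above is a non-green hostconnectivity test alongside a green clusterpartition test. As a minimal sketch (the function names and the regular expression are illustrative assumptions, not part of any VMware tool), the per-test colors can be pulled out of such a line like this:

```python
import re

# Matches "Test <name> health : <color>" pairs as they appear in the
# PrintHealthResult line quoted in this article. The pattern is an assumption
# based on that one sample, not a documented log format.
TEST_RESULT = re.compile(r"Test (\w+) health : (\w+)")


def parse_health_tests(line: str) -> dict:
    """Return a mapping of health test name to the color reported for it."""
    return dict(TEST_RESULT.findall(line))


def connectivity_without_partition(tests: dict) -> bool:
    """True when hostconnectivity is not green while clusterpartition is green."""
    return (
        tests.get("hostconnectivity") not in (None, "green")
        and tests.get("clusterpartition") == "green"
    )


# Shortened copy of the health summary line from this article.
sample = (
    "Overall Health : red Group network health : red "
    "Test hostdisconnected health : green Test hostconnectivity health : red "
    "HostsWithCommunicationIssues: Host (Host-88320), "
    "Test clusterpartition health : green Test vsanvmknic health : red "
    "Test smallping health : green Test largeping health : green"
)

tests = parse_health_tests(sample)
print(tests)
print(connectivity_without_partition(tests))
```

A result of True only suggests that the alarm deserves a closer look; it does not by itself prove a false positive.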
You may not see any host being partitioned in clomd.log, hostd.log, or vpxa.log. Clomd may never report any state change or node drop count.
In vsanmgmt.0 you may see cmmds-tool reporting "Cannot allocate memory":
2019-01-02T04:58:38Z VSANMGMTSVC: ERROR vsanperfsvc[117160c8-0e4b-11e9] [VsanStretchedClusterSystemImpl::GetStretchedClusterInfoFromCmmds] Failed to get stretched cluster info from cmmds: Running cmd ['/bin/cmmds-tool', 'find', '--format=python', '-t', 'NODE'] with error /bin/cmmds-tool: error while loading shared libraries: libpthread.so.0: failed to map segment from shared object: Cannot allocate memory
2019-01-02T04:58:38Z VSANMGMTSVC: ERROR vsanperfsvc[117160c8-0e4b-11e9] [VsanStretchedClusterSystemImpl::GetStretchedClusterInfoFromCmmds] Running cmd ['/bin/cmmds-tool', 'find', '--format=python', '-t', 'NODE'] with error /bin/cmmds-tool: error while loading shared libraries: libpthread.so.0: failed to map segment from shared object: Cannot allocate memory
Traceback (most recent call last):
  File "/build/mts/release/bora-10390 117/bora/build/esxvsan/release/vsan/usr/lib/vmware/vsan/perfsvc/VsanStretchedClusterSystemImpl.py", line 349, in GetStretchedClusterInfoFromCmmds
  File "/usr/lib/vmware/hostd/hmo/VsanInternalSystem.py", line 69, in _cmmds_find
    raise Exception('Running cmd %s with error %s' % (cmd, err))
Exception: Running cmd ['/bin/cmmds-tool', 'find', '--format=python', '-t', 'NODE'] with error /bin/cmmds-tool: error while loading shared libraries: libpthread.so.0: failed to map segment from share
2019-01-02T04:58:38Z VSANMGMTSVC: d object: Cannot allocate memory
2019-01-02T04:59:59.847Z cpu16:64944763)WARNING: User: 4530: cmmds-tool: Error in initial cartel setup: Failed late cartel initialization: Admission check failed for memory resource
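The memory-related entries above share two recognizable substrings. The following is a rough sketch for scanning collected log lines for them; the function name and the signature list are assumptions drawn only from the excerpts in this article, and other builds may log different text:

```python
from typing import Iterable, List

# Substrings copied from the log excerpts in this article. Treat them as
# illustrative; they are not an exhaustive or documented list.
MEMORY_SIGNATURES = (
    "Cannot allocate memory",
    "Admission check failed for memory resource",
)


def find_cmmds_memory_errors(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention cmmds-tool together with a memory signature."""
    hits = []
    for line in lines:
        if "cmmds-tool" in line and any(sig in line for sig in MEMORY_SIGNATURES):
            hits.append(line.rstrip("\n"))
    return hits


# Sample lines taken from this article: two memory failures and one unrelated warning.
sample_lines = [
    "/bin/cmmds-tool: error while loading shared libraries: libpthread.so.0: "
    "failed to map segment from shared object: Cannot allocate memory",
    "2019-01-02T04:59:59.847Z cpu16:64944763)WARNING: User: 4530: cmmds-tool: "
    "Error in initial cartel setup: Failed late cartel initialization: "
    "Admission check failed for memory resource",
    "2019-01-02T04:58:38.509Z WARNING vsan-health[100e4ede-0e4b-11e9] "
    "Vmknic not present on host <hostname>, skip testing unicast",
]

hits = find_cmmds_memory_errors(sample_lines)
print(len(hits))
```

Because the function accepts any iterable of strings, an open log file handle can be passed in directly instead of a list.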
In vsansystem.log, the cmmds module is reported as inaccessible and busy. This is because the cmmds module ran out of memory, as reported in the events above.
Further inspection of vsansystem.log shows cmmds reporting memory exhaustion, which is why it failed to get the membership count.
As a result, the node count was reported as "nodeCount: 0", which triggered the network health alarm in vCenter. This is a false positive alert: there was no network partition.