"vSphere HA detected that this host is in a different network partition than the master to which vCenter Server is connected" error on vCenter Server

Article ID: 402728

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • Multiple ESXi hosts are part of one vSphere cluster.
  • These ESXi hosts belong to different subnets.
  • vSphere HA is configured successfully on some ESXi hosts, while other ESXi hosts report the following alert:
                 Alert: [Critical] Alarm alarm.HAhostStatus on Host <FQDN Of ESXI Host> because vSphere HA detected that host <FQDN Of ESXI Host> is in a different network partition than the master <cluster-name> in <Datacenter/vCenter>.
  • The vpxd logs under /var/log/vmware/vpxd on vCenter Server contain entries similar to the following:

yyyy-mm-ddThh:mm:ss.928Z info vpxd[26960] [Originator@6876 sub=DAS opID=mguyh8km-835012-auto-hwat-h5:70190447-eb] Triggering Das host config LRO to install and configure fdm on host [vim.HostSystem:host-13219,<FQDN Of ESXI Host>]
yyyy-mm-ddThh:mm:ss.941Z info vpxd[26988] [Originator@6876 sub=vpxLro opID=mguyh8km-835012-auto-hwat-h5:70190447-eb-04] [VpxLRO] -- BEGIN task-131973 -- <FQDN Of ESXI Host> -- DasConfig.ConfigureHost --
yyyy-mm-ddThh:mm:ss.276Z info vpxd[26988] [Originator@6876 sub=Vmomi opID=mguyh8km-835012-auto-hwat-h5:70190447-eb-04] Creating SOAP stub adapter for /fdm on <FQDN Of ESXI Host>.net:443
yyyy-mm-ddThh:mm:ss.352Z info vpxd[26988] [Originator@6876 sub=HostUpgrader opID=mguyh8km-835012-auto-hwat-h5:70190447-eb-04] Failed to call FDM Debug Manager on [vim.HostSystem:host-13219,<FQDN Of ESXI Host>]
yyyy-mm-ddThh:mm:ss.894Z info vpxd[26988] [Originator@6876 sub=DAS opID=mguyh8km-835012-auto-hwat-h5:70190447-eb-04-01] Starting fdm service on host [vim.HostSystem:host-13219,<FQDN Of ESXI Host>.net]
yyyy-mm-ddThh:mm:ss.623Z info vpxd[26988] [Originator@6876 sub=DAS opID=mguyh8km-835012-auto-hwat-h5:70190447-eb-04-01] Waiting for fdm service to come up on host [vim.HostSystem:host-13219,<FQDN Of ESXI Host>.net]

  • vCenter Server tries to contact the FDM Debug Manager on the ESXi host but fails.
  • The vpxd logs under /var/log/vmware/vpxd then show a series of retries, which also fail:

[context]zKq7AVECAQAAAI48ewEcdnB4ZAAAQxxTbGlidm1hY29yZS5zbwAACBhCACk/QwCWmUoB87EFbGliY3NpLXR5cGVzLnNvAAHUsgUC4WIhbGlidm1vbWkuc28AAuqQIQJfCiGDk0VMAXZweGQAg2OOTAGDa5pMAYNom0wBg3c0TAGDb2JMAQLX2hoBtsMEg4+YqAGDQaWoAYOzxqMBg5n5fgKDWPp+AgAHUTgABOw3ABdFOADFD1EEsI4AbGlicHRocmVhZC5zby4wAAXf+g9saWJjLnNvLjYA[/context]
yyyy-mm-ddThh:mm:ss.039Z info vpxd[26985] [Originator@6876 sub=Vmomi opID=FdmWaitForUpdates-vim.ClusterComputeResource:domain-c65-26562882] Stale SOAP session to host <FQDN Of ESXI Host>.net; reinitializing
yyyy-mm-ddThh:mm:ss.039Z info vpxd[26985] [Originator@6876 sub=Vmomi opID=FdmWaitForUpdates-vim.ClusterComputeResource:domain-c65-26562882] Creating SOAP stub adapter for /fdm on <FQDN Of ESXI Host>.net:443
yyyy-mm-ddThh:mm:ss.129Z info vpxd[26985] [Originator@6876 sub=HostUpgrader opID=FdmWaitForUpdates-vim.ClusterComputeResource:domain-c65-26562882] Choosing bundle file /etc/vmware-vpx/docRoot/vSphere-HA-depot/vib20/vmware-fdm/VMware_bootbank_vmware-fdm_8.0.3-24853646.vib, with VC build 24853646
yyyy-mm-ddThh:mm:ss.129Z info vpxd[26985] [Originator@6876 sub=HostUpgrader opID=FdmWaitForUpdates-vim.ClusterComputeResource:domain-c65-26562882] Choosing bundle file /etc/vmware-vpx/docRoot/vSphere-HA-depot/VMware-fdm-8.0.3-24853646.fdmVersion.txt, with VC build 24853646
yyyy-mm-ddThh:mm:ss.130Z info vpxd[26985] [Originator@6876 sub=HostUpgrader opID=FdmWaitForUpdates-vim.ClusterComputeResource:domain-c65-26562882] Choosing fdm bundle with build 24853646
yyyy-mm-ddThh:mm:ss.134Z info vpxd[26985] [Originator@6876 sub=vmomi.soapStub[53646] opID=FdmWaitForUpdates-vim.ClusterComputeResource:domain-c65-26562882] SOAP request returned HTTP failure; <<io_obj p:0x00007f8f48972df8, h:73, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-13219/fdm>, method: GetDebugManager; code: 503(Service Unavailable); fault: (null)
yyyy-mm-ddThh:mm:ss.134Z warning vpxd[26985] [Originator@6876 sub=Vmomi opID=FdmWaitForUpdates-vim.ClusterComputeResource:domain-c65-26562882] Got vmacore exception when invoking VMOMI method; <</hgw/host-13219>, /fdm>, csi.FdmService.GetDebugManager, N7Vmacore4Http13HttpExceptionE(HTTP error response: Service Unavailable)
yyyy-mm-ddThh:mm:ssECAQAAAI48ewEddnB4ZAAAQxxTbGlidm1hY29yZS5zbwAACBhCACk/QwCWmUoBIEIebGlidm1vbWkuc28AAT9kIQHqkCEBXwohgpNFTAF2cHhkAIJjjkwBgmuaTAGCaJtMAYJ3NEwBgm9iTAEB19oaA9PFBGxpYmNzaS10eXBlcy5zbwCCKGE/AYKErD8Bgg2ZqAGCQaWoAYKzxqMBgpn5fgKCWPp+AgAHUTgABOw3ABdFOADFD1EEsI4AbGlicHRocmVhZC5zby4wAAXf+g9saWJjLnNvLjYA[/context]
yyyy-mm-ddThh:mm:ss.137Z info vpxd[26985] [Originator@6876 sub=HostUpgrader opID=FdmWaitForUpdates-vim.ClusterComputeResource:domain-c65-26562882] Failed to call FDM Debug Manager on [vim.HostSystem:host-13219,<FQDN Of ESXI Host>.net]
yyyy-mm-ddThh:mm:ss.148Z info vpxd[26985] [Originator@6876 sub=vmomi.soapStub[46781] opID=FdmWaitForUpdates-vim.ClusterComputeResource:domain-c65-26562882] SOAP request returned HTTP failure; <<io_obj p:0x00007f8f488f60a8, h:73, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-13187/fdm>, method: retrieveClusterInfo; code: 500(Internal Server Error); fault: (csi.fault.NotAuthenticated) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>
-->    msg = "Received SOAP response fault from [<<io_obj p:0x00007f8f488f60a8, h:73, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-13187/fdm>]: retrieveClusterInfo
--> "
--> }
yyyy-mm-ddThh:mm:ss.149Z info vpxd[26985] [Originator@6876 sub=Vmomi opID=FdmWaitForUpdates-vim.ClusterComputeResource:domain-c65-26562882] Retry SOAP call after exception; <</hgw/host-13187>, /fdm>, csi.FdmService.retrieveClusterInfo, N3Csi5Fault16NotAuthenticated9ExceptionE(Fault cause: csi.fault.NotAuthenticated

  • A basic ping test between ESXi hosts in the two subnets helps identify whether ICMP is blocked. If it is blocked, a ping from one ESXi host to the other, and vice versa, shows 100% packet loss or no reply.
  • The /var/run/log/fdm.log file on the primary host reports the following entries:
        warning  fdm  Sendto[ipv4] ###.###.###.###: Host is down
        verbose  fdm  Waited 5 seconds for icmp ping reply for host-##
        verbose  fdm  Checking for Partition
        error    fdm  [60 times] sendto ###.###.###.### failed: Host is down
        info     fdm  Host host-## changed state: Partitioned
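
The log signatures listed above can be searched for programmatically. The following is a minimal Python sketch, not a VMware tool: the marker strings are copied from the excerpts in this article, while the function names and the idea of counting matching lines are assumptions made for illustration only.

```python
import re
from collections import Counter
from typing import Iterable, List

# Marker substrings copied from the vpxd and fdm.log excerpts in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
MARKERS = {
    "fdm_debug_manager_failed": "Failed to call FDM Debug Manager",
    "http_503": "code: 503(Service Unavailable)",
    "not_authenticated": "csi.fault.NotAuthenticated",
    "host_is_down": "Host is down",
    "partitioned": "changed state: Partitioned",
}

# Matches fdm.log lines such as "Host host-## changed state: Partitioned".
PARTITION_RE = re.compile(r"Host (\S+) changed state: Partitioned")


def count_markers(lines: Iterable[str]) -> Counter:
    """Count how many log lines contain each marker substring."""
    counts: Counter = Counter()
    for line in lines:
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
    return counts


def partitioned_hosts(lines: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated host IDs reported as Partitioned."""
    hosts = set()
    for line in lines:
        match = PARTITION_RE.search(line)
        if match:
            hosts.add(match.group(1))
    return sorted(hosts)
```

To use it, read a copy of the relevant log file into a list of lines (for example with `open(path).read().splitlines()`) and pass that list to both functions. A non-zero count for the vpxd markers together with a non-empty list of partitioned hosts matches the symptoms described in this article.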

Environment


VMware vSphere ESXi 8.x

Cause

If the primary host cannot communicate directly with the agent on a secondary host (for instance, the secondary host does not respond to ICMP pings) but the secondary host is still sending heartbeats through the datastore, the primary host considers the secondary host to be either network partitioned or network isolated. In this scenario, no VM failover occurs, but the primary host continues to monitor the secondary host and its virtual machines.

Reference - Network Partitions
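
The decision described in this section can be pictured as a small classification function. The Python sketch below is a simplified model, not the FDM implementation: the enum, the function name, and the two boolean inputs are hypothetical. The `FAILED` branch is an assumption that goes beyond this article; it is based on the general vSphere HA behavior that a host with neither network nor datastore heartbeats is treated as failed.

```python
from enum import Enum


class HostState(Enum):
    RUNNING = "running"
    PARTITIONED_OR_ISOLATED = "network partitioned or isolated"
    FAILED = "failed"


def classify_secondary(agent_reachable: bool, datastore_heartbeat: bool) -> HostState:
    """Simplified view of how a primary host classifies a secondary host.

    agent_reachable:      the primary can talk to the HA agent over the network.
    datastore_heartbeat:  the secondary is still heartbeating through the datastore.
    """
    if agent_reachable:
        return HostState.RUNNING
    if datastore_heartbeat:
        # Case described in this article: no failover, monitoring continues.
        return HostState.PARTITIONED_OR_ISOLATED
    # Assumption (not stated in this article): no heartbeats of any kind.
    return HostState.FAILED


def failover_expected(state: HostState) -> bool:
    """VM failover is expected only for a failed host in this simplified model."""
    return state is HostState.FAILED
```

In the situation covered by this article, the inputs are `agent_reachable=False` and `datastore_heartbeat=True`, which yields the partitioned-or-isolated state and no VM failover.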

Resolution

  • The alert is triggered by a connectivity issue between the primary and secondary hosts, which should be investigated at the physical network layer. To test connectivity between the hosts, refer to the KB article "Testing network connectivity with the ping command".
  • Work with the internal firewall/network team to unblock ICMP packets in both directions.
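
When the ping test is run in both directions, the results can be checked with a short script. The Python sketch below only parses captured ping output. It assumes the summary line contains the usual "N% packet loss" text, and the function names are hypothetical. It does not run ping itself; the output is expected to be collected separately on each host, for example with `vmkping` in the ESXi shell.

```python
import re
from typing import Optional

# Matches the packet-loss figure in a ping summary line, e.g. "100% packet loss".
_LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def packet_loss_percent(ping_output: str) -> Optional[float]:
    """Return the packet-loss percentage, or None if no summary line is present."""
    match = _LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def icmp_looks_blocked(a_to_b: str, b_to_a: str) -> bool:
    """True if either direction shows 100% loss or produced no summary at all."""
    for output in (a_to_b, b_to_a):
        loss = packet_loss_percent(output)
        if loss is None or loss == 100.0:
            return True
    return False
```

A `True` result for a pair of hosts in different subnets is consistent with ICMP being blocked in at least one direction, which is the condition the firewall/network team needs to correct. Partial loss (for example 33%) is not flagged by this sketch and should be investigated separately.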

Additional Information

Reference - Host Failure Types