Host Reports Network Partition Due to NIC Teaming Policy Mismatch

Article ID: 406619

Products

VMware vCenter Server 8.0

Issue/Introduction

After a recent configuration change or host addition, a vSphere HA-enabled ESXi host is reporting a network partition error, preventing it from joining the HA cluster correctly. The specific error observed is:
vSphere HA detected that this host is in a different network partition than the master to which vCenter server is connected

Environment

VMware vCenter Server 7.0

VMware vCenter Server 8.0

Cause

The network partition was caused by a NIC teaming policy mismatch. On the affected ESXi host, the port group that carries management and vSphere HA heartbeat traffic (on a standard vSwitch or a vDS) used a different load balancing policy than the corresponding port group on the other hosts in the HA cluster.

The affected host was configured to use "Route based on originating virtual port ID" for its network teaming, while the other hosts in the cluster were utilizing "Route based on IP Hash."

From vSphere HA's perspective, this mismatch created a logical network partition: traffic was distributed across the uplinks inconsistently, which prevented the reliable heartbeat exchange that HA cluster integrity depends on. "Route based on IP Hash" requires the physical switch ports to be bundled into a link aggregation group. On a standard vSwitch this must be a static EtherChannel (port channel); LACP is supported only on a vDS. "Route based on originating virtual port ID" expects independent, unbundled switch ports, so a host using it is not compatible with uplinks that connect to aggregated switch ports.
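The difference between the two policies can be illustrated with a short sketch. This is a simplified model, not the exact ESXi implementation: it assumes the IP hash policy XORs the source and destination addresses and takes the result modulo the number of uplinks, and that the port ID policy pins each virtual port to one uplink. The IP addresses, port ID, and function names are hypothetical.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Simplified IP-hash model: XOR both addresses, then take the modulo."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % uplink_count


def port_id_uplink(virtual_port_id: int, uplink_count: int) -> int:
    """Simplified port-ID model: a given virtual port always uses one uplink."""
    return virtual_port_id % uplink_count


# Hypothetical management address of one host and three HA peers.
mgmt_ip = "10.0.0.11"
peers = ["10.0.0.12", "10.0.0.13", "10.0.0.14"]

# With IP hash, the same vmkernel address leaves on different uplinks
# depending on the destination, so the switch must treat them as one link.
ip_hash_uplinks = {ip_hash_uplink(mgmt_ip, peer, 2) for peer in peers}

# With port ID, the vmkernel port stays on a single uplink for every peer.
port_id_uplinks = {port_id_uplink(7, 2) for _ in peers}

print("IP hash uplinks used:", sorted(ip_hash_uplinks))
print("Port ID uplinks used:", sorted(port_id_uplinks))
```

In this model the IP hash host sends from the same address on both uplinks, which only works when the physical switch treats those ports as one aggregated link. The port ID host uses a single uplink, which is what unbundled switch ports expect.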

Resolution

To resolve the network partition and restore vSphere HA functionality, correct the NIC teaming policy on the affected ESXi host so that it matches the rest of the cluster.

Steps:

  1. Identify the Affected Host: Log in to vCenter Server and identify the ESXi host reporting the network partition error.
  2. Determine Current Teaming Policy:
    • Navigate to the affected ESXi host in vCenter.
    • Go to Configure > Networking > Virtual Switches (for vSwitches) or vSphere Distributed Switches (for vDS).
    • Examine the port group(s) used for management traffic (where vmk0 resides) and any other port groups that carry vSphere HA heartbeat traffic. Note the current "Load Balancing" (teaming) policy of each.
  3. Confirm Cluster's Teaming Policy: Verify that the other hosts in the vSphere HA cluster are using "Route based on IP Hash" on their corresponding port groups. This policy implies that link aggregation is configured on the physical switches: a static EtherChannel (port channel) for a standard vSwitch, or LACP (Link Aggregation Control Protocol) when a vDS LAG is used.
  4. Change Teaming Policy on Affected Host:
    • Navigate back to the affected host's network configuration (vSwitch or vDS port group).
    • Edit the settings of the relevant port group (e.g., "Management Network").
    • Go to the Teaming and failover section.
    • Change the Load Balancing policy to "Route based on IP Hash."
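    • Crucial: Ensure that the physical switch ports connected to the ESXi host's uplinks are configured for link aggregation in the same way as for the other hosts (a static EtherChannel for a standard vSwitch, or LACP for a vDS LAG) and that the aggregated link is operational.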
    • Click OK to save the changes.
  5. Verify vSphere HA Status:
    • Monitor the vSphere HA cluster status. The host should now join the HA cluster, and the network partition error should clear.
    • If the error has not cleared after a few minutes, right-click the host and select Reconfigure for vSphere HA.
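
The comparison in steps 2 and 3 can be sketched in a few lines of Python. This is a minimal sketch, assuming the teaming policy of each host's management port group has already been collected (for example, through the vSphere API or by hand). The host names, the `inventory` dictionary, and the `find_mismatched_hosts` helper are hypothetical; the policy strings are the values the vSphere API uses for the NIC teaming policy.

```python
from collections import Counter

# vSphere API teaming policy values mapped to their vSphere Client labels.
POLICY_LABELS = {
    "loadbalance_srcid": "Route based on originating virtual port ID",
    "loadbalance_ip": "Route based on IP Hash",
    "loadbalance_srcmac": "Route based on source MAC hash",
    "failover_explicit": "Use explicit failover order",
}


def find_mismatched_hosts(policies: dict[str, str]) -> dict[str, str]:
    """Return the hosts whose policy differs from the most common one.

    On a tie, the policy that appears first in the input is treated as
    the cluster's policy.
    """
    if not policies:
        return {}
    majority, _count = Counter(policies.values()).most_common(1)[0]
    return {host: p for host, p in policies.items() if p != majority}


# Hypothetical inventory: one host differs from the rest of the cluster.
inventory = {
    "esx01.example.local": "loadbalance_ip",
    "esx02.example.local": "loadbalance_ip",
    "esx03.example.local": "loadbalance_ip",
    "esx04.example.local": "loadbalance_srcid",
}

for host, policy in find_mismatched_hosts(inventory).items():
    print(f"{host}: {POLICY_LABELS.get(policy, policy)}")
```

The sketch only flags the outlier. Which policy is correct for the cluster still depends on how the physical switch ports are configured.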

Additional Information