VMs drop out of network randomly

Article ID: 312559

Products

VMware vSphere ESXi

Issue/Introduction

This article outlines Dell Support's statement on a common Network Partitioning (NPAR) and teaming-mode configuration mistake that may cause unexpected network issues.

Symptoms:

  • Some VMs lose network connectivity at random intervals across multiple ESXi hosts in the same cluster.
  • After vMotion, affected VMs cannot be pinged from VMs on other hosts on any configured VLAN; only VMs on the same VLAN and the same ESXi host can still reach them.
  • NPAR is enabled in the Dell server BIOS.
  • Virtual switch (standard or distributed) contains more than one partition from the same physical port.

Environment

VMware vSphere 5.5.x
VMware vSphere 6.5.x
VMware vSphere 6.7.x

Cause


With Dell EMC BCM 578xx CNA adapters, for example, enabling NPAR (Network Partitioning) allows a total of eight partitions (logical NICs) to be presented to the host (server) operating system. On a four-port adapter, each physical port carries two partitions; on a two-port adapter, each physical port carries four partitions. In both cases, the partitions on a port together share the physical capabilities of that single port.
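To make the arithmetic concrete, the short Python sketch below counts logical NICs against physical ports for the two adapter layouts just described. It assumes the eight-partition budget is split evenly across ports and numbers ports and partitions from 1; the names `TOTAL_PARTITIONS`, `partitions_per_port`, and `partition_layout` are hypothetical and do not come from any Dell or VMware tool.

```python
from __future__ import annotations

# Assumption: the 8-partition budget from the 578xx example is split evenly
# across the adapter's physical ports.
TOTAL_PARTITIONS = 8


def partitions_per_port(num_ports: int) -> int:
    """Return how many partitions (logical NICs) each physical port carries."""
    if num_ports <= 0 or TOTAL_PARTITIONS % num_ports != 0:
        raise ValueError(
            f"cannot split {TOTAL_PARTITIONS} partitions evenly across {num_ports} ports"
        )
    return TOTAL_PARTITIONS // num_ports


def partition_layout(num_ports: int) -> list[tuple[int, int]]:
    """List every logical NIC as a (physical port, partition) pair, 1-based."""
    per_port = partitions_per_port(num_ports)
    return [
        (port, part)
        for port in range(1, num_ports + 1)
        for part in range(1, per_port + 1)
    ]


if __name__ == "__main__":
    for ports in (4, 2):
        layout = partition_layout(ports)
        distinct_ports = {port for port, _ in layout}
        print(f"{ports}-port adapter: {len(layout)} logical NICs, "
              f"{len(distinct_ports)} physical ports")
```

Either way the host sees eight logical NICs, but they sit on only four or two physical ports, so losing one port takes down every partition on it.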

A common misconception is that these eight partitions are equivalent to eight physical ports; they are not. When configuring these partitions, take care not to overload a single physical port and not to combine multiple partitions from the same physical port in the same team or virtual switch.

Regardless of the operating system, Dell EMC Engineering DOES NOT support an NPAR configuration where a team or virtual switch (standard or distributed) contains more than one partition from the same physical port.
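One way to audit a host for this unsupported layout is to group each virtual switch's uplinks by the physical port behind them and flag any port that appears more than once. The Python sketch below does that over a hand-written inventory; the `vmnic` names, the uplink-to-port mapping, and the switch names are illustrative assumptions only. It does not query ESXi, so on a real host the mapping has to be taken from the server's NPAR configuration.

```python
from collections import defaultdict

# Hypothetical inventory for a four-port adapter with two partitions per port:
# uplink name -> (physical port, partition). The vmnic-to-port ordering here
# is an assumption for illustration, not how ESXi necessarily enumerates NPAR NICs.
UPLINK_PORT = {
    "vmnic0": (1, 1), "vmnic1": (2, 1), "vmnic2": (3, 1), "vmnic3": (4, 1),
    "vmnic4": (1, 2), "vmnic5": (2, 2), "vmnic6": (3, 2), "vmnic7": (4, 2),
}

# Hypothetical switch/team membership to audit.
SWITCHES = {
    "vSwitch0": ["vmnic0", "vmnic4"],  # both uplinks are partitions of port 1
    "vSwitch1": ["vmnic1", "vmnic2"],  # ports 2 and 3: no shared port
}


def find_shared_port_violations(switches, uplink_port):
    """Return {switch: {port: [uplinks]}} for each switch or team that holds
    more than one partition of the same physical port."""
    violations = {}
    for switch, uplinks in switches.items():
        by_port = defaultdict(list)
        for uplink in uplinks:
            port, _partition = uplink_port[uplink]
            by_port[port].append(uplink)
        shared = {port: names for port, names in by_port.items() if len(names) > 1}
        if shared:
            violations[switch] = shared
    return violations


if __name__ == "__main__":
    for switch, shared in find_shared_port_violations(SWITCHES, UPLINK_PORT).items():
        for port, names in shared.items():
            print(f"{switch}: physical port {port} appears via {', '.join(names)}")
```

Any switch reported here has uplinks that fail together when that one physical port goes down, which is why the layout gives no real redundancy.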

Beyond being unsupported, placing multiple partitions from the same physical port in one team or virtual switch provides no physical redundancy, adds driver stack overhead and complexity, and may cause a variety of unexpected network issues, such as:
  • Performance issues
  • Port flapping or network disconnects
  • NetQ or MAC filter errors in the OS logs
  • On virtualization platforms, errors when migrating virtual machines from one host to another
  • Replication of ingress traffic across partitions from the same physical port

Dell KB link: https://www.dell.com/support/article/en-in/qna44711/what-is-a-common-network-partition-and-teaming-mode-mistake-that-may-result-in-unexpected-network-issues?lang=en

Resolution

For a four-port adapter, the recommendation is to keep partition one and partition two of each physical port in different teams or virtual switches.
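A minimal sketch of this recommendation: place partition k of every physical port into team (or virtual switch) k, so no team ever holds two partitions of the same port. The team numbering and the `plan_teams` helper are hypothetical. Applying the function to a two-port adapter (four teams of two uplinks each) extends the same rule and is an assumption, not a layout the Dell statement spells out.

```python
from __future__ import annotations


def plan_teams(num_ports: int, partitions_per_port: int) -> dict[int, list[tuple[int, int]]]:
    """Hypothetical planner: team k receives partition k of every physical port.

    Members are (physical port, partition) pairs, numbered from 1.
    """
    return {
        part: [(port, part) for port in range(1, num_ports + 1)]
        for part in range(1, partitions_per_port + 1)
    }


def each_team_uses_distinct_ports(teams: dict[int, list[tuple[int, int]]]) -> bool:
    """True if no team contains two partitions from the same physical port."""
    return all(len({port for port, _ in members}) == len(members)
               for members in teams.values())


if __name__ == "__main__":
    teams = plan_teams(num_ports=4, partitions_per_port=2)
    for team, members in teams.items():
        print(f"team {team}: {members}")
    print("compliant:", each_team_uses_distinct_ports(teams))
```

Grouping by partition number keeps each team spread across all physical ports, so a single port failure removes at most one uplink from any team.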

Workaround:

Disable and re-enable the network adapter of the affected VM.

OR

Migrate the VM to another host in the cluster.