Network Loss on ESXi During Physical Switch Reboot or Hardware Failure Without Link State Change
Article ID: 426396
Products
VMware vSphere ESXi
Issue/Introduction
During physical network maintenance or a physical switch reboot, virtual machines hosted on ESXi lose all network connectivity even though the host has redundant physical uplinks (vmnics).
In this scenario, the uplinks are split across two physical switches for redundancy: one uplink is connected to the switch that is rebooting, and the other is connected to a second physical switch.
The uplink connected to the rebooting switch remains in a "Link Up" state from the ESXi perspective. As a result, the ESXi host does not trigger an automatic failover to the healthy redundant path and continues to send traffic toward the "hung" or unresponsive physical switch.
Environment
VMware vSphere ESXi
Cause
This issue occurs because the physical switch enters a state in which it no longer forwards traffic but still provides enough electrical signaling or "Keep Alive" to the NIC to maintain a physical link-up status.
During a reboot or hardware failure, the switch management and control planes may go offline while the physical port hardware keeps the link to the connected peer (the ESXi vmnic) up instead of bringing it down.
When Network Failure Detection in the teaming and failover policy is set to Link status only, ESXi relies strictly on the network card reporting a "Link Down" event (e.g., loss of light or signal). If the switch port remains physically active but logically dead, ESXi remains unaware of the upstream failure.
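The behavior described above can be illustrated with a minimal Python sketch. This is a simplified model, not ESXi's actual implementation: the `Uplink` class, the `switch_forwarding` flag, the function names, and the uplink names vmnic0 and vmnic1 are all hypothetical and exist only for illustration. The sketch assumes an explicit failover order and a selection rule that consults nothing but the link state reported by the NIC.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    name: str                # hypothetical uplink name, e.g. "vmnic0"
    link_up: bool            # link state the NIC reports to the host
    switch_forwarding: bool  # whether the upstream switch forwards traffic
                             # (not visible to the host in this model)


def select_active_uplink(failover_order):
    """Return the first uplink reporting link up, or None if none does.

    Only link_up is consulted, mirroring a link-status-only check.
    """
    for uplink in failover_order:
        if uplink.link_up:
            return uplink
    return None


def has_connectivity(failover_order):
    """Traffic flows only if the selected uplink's switch is forwarding."""
    active = select_active_uplink(failover_order)
    return active is not None and active.switch_forwarding


# Two uplinks, each cabled to a different physical switch.
vmnic0 = Uplink("vmnic0", link_up=True, switch_forwarding=True)
vmnic1 = Uplink("vmnic1", link_up=True, switch_forwarding=True)
order = [vmnic0, vmnic1]

# The switch behind vmnic0 hangs but keeps the link up.
vmnic0.switch_forwarding = False
hung_active = select_active_uplink(order).name
hung_ok = has_connectivity(order)

# The port is shut down (or the cable is pulled), so the link drops.
vmnic0.link_up = False
failover_active = select_active_uplink(order).name
failover_ok = has_connectivity(order)

print(hung_active, hung_ok)
print(failover_active, failover_ok)
```

In this model, the hung switch leaves vmnic0 selected with no connectivity, and only the explicit link-down event moves traffic to vmnic1. The same reasoning explains why forcing the link down restores connectivity.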
Resolution
To resolve the immediate connectivity loss, the "dead" path must be manually shut down so that ESXi is forced to use the alternate uplink, which is connected to a healthy switch.
If a switch is unresponsive or "hung" but still maintains a link light, the network administrator should manually shut down the affected ports on that switch (if the switch still accepts commands) or physically disconnect the cables to force the ESXi host to trigger a failover.
In short, the physical networking team should shut down the affected port on the switch so that the host detects a link-down event and fails traffic over to the redundant uplink.