Virtual machines (VMs) appear with "stale" or disconnected network interfaces after a host reboot or site power restoration.
The issue intermittently resolves and recurs without manual intervention.
The issue is caused by the default vSwitch/vDS Failback policy in conjunction with unstable physical uplinks.
Failback Policy: When using the "Route Based on Originating Virtual Port" algorithm, the Failback setting is "Yes" by default. If an active NIC (e.g., vmnic2) fails and then recovers, the ESXi host immediately moves traffic back to it.
Path Isolation Failure: If the recovering NIC has an underlying MTU mismatch (e.g., 1500 on the vDS vs. a different setting on the physical switch) or is connected to an unstable port, the traffic is "preempted" back into a failing path.
Boot Sequencing: Following power restoration, ESXi hosts typically boot faster than enterprise physical switches. If the switch port is not in PortFast/Edge mode, the physical port remains in a "Listening/Learning" state (about 30 seconds with default classic Spanning Tree timers) while the VM attempts to join the network, leading to failed port allocation and disconnected vNICs.
To resolve this issue and prevent future occurrences, implement the following changes:
To prevent traffic from automatically returning to a recently recovered (but potentially unstable) NIC:
Navigate to the vSphere Distributed Switch (vDS) or Standard Switch.
Select the vSAN Port Group > Configure > Policies > Teaming and Failover.
Change Failback from Yes to No.
Repeat for the Management and vMotion port groups if stability issues persist.
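On hosts that use Standard Switches, the same change can likely be scripted from the ESXi shell. The following is a minimal sketch, assuming the esxcli namespace "network vswitch standard portgroup policy failover set" and its --failback option are available on your ESXi build (confirm with --help first). The helper failback_off_cmd is hypothetical and the port group names are placeholders. The script only prints the commands so they can be reviewed before being run. Distributed Switch port groups are managed from vCenter and are not covered by this sketch.

```shell
#!/bin/sh
# Sketch only: prints (does not run) the esxcli command that would disable
# failback on a Standard Switch port group. Verify the option names with:
#   esxcli network vswitch standard portgroup policy failover set --help

# failback_off_cmd is a hypothetical helper, not part of ESXi.
# $1 = port group name (quoted in the output because names may contain spaces)
failback_off_cmd() {
    printf 'esxcli network vswitch standard portgroup policy failover set --portgroup-name="%s" --failback=false\n' "$1"
}

# Port group names below are placeholders; substitute your own.
for pg in "vSAN" "Management Network" "vMotion"; do
    failback_off_cmd "$pg"
done
```

Printing the commands first keeps the change auditable: each line can be pasted into the ESXi shell once the port group name has been checked.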
Ensure that all physical switch ports connected to ESXi hosts are configured as "Edge" ports:
Enable Spanning Tree PortFast (Cisco) or Edge Port (Arista/Dell/HP).
This allows the port to transition immediately to a "Forwarding" state when the link comes up, preventing "stale" vNIC connections during host reboots.
Perform an end-to-end MTU audit using the following commands on the ESXi CLI:
Check vSwitch MTU: esxcfg-vswitch -l
Check VMkernel MTU: esxcfg-vmknic -l
Validate vSAN Connectivity: Use vmkping to test large packet sizes without fragmentation:
vmkping -I vmkX -d -s 1472 <Target_vSAN_IP>
(Note: Use 1472 for a 1500 MTU or 8972 for a 9000 MTU. The payload size is the MTU minus 28 bytes: 20 for the IPv4 header and 8 for the ICMP header.)
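The payload sizes used with vmkping can be derived rather than memorized. The following is a minimal sketch, assuming IPv4 with no IP options. The helper names payload_for_mtu and vmkping_cmd are hypothetical, and the interface name and address in the usage note are placeholders.

```shell
#!/bin/sh
# Sketch only: derive the vmkping payload size from an MTU and build the
# matching command line for review. Helper names are hypothetical.

# ICMP payload = MTU - 20 (IPv4 header) - 8 (ICMP header).
# $1 = MTU in bytes
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Prints the vmkping command without running it.
# $1 = VMkernel interface, $2 = MTU, $3 = target IP
vmkping_cmd() {
    echo "vmkping -I $1 -d -s $(payload_for_mtu "$2") $3"
}

payload_for_mtu 1500   # prints 1472
payload_for_mtu 9000   # prints 8972
```

For example, vmkping_cmd vmk2 9000 192.0.2.10 prints "vmkping -I vmk2 -d -s 8972 192.0.2.10". If that ping fails while the 1472-byte variant succeeds, some hop on the path is probably not passing jumbo frames.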
If using vSphere Distributed Switches (vDS), change the Load Balancing policy to Route Based on Physical NIC Load.
This policy monitors actual NIC throughput and only shifts traffic if a NIC exceeds 75% utilization, preventing unnecessary traffic reshuffling.
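The 75% rule can be illustrated with a simple threshold check. This is only a sketch of the decision described above, not VMware's implementation. The function name exceeds_threshold and the throughput figures are hypothetical.

```shell
#!/bin/sh
# Illustration of a 75% utilization threshold check. This is NOT VMware's
# implementation; the function name and inputs are hypothetical.

# $1 = observed throughput in Mbps, $2 = link speed in Mbps
# Succeeds (exit status 0) only when utilization is strictly above 75%.
exceeds_threshold() {
    [ $(( $1 * 100 )) -gt $(( $2 * 75 )) ]
}

# 8000 of 10000 Mbps is 80%, above the threshold.
if exceeds_threshold 8000 10000; then echo "rebalance"; else echo "stay"; fi
# 6000 of 10000 Mbps is 60%, below the threshold.
if exceeds_threshold 6000 10000; then echo "rebalance"; else echo "stay"; fi
```

Because traffic stays put below the threshold, a NIC that merely flaps and recovers does not by itself trigger a move under this policy.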