ESXi Host Loses Network Connectivity Briefly Due to Mellanox ConnectX-4 Lx NIC Link Flap During MTU Reconfiguration

Article ID: 422459

Products

VMware vSphere ESXi

Issue/Introduction

An ESXi host may experience a brief loss of network connectivity (typically 5–10 seconds) when both physical uplinks report link down within seconds of each other. This can result in:

  • Temporary host isolation

  • Management network interruption

  • Uplink redundancy loss on distributed switches

  • Short-lived VM network impact if no additional redundancy exists

The issue has been observed on hosts using Mellanox ConnectX-4 Lx adapters, where both vmnics are affected within seconds of each other.

Environment

vSphere ESXi 8.x

Cause

The network interruption is caused by driver-initiated link resets, not by any physical network fault.

The nmlx5_core driver intentionally tears down and re-establishes the NIC link to reapply the MTU configuration (1500 → 9000) during or shortly after host boot or driver reinitialization.

This can be verified in /var/run/log/vmkernel.log:

xxxx-xx-xxTxx:xx:xx.xxxx In(182) vmkernel: cpu13:2097517)<NMLX_INF> nmlx5_core: vmnic0: nmlx5_en_ChangeMTU - (nmlx5_core_en_main.c:2241) Changing MTU from: 1500 to: 9000
xxxx-xx-xxTxx:xx:xx.xxxx In(182) vmkernel: cpu13:2097517)<NMLX_INF> nmlx5_core: vmnic1: nmlx5_en_ChangeMTU - (nmlx5_core_en_main.c:2241) Changing MTU from: 1500 to: 9000
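
When reviewing a long vmkernel.log, it can help to pull out only the driver's MTU change messages. The following is a minimal Python sketch, not a supported tool. The regular expression is built from the two log lines shown above, and the function name find_mtu_changes is hypothetical. It assumes the log has been copied off the host and is read as a list of text lines.

```python
import re

# Pattern derived from the vmkernel.log lines shown in this article.
# The timestamp and CPU prefix are ignored because they differ per host.
MTU_CHANGE = re.compile(
    r"nmlx5_core: (?P<nic>vmnic\d+): nmlx5_en_ChangeMTU .*?"
    r"Changing MTU from: (?P<old>\d+) to: (?P<new>\d+)"
)


def find_mtu_changes(lines):
    """Return (vmnic, old_mtu, new_mtu) for each driver MTU change line."""
    changes = []
    for line in lines:
        match = MTU_CHANGE.search(line)
        if match:
            changes.append((match["nic"], int(match["old"]), int(match["new"])))
    return changes
```

Calling find_mtu_changes(open("vmkernel.log").read().splitlines()) on a copied log returns one tuple per MTU change, which makes it easy to see whether both vmnics were reconfigured.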
 

From /var/run/log/vobd.log:

[vob.net.vmnic.linkstate.down] vmnic1 linkstate down

[vob.net.vmnic.linkstate.down] vmnic0 linkstate down
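
To confirm that every uplink of the host logged a link-down event, the vobd.log entries can be reduced to a set of vmnic names. This is a rough Python sketch under stated assumptions: the pattern is taken from the two entries shown above, the function names are hypothetical, and the list of expected uplinks must be supplied by the reader. It only checks that each uplink logged an event, not how close together the events were.

```python
import re

# Pattern derived from the vobd.log entries shown in this article.
LINK_DOWN = re.compile(
    r"\[vob\.net\.vmnic\.linkstate\.down\] (vmnic\d+) linkstate down"
)


def uplinks_reported_down(lines):
    """Return the set of vmnics that logged at least one link-down event."""
    found = set()
    for line in lines:
        match = LINK_DOWN.search(line)
        if match:
            found.add(match.group(1))
    return found


def all_uplinks_down(lines, expected_uplinks):
    """True if every uplink in expected_uplinks logged a link-down event."""
    return set(expected_uplinks) <= uplinks_reported_down(lines)
```

For the excerpt above, all_uplinks_down(lines, ["vmnic0", "vmnic1"]) would indicate that both uplinks were affected, which matches the full loss of redundancy described in this article.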


Resolution

No immediate corrective action is required if the event is:

  • Short-lived (seconds)

  • Self-recovering

  • Confined to reboot or early host initialization

The links automatically return to the UP state once MTU reconfiguration completes.
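
Whether an event was short-lived and self-recovering can be estimated by pairing each link-down entry with the next link-up entry for the same vmnic. The Python sketch below is illustrative only and rests on assumptions not shown in this article: that each vobd.log line starts with an ISO 8601 timestamp, and that link recovery is logged as a vob.net.vmnic.linkstate.up entry. The function names and the 10-second default threshold (taken from the typical 5–10 second duration described above) are hypothetical.

```python
import re
from datetime import datetime

# ASSUMPTION: lines start with a timestamp such as 2024-01-01T00:00:05.123Z
# and recovery is logged as "[vob.net.vmnic.linkstate.up] vmnicN ...".
# Fractional seconds are ignored; one-second resolution is enough here.
EVENT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}).*?"
    r"\[vob\.net\.vmnic\.linkstate\.(?P<state>down|up)\] (?P<nic>vmnic\d+)"
)


def outage_durations(lines):
    """Pair each link-down with the next link-up for the same vmnic.

    Returns a list of (vmnic, seconds_down). Links that never came back
    up are not included in the result.
    """
    down_since = {}
    outages = []
    for line in lines:
        match = EVENT.search(line)
        if match is None:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S")
        nic = match["nic"]
        if match["state"] == "down":
            # Keep the first down time if several down events repeat.
            down_since.setdefault(nic, ts)
        elif nic in down_since:
            outages.append((nic, (ts - down_since.pop(nic)).total_seconds()))
    return outages


def is_short_lived(outages, max_seconds=10):
    """True if there is at least one outage and none exceeds max_seconds."""
    return bool(outages) and all(sec <= max_seconds for _, sec in outages)
```

An outage that exceeds the threshold, or a vmnic that never logs a matching link-up entry, falls outside the benign pattern described here and is worth a closer look.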

Recommended Mitigations / Best Practices:

To minimize impact or avoid recurrence:

  1. Increase Physical Uplink Redundancy

    • Use 3 or more uplinks per host to avoid full redundancy loss during driver reconfiguration events

  2. Validate Mellanox Driver & Firmware Compatibility

    • Ensure the nmlx5_core driver and NIC firmware versions match a combination listed in the Broadcom Compatibility Guide (BCG)

    • Mismatched versions can make driver reinitialization more frequent

  3. Avoid MTU Changes During Production Hours

    • MTU reapplication causes link resets

    • Perform MTU-related changes during maintenance windows

  4. Monitor for Recurrence

    • If link flaps occur frequently or outside reboot/initialization windows, further investigation may be required
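
For the monitoring step, one possible approach is to separate link-down events that fall shortly after a known boot or driver reload from those that do not. The following Python sketch is an illustration under assumptions: the events and boot times are expected to be already parsed into datetime values by the reader, the function name flaps_outside_boot is hypothetical, and the 10-minute window is an arbitrary placeholder rather than a documented value.

```python
from datetime import datetime, timedelta


def flaps_outside_boot(events, boot_times, window=timedelta(minutes=10)):
    """Return link-down events that are not within `window` after any boot.

    events: iterable of (datetime, vmnic) link-down events.
    boot_times: iterable of datetime values for host boots or driver
        reinitializations (supplied by the reader).
    window: ASSUMED length of the initialization period after each boot.
    """
    boots = list(boot_times)
    return [
        (ts, nic)
        for ts, nic in events
        if not any(boot <= ts <= boot + window for boot in boots)
    ]
```

An empty result suggests that the flaps match the expected initialization behavior. Any events returned occurred outside those windows and are candidates for further investigation.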