ESXi hosts experienced network failures due to LACP configuration issues and physical network adapter link states transitioning to down.
This resulted in the Link Aggregation Group (LAG) going down and the hosts becoming unreachable.
Log snippets:
vobd.all:
YYYY-MM-DDTHH:MM:SS: [netCorrelator] 14871524271453us: [vob.net.vmnic.linkstate.down] vmnic vmnic3 linkstate down
YYYY-MM-DDTHH:MM:SS: [netCorrelator] 14871524295829us: [vob.net.lacp.uplink.transition.down] LACP warning: Uplink vmnic3 on VDS DvsPortset-0 moved out of the link aggregation group.
YYYY-MM-DDTHH:MM:SS: [netCorrelator] 14871795522277us: [esx.problem.net.lacp.uplink.transition.down] uplink vmnic3 on VDS DvsPortset-0 is moved out of link aggregation group.
YYYY-MM-DDTHH:MM:SS: [netCorrelator] 14871524312662us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
vmkernel.all:
YYYY-MM-DDTHH:MM:SS: cpu37:2098026)ntg3:vmnic3:Ntg3PhyStateGet:417:link down
YYYY-MM-DDTHH:MM:SS: cpu28:2098215)Team.cswitch: TeamVSLACPLAGEventCB:9077: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Received event UPLINK LINK STATUS, LAG /1687952376, link UNKNOWN, uplink vmnic3/0x84000016, link DOWN
YYYY-MM-DDTHH:MM:SS: cpu46:2098020)ntg3:vmnic0:Ntg3PhyStateGet:417:link down
YYYY-MM-DDTHH:MM:SS: cpu28:2098215)Team.cswitch: TeamVSLACPLAGEventCB:9077: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Received event UPLINK LINK STATUS, LAG /1687952376, link UNKNOWN, uplink vmnic0/0x84000010, link DOWN
YYYY-MM-DDTHH:MM:SS: cpu11:2098022)ntg3:vmnic1:Ntg3PhyStateGet:417:link down
YYYY-MM-DDTHH:MM:SS: cpu28:2098215)Team.cswitch: TeamVSLACPLAGEventCB:9077: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Received event UPLINK LINK STATUS, LAG /1687952376, link UNKNOWN, uplink vmnic1/0x84000012, link DOWN
These entries indicate that ESXi received a link status change event reporting that an uplink belonging to the LACP LAG (vmnicX), connected to the physical switch, has gone down.
They also show ESXi reacting to that hardware-triggered link-down event by moving the affected uplink out of the LAG.
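To see which uplinks were affected and in what order, the vobd events can be grouped per vmnic. The following is a minimal sketch, assuming the log lines follow the format shown in the snippet above; the function name summarize_vobd and the SAMPLE_VOBD list are illustrative and not part of any VMware tool.

```python
import re
from collections import defaultdict

# Event id in square brackets (vob.* or esx.*) followed by the message text.
# "[netCorrelator]" is skipped because it does not start with vob. or esx.
EVENT_RE = re.compile(r"\[(?P<event>(?:vob|esx)\.[\w.]+)\]\s*(?P<msg>.*)")
VMNIC_RE = re.compile(r"\bvmnic\d+\b")

# Suffixes of the event ids seen in the snippet above.
DOWN_SUFFIXES = ("vmnic.linkstate.down", "lacp.uplink.transition.down")


def summarize_vobd(lines):
    """Return {vmnic: [event ids in the order seen]} for link-down and LACP-down events."""
    events = defaultdict(list)
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue
        event = match.group("event")
        if not event.endswith(DOWN_SUFFIXES):
            continue
        nic = VMNIC_RE.search(match.group("msg"))
        if nic:
            events[nic.group(0)].append(event)
    return dict(events)


# Illustrative input: lines copied from the vobd snippet, timestamps left as placeholders.
SAMPLE_VOBD = [
    "YYYY-MM-DDTHH:MM:SS: [netCorrelator] 14871524271453us: [vob.net.vmnic.linkstate.down] vmnic vmnic3 linkstate down",
    "YYYY-MM-DDTHH:MM:SS: [netCorrelator] 14871524295829us: [vob.net.lacp.uplink.transition.down] LACP warning: Uplink vmnic3 on VDS DvsPortset-0 moved out of the link aggregation group.",
    "YYYY-MM-DDTHH:MM:SS: [netCorrelator] 14871795522277us: [esx.problem.net.lacp.uplink.transition.down] uplink vmnic3 on VDS DvsPortset-0 is moved out of link aggregation group.",
    "YYYY-MM-DDTHH:MM:SS: [netCorrelator] 14871524312662us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down",
]

if __name__ == "__main__":
    for nic, seen in sorted(summarize_vobd(SAMPLE_VOBD).items()):
        print(nic, seen)
```

A vmnic that shows a linkstate.down event immediately followed by an LACP transition event points to the physical link dropping first, rather than an LACP negotiation failure on an otherwise healthy link.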
LACP misconfiguration on the Virtual Distributed Switch (VDS).
Physical network adapters (vmnic0, vmnic1, vmnic2, vmnic3) experiencing link state transitions to down.
Uplinks moving out of the link aggregation group (LAG).
Switch-side configuration mismatch, causing loss of network stability.
Potential NIC or hardware issues affecting connectivity.
Immediate Steps:
Check physical network cables for proper connectivity.
Ensure LACP is correctly configured on both the switch and ESXi hosts. See "Configuring a LAG on a vSphere Distributed Switch Port Group when using LACP".
Verify the status of physical NICs and test them for hardware failures. See "Verifying network links for ESX/ESXi hosts".
Restart affected ESXi hosts to restore connectivity.
Advanced Troubleshooting:
Review switch-side LACP negotiation settings for mismatches.
Check ESXi networking settings, ensuring uplinks are properly configured within the VDS.
Examine host logs (vobd, vmkernel, hostd, fdm) for deeper analysis. If needed, raise a support case with Broadcom Support and attach the ESXi log bundle.
Run NIC diagnostics to identify possible hardware faults.
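As one way to approach the log review step above, the LAG events in vmkernel.all can be extracted into structured records. This is a sketch only, assuming the TeamVSLACPLAGEventCB line format shown in the log snippet; the function name parse_lag_events and the field names (lag, lag_link, uplink, uplink_id, uplink_link) are hypothetical labels chosen for illustration.

```python
import re

# Matches the LAG event lines from the vmkernel snippet, for example:
# "... TeamVSLACPLAGEventCB:9077: [...]Received event UPLINK LINK STATUS,
#  LAG /1687952376, link UNKNOWN, uplink vmnic3/0x84000016, link DOWN"
LAG_EVENT_RE = re.compile(
    r"TeamVSLACPLAGEventCB:\d+:.*?"
    r"LAG\s+(?P<lag>\S+),\s+link\s+(?P<lag_link>\w+),\s+"
    r"uplink\s+(?P<uplink>vmnic\d+)/(?P<uplink_id>0x[0-9a-fA-F]+),\s+"
    r"link\s+(?P<uplink_link>\w+)"
)


def parse_lag_events(lines):
    """Return one dict per LAG uplink status event found in the given lines."""
    events = []
    for line in lines:
        match = LAG_EVENT_RE.search(line)
        if match:
            events.append(match.groupdict())
    return events


def uplinks_down(events):
    """Return the sorted names of uplinks whose reported link state is DOWN."""
    return sorted({e["uplink"] for e in events if e["uplink_link"] == "DOWN"})


# Illustrative input: lines copied from the vmkernel snippet, timestamps left as placeholders.
SAMPLE_VMKERNEL = [
    'YYYY-MM-DDTHH:MM:SS: cpu37:2098026)ntg3:vmnic3:Ntg3PhyStateGet:417:link down',
    'YYYY-MM-DDTHH:MM:SS: cpu28:2098215)Team.cswitch: TeamVSLACPLAGEventCB:9077: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Received event UPLINK LINK STATUS, LAG /1687952376, link UNKNOWN, uplink vmnic3/0x84000016, link DOWN',
    'YYYY-MM-DDTHH:MM:SS: cpu28:2098215)Team.cswitch: TeamVSLACPLAGEventCB:9077: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Received event UPLINK LINK STATUS, LAG /1687952376, link UNKNOWN, uplink vmnic0/0x84000010, link DOWN',
]

if __name__ == "__main__":
    print(uplinks_down(parse_lag_events(SAMPLE_VMKERNEL)))
```

If every uplink of the same LAG appears in the result, the whole LAG has lost connectivity, which matches the host becoming unreachable. On a live host, the current state can be compared against the logs with `esxcli network nic list` and `esxcli network vswitch dvs vmware lacp status get`.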
By systematically addressing these areas, ESXi host connectivity can be restored and stabilized.
If problems persist, consider replacing faulty network adapters or revisiting switch configuration settings.