This article provides information on troubleshooting issues when a network adapter fails.
The ESXi/vCenter UI and ESXi logs are showing NIC adapter alerts and messages, such as the 'Network uplink redundancy degraded' or 'Network uplink redundancy lost' alarm.
This KB goes over typical checks that can be done for troubleshooting.
Impact/Risks:
Packet flow for the services that use the affected portgroup (either a standard switch portgroup or a distributed virtual switch (DVS) portgroup) will cease along the data path through the named physical uplink (vmnic). This will impact one or more of the following:
The trigger for an event such as "Up" or "Down" is typically an external event upstream from the physical NIC.
The first step is to ask the team that manages the physical infrastructure external to the affected ESXi host to check the logs of the physical switch, and of the switchport to which the vmnic is connected, for possible reasons for the event.
Regardless, the ESXi host logs will provide the opportunity for timeline analysis of the events.
It is important to note that logs never reveal causes; they only reveal effects. Possible root causes can be investigated once there is a clear understanding of which events the ESXi host experienced, and when. For more information on how to collect logs, see Collecting diagnostic information for VMware ESXi.
VMware vSphere ESXi
VMware vCenter Server
In some cases, a vmnic can fail because of device firmware and/or device driver issues.
If there is no obvious log event that suggests a device driver or firmware issue, the next step is to ask the team that manages the physical infrastructure external to the affected ESXi host to check the logs of the physical switch, and of the switchport to which the vmnic is connected, for possible reasons for the event.
If that team does not find anything in their logs, then get a timeline analysis done by Broadcom Support.
In addition to the logs outlined above, useful information to include with the Problem Statement when opening a case would be:
esxcli network nic list
Name    PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  -----------
vmnic0  0000:01:00.0  ixgben  Up            Up            1000  Full    ec:f4:##:##:##:##  1500  Intel(R) Ethernet Controller X540-AT2
vmnic1  0000:01:00.1  ixgben  Up            Up            1000  Full    ec:f4:##:##:##:##  1500  Intel(R) Ethernet Controller X540-AT2
Note: The Admin Status is the only portion of the output that ESXi controls. The status can be changed by using the following commands:
esxcli network nic down -n vmnicX
esxcli network nic up -n vmnicX
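The NIC list output shown above can be scanned for adapters that are not fully up. The following is a minimal sketch, not an official tool: `find_down_nics` is a hypothetical helper name, and it assumes the column layout shown above, where the PCI Device and Driver values contain no spaces so that Admin Status and Link Status are the fourth and fifth fields of each vmnic row.

```shell
# Hypothetical helper: reads `esxcli network nic list` output on stdin and
# prints every vmnic whose Admin Status or Link Status is not "Up".
# Assumes fields 4 and 5 of each vmnic row are Admin Status and Link Status.
find_down_nics() {
    awk '$1 ~ /^vmnic[0-9]+$/ && ($4 != "Up" || $5 != "Up") {
        printf "%s admin=%s link=%s\n", $1, $4, $5
    }'
}
```

On an ESXi host it could be used as `esxcli network nic list | find_down_nics`. No output means every listed vmnic reports Up for both columns.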
Review the /var/run/log/vobd.log log file. If "vmnic down" or "vmnic up" messages are observed, it may indicate that the NIC is flapping.
Note: Some NICs report the NIC link up state only, not the down state. If the NIC is reported as "up" and the host was not rebooting, this is an indication that the NIC is flapping and not reporting the down state to ESXi.
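To judge whether a NIC is flapping, it can help to count how many up and down messages each vmnic has logged. The sketch below is illustrative only: `count_vmnic_events` is a hypothetical helper name, and it simply looks for a vmnic name together with the word "down" or "up" on the same line, which is an assumption about the message wording rather than a documented log format.

```shell
# Hypothetical helper: reads vobd.log content on stdin and prints, per vmnic,
# how many lines mention "down" and how many mention "up".
# A line containing both words (for example "is down ... 0 uplinks up")
# is counted as a down event, because "down" is checked first.
count_vmnic_events() {
    awk '
        match($0, /vmnic[0-9]+/) {
            nic = substr($0, RSTART, RLENGTH)
            line = tolower($0)
            if (line ~ /(^|[^a-z])down([^a-z]|$)/) down[nic]++
            else if (line ~ /(^|[^a-z])up([^a-z]|$)/) up[nic]++
            seen[nic] = 1
        }
        END {
            for (n in seen) printf "%s down=%d up=%d\n", n, down[n], up[n]
        }
    ' | sort
}
```

For example, `count_vmnic_events < /var/run/log/vobd.log`. Many events for one vmnic within a short period, or repeated up events with no matching down events on a host that did not reboot, are consistent with the flapping described above.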
Timestamps suffixed with the letter "Z" (as shown in the example below) are in UTC (Coordinated Universal Time). Use a reliable time zone reference to convert the UTC time to the equivalent local time.
Check for a failed criteria code with the vmnic messages. If there is a failed criteria code listed, please see step 4 below.
If there is no failed criteria code, and everything in step 2 above has been checked, open a case with the hardware vendor and have them investigate.
In the /var/run/log/vobd.log file, the vmnic failure may be classified with a Failed criteria code. This code explains the reason for the vmnic failure. The criteria that are used to determine if a network adapter in a network adapter team has failed include:
The Failed criteria code of 32 indicates that the link has failed because Beacon Probing detected a problem. Beacon Probing sends beacons per VLAN between the physical NICs in a team. When these beacons are not received by the other NICs, there is a problem in the physical network.
When there are multiple failures, entries similar to these are seen in the /var/run/log/vobd.log file:
YYYY-MM-DDThh:mm:ss.449Z: [netCorrelator] 1123644995238us: [vob.net.pg.uplink.transition.down] Uplink: vmnic# is down. Affected portgroup: ########. 0 uplinks up. Failed criteria: 130
The failed criteria here is 130, which is 2 + 128. This is a combination of these two failure codes:
Link speed reported by the driver (equal or greater for compliance)
Link state reported by the driver
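Because a combined code is the sum of individual codes that are powers of two, it can be split back into its components with a bitwise test. The following is a minimal sketch under stated assumptions: `extract_failed_criteria` and `decode_failed_criteria` are hypothetical helper names, and only the codes named in this article (2, 32 and 128) are labeled. The assignment of 2 to link speed and 128 to link state is inferred from the order in which the two codes are listed above. Any other bit is printed without a description.

```shell
# Hypothetical helper: pull the numeric code out of a log line that
# contains "Failed criteria: N".
extract_failed_criteria() {
    sed -n 's/.*Failed criteria: \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: split a failed criteria code into its power-of-two
# components and label the ones named in this article.
decode_failed_criteria() {
    fc_code=$1
    fc_bit=1
    while [ "$fc_bit" -le "$fc_code" ]; do
        if [ $(( fc_code & fc_bit )) -ne 0 ]; then
            case $fc_bit in
                2)   echo "2: Link speed reported by the driver (equal or greater for compliance)" ;;
                32)  echo "32: Beacon probing" ;;
                128) echo "128: Link state reported by the driver" ;;
                *)   echo "$fc_bit: (not described in this article)" ;;
            esac
        fi
        fc_bit=$(( fc_bit * 2 ))
    done
}
```

For the example entry above, `decode_failed_criteria 130` prints the lines for 2 and 128. A log line can be piped through `extract_failed_criteria` first to obtain the number.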