How to investigate unexpected link state changes reported by one or more vmnic in ESXi hosts


Article ID: 376568


Products

VMware vSphere ESXi

Issue/Introduction

ESXi or vCenter may report unexpected link state changes for the physical uplinks (vmnics) of one or more ESXi hosts.

When this occurs, the objective is to determine the likely cause.

Environment

ESXi

Cause

The most common cause of link state change events reported by ESXi (both directly and through vCenter events) is a change in link state detected by the device driver associated with the uplink (vmnic).

The change can be from "Up" to "Down", or from "Down" to "Up".

If the change was not made by an administrator (for example, with a command such as "esxcli network nic down -n vmnicN", where N is the vmnic number), the first step is to review the logs on the ESXi host.
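One way to check whether someone ran such a command from the ESXi Shell is to search the shell command log. The helper below is a hypothetical sketch, assuming interactive ESXi Shell/SSH commands are recorded in /var/run/log/shell.log. Changes made through vCenter or the API are not recorded there, so an empty result does not rule out an administrative change.

```shell
# Hypothetical helper (name is illustrative): list "esxcli network nic down/up"
# commands recorded in the ESXi Shell command log.
# Assumption: interactive shell commands are logged to /var/run/log/shell.log.
find_admin_nic_changes() {
    log_file="${1:-/var/run/log/shell.log}"
    # -E enables the (down|up) alternation.
    grep -E "esxcli network nic (down|up)" "$log_file"
}
```

Usage example: `find_admin_nic_changes /var/run/log/shell.log`. Any match shows when, and from which account, a vmnic was administratively brought down or up.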

Resolution

Use the following steps to review what the host logged for these events:

1) SSH into the ESXi host with root privileges

2) Change to the log directory: cd /var/run/log

3) Get the date and time of the most recent reboot from vmksummary.log (or from its compressed .gz rotations for earlier boots):

grep booted vmksummary.log

--> The output may appear like this:

[YYYY-MM-DDTHH:MM:SS] bootstop[2104778]: Host has booted

--> NOTE: Timestamps ending in "Z" in VMware logs are in UTC (Zulu time). To convert them to your local time zone, use a converter such as the tools at https://www.timeanddate.com/.
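The boot check can be extended to the rotated logs in one pass. This is a hypothetical sketch, assuming the rotations sit next to vmksummary.log and are named vmksummary.*.gz; it decompresses each rotation with gzip -dc and sorts all matches by their leading ISO timestamp.

```shell
# Hypothetical helper (name is illustrative): print boot entries from the
# current vmksummary.log and any gzip-compressed rotations, oldest first.
# Assumption: rotations are named vmksummary.*.gz in the same directory.
list_boot_events() {
    log_dir="${1:-/var/run/log}"
    {
        for f in "$log_dir"/vmksummary.*.gz; do
            # If no rotation exists, the glob stays unexpanded; skip it.
            [ -e "$f" ] || continue
            # gzip -dc writes the decompressed text to stdout, leaving the file intact.
            gzip -dc "$f" | grep "booted"
        done
        grep "booted" "$log_dir/vmksummary.log"
    } | sort   # ISO timestamps at the start of each line sort chronologically
}
```

Usage example: `list_boot_events /var/run/log`. The last line printed is the most recent boot.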

4) Then examine the most recent vobd.log using a command like:

grep linkstate vobd.log

--> The output may appear like this:

[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7891198507416us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7891187367545us: [esx.clear.net.vmnic.linkstate.up] Physical NIC vmnic2 linkstate is up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908515995969us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908592002230us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908594078919us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908601139619us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908606286740us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908606751860us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908606751887us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908608200269us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908609030995us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908609893654us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908622217847us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908630836061us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
[YYYY-MM-DDTHH:MM:SS]: [netCorrelator] 7908633431904us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
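When vobd.log contains many entries, a per-vmnic count makes it easier to see which uplinks are affected. This is a hypothetical sketch, assuming the "[vob.net.vmnic.linkstate.*] vmnic vmnicN linkstate up|down" wording shown above. It counts only the vob.net lines, so the "esx.clear..." lines, which report link state in a different format, are not counted twice.

```shell
# Hypothetical helper (name is illustrative): count link-up and link-down
# events per vmnic in vobd.log.
summarize_linkstate() {
    log_file="${1:-/var/run/log/vobd.log}"
    # -F matches the literal text; "esx.clear..." lines do not contain it.
    grep -F "[vob.net.vmnic.linkstate." "$log_file" |
    awk '{
        # Expected line ending: "... vmnic vmnic2 linkstate down"
        nic = $(NF - 2); state = $NF
        count[nic " " state]++
    }
    END { for (key in count) print key, count[key] }' |
    sort
}
```

Usage example: `summarize_linkstate vobd.log` prints one line per vmnic and state, such as "vmnic0 down 1".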


5) If you need more corroboration, you can look at the vmkernel.log file using a command like this:

grep vmnic2 vmkernel.log | egrep "Link is down|Link is up" | less

--> The output may appear like this:

[YYYY-MM-DDTHH:MM:SS] cpu4:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu4:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu20:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu20:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu12:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu12:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu12:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu12:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is down for device vmnic2
[YYYY-MM-DDTHH:MM:SS] cpu28:2101725)i40en: i40en_UpdateUplinkLinkStatus:6857: Link is up at 10000 Mbps Full Duplex for device vmnic2
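To hand the network team concrete outage windows, each "Link is down" message can be paired with the next "Link is up" message for the same vmnic. This is a hypothetical sketch, assuming the i40en wording shown above and that the timestamp is the first field of each line; other drivers may log link changes with different text, so adjust the patterns to match your driver.

```shell
# Hypothetical helper (name is illustrative): print "down <time> -> up <time>"
# for each link flap of one vmnic found in vmkernel.log.
link_flaps() {
    nic="$1"
    log_file="${2:-/var/run/log/vmkernel.log}"
    # The trailing "$" keeps vmnic2 from also matching vmnic20.
    grep -E "Link is (down|up).* for device ${nic}\$" "$log_file" |
    awk '
        /Link is down/ { down = $1; next }
        /Link is up/ && down != "" { print "down " down " -> up " $1; down = "" }
        END { if (down != "") print "down " down " -> (no matching up)" }
    '
}
```

Usage example: `link_flaps vmnic2 vmkernel.log`. A final "(no matching up)" line means the link was still down at the end of the log.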


6) If the output looks similar to the examples above, you can reasonably conclude that the driver recorded these link state changes based on the state of the physical link to the switch(es) to which the vmnic(s) are connected.

7) Next, provide this information (the affected vmnics and the dates and times of the events) to the team in your organization that manages the network infrastructure beyond the vmnic(s).

8) Ask them to investigate all infrastructure in the path, such as the physical switch ports the vmnics connect to, for the dates and times indicated.
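To share only the relevant entries, the logs can be trimmed to the time window under investigation. This is a hypothetical sketch, assuming each line starts with an ISO 8601 timestamp (optionally wrapped in "[...]" and followed by ":", as in the examples above) and that the start and end bounds use the same time zone as the log; under those assumptions, plain string comparison orders the timestamps correctly.

```shell
# Hypothetical helper (name is illustrative): print lines whose leading
# timestamp falls between START and END (inclusive). Reads stdin if no files.
events_in_window() {
    start="$1"; end="$2"; shift 2
    awk -v start="$start" -v end="$end" '{
        ts = $1
        sub(/^\[/, "", ts)   # drop a leading "["
        sub(/:$/, "", ts)    # drop a trailing ":" (vobd.log style)
        sub(/]$/, "", ts)    # drop a trailing "]"
        if (ts >= start && ts <= end) print
    }' "$@"
}
```

Usage example (placeholder times): `events_in_window 2024-01-01T10:00:00Z 2024-01-01T11:00:00Z vobd.log vmkernel.log`. Lines from multiple files are printed without file names, so run it once per file if you need to keep them apart.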

9) For additional supporting information from the ESXi logs, refer to the KB article "Network adapter (vmnic) is down or fails with a failed criteria code".

  • In particular, the section "The following are the failed criteria codes." in the Resolution section of that KB is very useful.

10) If further assistance is required:

a) Collect a log bundle from the affected ESXi host(s) (Reference: Collecting diagnostic information for VMware ESXi).

b) Open a support case and attach the logs to it (Reference: Uploading files to cases on the Broadcom Support Portal).