VMs abruptly lose connectivity when using Cisco UCS VIC

Article ID: 408697

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • The host is up, but guest network traffic is not working.
  • Guest network traffic uses a Cisco UCS Virtual Interface Card (VIC).
  • No hardware faults are reported in UCS Manager.
  • In the /var/run/log/vmkernel.log file, there are error messages similar to:

yyyy-mm-ddThh:mm:ss.sssZ cpu82:2097580)WARNING: nenic: _vnic_dev_cmd2:265: 0000:1c:00.0: Fatal error while issuing devcmd2 command 4, hardware surprise removal
yyyy-mm-ddThh:mm:ss.sssZ cpu82:2097580)WARNING: nenic: _vnic_dev_cmd2:265: 0000:1c:00.2: Fatal error while issuing devcmd2 command 4, hardware surprise removal
yyyy-mm-ddThh:mm:ss.sssZ cpu82:2097580)WARNING: nenic: _vnic_dev_cmd2:265: 0000:1c:00.1: Fatal error while issuing devcmd2 command 4, hardware surprise removal
yyyy-mm-ddThh:mm:ss.sssZ cpu82:2097580)WARNING: nenic: _vnic_dev_cmd2:265: 0000:1c:00.3: Fatal error while issuing devcmd2 command 4, hardware surprise removal
yyyy-mm-ddThh:mm:ss.sssZ cpu97:2097580)WARNING: nenic: _vnic_dev_cmd2:265: 0000:1c:00.0: Fatal error while issuing devcmd2 command 4, hardware surprise removal
yyyy-mm-ddThh:mm:ss.sssZ cpu97:2097580)WARNING: nenic: _vnic_dev_cmd2:265: 0000:1c:00.2: Fatal error while issuing devcmd2 command 4, hardware surprise removal
yyyy-mm-ddThh:mm:ss.sssZ cpu97:2097580)WARNING: nenic: _vnic_dev_cmd2:265: 0000:1c:00.1: Fatal error while issuing devcmd2 command 4, hardware surprise removal
yyyy-mm-ddThh:mm:ss.sssZ cpu97:2097580)WARNING: nenic: _vnic_dev_cmd2:265: 0000:1c:00.3: Fatal error while issuing devcmd2 command 4, hardware surprise removal

...
yyyy-mm-ddThh:mm:ss.sssZ cpu82:2097295)WARNING: Uplink: 21014: Queue 0 of device vmnic5 stuck, resetting the device
yyyy-mm-ddThh:mm:ss.sssZ cpu82:2097295)WARNING: Uplink: 21014: Queue 0 of device vmnic5 stuck, resetting the device
yyyy-mm-ddThh:mm:ss.sssZ cpu82:2097295)WARNING: Uplink: 21014: Queue 0 of device vmnic5 stuck, resetting the device
yyyy-mm-ddThh:mm:ss.sssZ cpu64:2097295)WARNING: Uplink: 21014: Queue 0 of device vmnic5 stuck, resetting the device

...

yyyy-mm-ddThh:mm:ss.sssZ cpu112:6665230)Vmxnet3: 21129: ################.eth0,##:##:##:##:##:##, portID(67108893): Hang detected,numHangQ: 8, enableGen: 38
yyyy-mm-ddThh:mm:ss.sssZ cpu112:6665230)Vmxnet3: 21138: portID:67108893, QID: 0, next2TX: 117, next2Comp: 55, lastNext2TX: 57, next2Write:102, ringSize: 512 inFlight: 13, delay(ms): 10141,txStopped: 0
yyyy-mm-ddThh:mm:ss.sssZ cpu112:6665230)Vmxnet3: 21142: portID: 67108893, sop: 55 eop: 56 enableGen: 0 qid: 38, pkt: 0x45ba15c8c1c0

...

yyyy-mm-ddThh:mm:ss.sssZ cpu112:6665230)Vmxnet3: 21138: portID:67108893, QID: 7, next2TX: 154, next2Comp: 74, lastNext2TX: 76, next2Write:266, ringSize: 512 inFlight: 19, delay(ms): 12173,txStopped: 0
yyyy-mm-ddThh:mm:ss.sssZ cpu112:6665230)Vmxnet3: 21142: portID: 67108893, sop: 74 eop: 75 enableGen: 7 qid: 38, pkt: 0x45ba55ffe400
yyyy-mm-ddThh:mm:ss.sssZ cpu112:6665230)NetSched: 752: 0x84000017: received a force quiesce for port 0x400001d, dropped 727 pkts
yyyy-mm-ddThh:mm:ss.sssZ cpu112:6665230)NetPort: 1793: disabled port 0x400001d
yyyy-mm-ddThh:mm:ss.sssZ cpu112:6665230)Vmxnet3: 14060: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 2 port 0x400001d
yyyy-mm-ddThh:mm:ss.sssZ cpu112:6665230)Vmxnet3: 14327: Using default queue delivery for vmxnet3 for port 0x400001d

 

Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values (such as PCI addresses, vmnic names, and port IDs) vary from one environment to another.
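
To confirm which devices are affected, the vmkernel log can be scanned for the two driver-level signatures shown above. The following Python sketch is illustrative only: the regular expressions are derived from the sample lines in this article, and the function names (summarize, summarize_file) are hypothetical. Message formats may differ between driver versions, so treat a zero count as inconclusive.

```python
import re
from collections import Counter

# Patterns derived from the sample log lines in this article (assumed format).
NENIC_RE = re.compile(
    r"nenic: _vnic_dev_cmd2:\d+: "
    r"(?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"Fatal error while issuing devcmd2 command \d+, hardware surprise removal"
)
UPLINK_RE = re.compile(
    r"Uplink: \d+: Queue \d+ of device (?P<nic>vmnic\d+) stuck, resetting the device"
)


def summarize(lines):
    """Count devcmd2 failures per PCI address and stuck-queue resets per vmnic."""
    pci_hits = Counter()
    nic_hits = Counter()
    for line in lines:
        match = NENIC_RE.search(line)
        if match:
            pci_hits[match.group("pci")] += 1
            continue
        match = UPLINK_RE.search(line)
        if match:
            nic_hits[match.group("nic")] += 1
    return pci_hits, nic_hits


def summarize_file(path):
    """Convenience wrapper (hypothetical helper): read a log file and summarize it."""
    with open(path, "r", errors="replace") as handle:
        return summarize(handle)
```

For example, summarize_file("/var/run/log/vmkernel.log") returns two counters: one keyed by PCI address (such as 0000:1c:00.0) and one keyed by uplink name (such as vmnic5).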

Environment

vSphere ESXi 7.x

Cause

  • The vNICs stopped responding to the driver.
  • The driver's attempt to recover by resetting the vNIC also failed.
  • Because the reset failed, the vNIC stays in a disabled state, which results in the loss of network communication.
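
The sequence above can be checked against a log by noting the order in which each stage first appears. The following Python sketch makes several assumptions: the stage labels are hypothetical, and the mapping from log messages to stages (nenic devcmd2 failure, uplink queue reset, guest port disabled) is an interpretation of the sample excerpts in this article, not a documented driver contract.

```python
import re

# Hypothetical stage labels mapped to message patterns from the sample excerpts.
STAGE_PATTERNS = [
    ("vnic_unresponsive",
     re.compile(r"nenic: .*Fatal error while issuing devcmd2 command")),
    ("reset_attempted",
     re.compile(r"Uplink: \d+: Queue \d+ of device vmnic\d+ stuck, resetting the device")),
    ("guest_port_disabled",
     re.compile(r"NetPort: \d+: disabled port")),
]


def stage_order(lines):
    """Return the stage labels sorted by the line index where each first appears."""
    first_seen = {}
    for index, line in enumerate(lines):
        for name, pattern in STAGE_PATTERNS:
            if name not in first_seen and pattern.search(line):
                first_seen[name] = index
    return sorted(first_seen, key=first_seen.get)
```

A result that begins with vnic_unresponsive and is followed by reset_attempted is consistent with the cause described here; stages that never occur in the log are simply absent from the returned list.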

Resolution

Engage Cisco, the hardware vendor, for further diagnostics and resolution.

Additional Information

The symptoms vary depending on which interfaces fail. If the interfaces used for management fail, the symptoms are expected to match those described in the article "ESXi host hangs abruptly when using Cisco UCS VIC".