VMNIC link may drop on Cisco UCS Host running certain nenic driver versions

Article ID: 330229

Products

VMware NSX

Issue/Introduction

  • Cisco UCS hosts with ESXi 6.5 installed that run a nenic driver version earlier than 1.0.11.0 may experience loss of network connectivity.
  • VMNICs/pNICs tied to VXLAN VTEPs may lose network connectivity.
  • In UCS Manager, the VIFs will be in an Error Disabled state.
  • Bouncing the VIF from UCS or the vmnic from esxcli does not bring the link back up.
  • The following error stack is seen in /var/log/vmkernel.log on an affected ESXi host:
    cpu26:66568)WARNING: nenic: enic_queue_wq_cont:943: [000:010:00.0] Failed to get pkt sg elem
    cpu26:66568)WARNING: nenic: enic_tq_xmit_pkt:1208: [000:010:00.0] Drop packet!
    cpu26:66568)WARNING: nenic: enic_queue_wq_cont:943: [000:010:00.0] Failed to get pkt sg elem
    cpu26:66568)WARNING: nenic: enic_tq_xmit_pkt:1208: [000:010:00.0] Drop packet!
    cpu22:67077)WARNING: nenic: enic_log_q_error:137: [000:010:00.0] WQ[0] error_status 10
    cpu22:67077)WARNING: nenic: enic_isr_msix_err:188: [000:010:00.0] Scheduled soft reset to recover from error
    cpu49:66235)nenic: enic_soft_reset_helper:2395: [000:010:00.0] Resetting
    cpu49:66235)nenic: enic_soft_reset_helper:2410: [000:010:00.0] Reset completed
    cpu29:66322)nenic: enic_link_check:247: [000:010:00.0] Link DOWN
    cpu29:66322)netschedHClk: NetSchedHClkNotify:2892: vmnic5: link down notification
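
The log signature above can be checked for programmatically. The following is a minimal sketch in Python, assuming the messages appear in the log exactly as in the excerpt. The names `SIGNATURE`, `scan_vmkernel_log`, and `matches_signature` are hypothetical helpers and not part of any VMware or Cisco tool. The source line numbers in the messages (for example `943`) are matched loosely, since they may differ between driver builds.

```python
import re

# Patterns derived from the vmkernel.log excerpt in this article.
# The numeric source-line fields are matched with \d+ because they
# may vary between nenic driver builds (an assumption).
SIGNATURE = {
    "sg_elem_failure": re.compile(
        r"nenic: enic_queue_wq_cont:\d+: .*Failed to get pkt sg elem"),
    "wq_error": re.compile(
        r"nenic: enic_log_q_error:\d+: .*WQ\[\d+\] error_status"),
    "soft_reset": re.compile(
        r"nenic: enic_isr_msix_err:\d+: .*Scheduled soft reset"),
    "link_down": re.compile(
        r"nenic: enic_link_check:\d+: .*Link DOWN"),
}


def scan_vmkernel_log(lines):
    """Count how many lines match each part of the signature."""
    counts = {name: 0 for name in SIGNATURE}
    for line in lines:
        for name, pattern in SIGNATURE.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def matches_signature(counts):
    """Heuristic (an assumption, not a vendor rule): report a match only
    when both a transmit sg failure and a nenic link-down were logged."""
    return counts["sg_elem_failure"] > 0 and counts["link_down"] > 0
```

To use it on a host, pass the open file object for /var/log/vmkernel.log to `scan_vmkernel_log` and feed the result to `matches_signature`. A match is only an indication; the VIF state in UCS Manager should still be confirmed.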

Environment

VMware NSX

Cause

This issue is caused by a Cisco bug in the nenic driver, tracked as CSCvf36545: https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvf36545/?rfs=iqvred

Note: The vmkernel can pass in packets for which vmk_PktFrameLenGet() returns a value smaller than the sum of the individual sg (scatter-gather) element lengths. Drivers are expected to handle such packets and program the accurate length into the hardware descriptors. The issue occurs when the packet has a single sg element.
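
The length handling described in the note can be illustrated with a small model. This is a sketch in Python under stated assumptions, not the nenic driver's code (the driver itself is not written in Python), and `descriptor_lengths` is a hypothetical name. It shows one way to trim the sg element lengths so that their total equals the frame length, including the single-element case where the affected drivers go wrong.

```python
def descriptor_lengths(frame_len, sg_lengths):
    """Return the length to program for each sg element so that the
    total equals frame_len (illustrative model only)."""
    remaining = frame_len
    result = []
    for sg_len in sg_lengths:
        if remaining <= 0:
            break  # the frame ends before this element; nothing to send
        used = min(sg_len, remaining)  # never exceed the frame length
        result.append(used)
        remaining -= used
    if remaining > 0:
        raise ValueError("frame length exceeds the total sg length")
    return result
```

For a packet with one 128-byte sg element and a frame length of 60, this model programs a single 60-byte descriptor rather than the full element length.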

Resolution

This issue is fixed in nenic driver version 1.0.11.0 and later, as noted in the Cisco VIC Release Notes: https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/release/notes/VIC/3-2/b_CiscoVIC_Drivers-RN-3-2.pdf
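
Whether an installed driver predates the fix can be checked with a simple version comparison. The sketch below is in Python and assumes the version string starts with a dotted numeric part, optionally followed by a build suffix such as `-1OEM...` (the suffix format is an assumption). `parse_nenic_version` and `is_affected` are hypothetical helper names. The version string itself would typically be read from the host, for example from the output of `esxcli software vib list`.

```python
import re

# First fixed nenic release according to this article.
FIXED_VERSION = (1, 0, 11, 0)


def parse_nenic_version(version):
    """Extract the leading dotted numeric part as a tuple of ints,
    ignoring any trailing build suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_affected(version):
    """Return True when the version is earlier than 1.0.11.0."""
    parts = parse_nenic_version(version)
    # Pad both tuples with zeros so that "1.0.9" compares as 1.0.9.0.
    width = max(len(parts), len(FIXED_VERSION))
    padded = parts + (0,) * (width - len(parts))
    fixed = FIXED_VERSION + (0,) * (width - len(FIXED_VERSION))
    return padded < fixed
```

Comparing integer tuples rather than raw strings avoids the common mistake of ordering "1.0.11.0" before "1.0.6.0" lexicographically.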

CSCvf36545: This fix addresses a length mismatch between the VMware packet frame length API function and the sum of the individual fragment lengths when a packet has a single fragment. The native ENIC driver now handles such packets properly.
 

Workaround:

Per Cisco bug CSCvf36545, rebooting the host recovers the uplinks from the Err-Disabled state. However, the issue may recur during the next upgrade. It is therefore recommended to upgrade the nenic driver on the ESXi host to version 1.0.11.0 or later before upgrading NSX.

Additional Information