ESXi host becomes unresponsive or can fail with PSOD (purple screen of death) with errors related to nenic - vnic_dev_cmd2

Article ID: 376811

Products

VMware vSphere ESXi

Issue/Introduction

  • The ESXi host becomes unresponsive at the console, requiring a reboot. Pressing Alt+F12 displays vmkernel log entries related to nenic.
  • In certain scenarios the ESXi host may only disconnect from vCenter Server, requiring a restart of the vpxa/hostd services.
  • The ESXi host may fail with a PSOD showing a stack trace similar to the following:

    cpu5:2097412)@BlueScreen: #PF Exception 14 in world 2097412:HELPER_UPLIN IP 0x4180275eb528 addr 0x435572589004 PTEs:0x800007c023;0x0;
    cpu5:2097412)Code start: 0x418026c00000 VMK uptime: 0:00:33:11.912
    cpu5:2097412)0x451ac821bcf0:[0x4180275eb528]_vnic_dev_cmd2@(nenic)#<None>+0x64 stack: 0x418041400080
    cpu5:2097412)0x451ac821bd60:[0x4180275ebc9f]vnic_dev_cmd@(nenic)#<None>+0x80 stack: 0x451ad711b5a0
    cpu5:2097412)0x451ac821bd90:[0x4180275ebf32]vnic_dev_stats_dump@(nenic)#<None>+0x4b stack: 0x31974f4
    cpu5:2097412)0x451ac821bdc0:[0x4180275ef982]enic_dev_stats_dump@(nenic)#<None>+0x27 stack: 0x430e71c00090
    cpu5:2097412)0x451ac821bde0:[0x4180275df715]enic_uplink_stats_get@(nenic)#<None>+0x16 stack: 0x1
    cpu5:2097412)0x451ac821be10:[0x418026ee1ac7]UplinkDev_ShimStatsGet@vmkernel#nover+0x24c stack: 0x418026ee1aba
    cpu5:2097412)0x451ac821be70:[0x418026e4c5c5]UplinkDeviceGetStatsAsyncCB@vmkernel#nover+0x7a stack: 0x0
    cpu5:2097412)0x451ac821bf00:[0x418026ed7439]UplinkAsyncProcessCallsHelperCB@vmkernel#nover+0x116 stack: 0x4308410fe570
    cpu5:2097412)0x451ac821bf30:[0x418026cead9a]HelperQueueFunc@vmkernel#nover+0x157 stack: 0x4308410fe0b8
    cpu5:2097412)0x451ac821bfe0:[0x418026f0eaa2]CpuSched_StartWorld@vmkernel#nover+0x77 stack: 0x0
  • The vmkernel.log contains entries similar to the following:

yyyy-mm-ddThh:mm:ss.msZ cpu36:2097990)nenic: enic_get_vnic_config:67: [0000:62:00.0] vNIC <MAC addr> wq/rq 256/512 mtu 1500
yyyy-mm-ddThh:mm:ss.msZ cpu36:2097990)nenic: enic_get_vnic_config:87: [0000:62:00.0] vNIC csum tx/rx yes/yes, tso yes, vxlan no, rss no, netqueue no, intr mode any, type min timer 125 usec, loopback tag 0x000
yyyy-mm-ddThh:mm:ss.msZ cpu36:2097990)nenic: enic_get_res_counts:145: [0000:62:00.0] vNIC resources avail: wq 1 rq 1 cq 2 intr 4 
yyyy-mm-ddThh:mm:ss.msZ cpu36:2097990)VMK_PCI: 764: device 0000:62:00.0 allocated 4 MSIX interrupts
yyyy-mm-ddThh:mm:ss.msZ cpu36:2097990)nenic: enic_alloc_vnic_resources:163: [0000:62:00.0] vNIC resources used: wq 1 rq 1 cq 2 intr 4 intr mode MSI-X

  • Repeated warnings like the following are logged before the PSOD:

yyyy-mm-ddThh:mm:ss.msZ cpu4:2097412)WARNING: nenic: _vnic_dev_cmd2:278: 0000:62:00.1: wq is full while issuing devcmd2 command 4, fetch index: 1, posted index: 2281779200
yyyy-mm-ddThh:mm:ss.msZ cpu5:2097412)WARNING: nenic: _vnic_dev_cmd2:278: 0000:62:00.1: wq is full while issuing devcmd2 command 4, fetch index: 1, posted index: 2281779200
yyyy-mm-ddThh:mm:ss.msZ cpu5:2097412)WARNING: nenic: _vnic_dev_cmd2:278: 0000:62:00.1: wq is full while issuing devcmd2 command 4, fetch index: 1, posted index: 2281779200
yyyy-mm-ddThh:mm:ss.msZ cpu70:2098134)DVFilter: 5963: Checking disconnected filters for timeouts
yyyy-mm-ddThh:mm:ss.msZ cpu5:2097412)WARNING: nenic: _vnic_dev_cmd2:278: 0000:62:00.1: wq is full while issuing devcmd2 command 4, fetch index: 1, posted index: 2281779200
yyyy-mm-ddThh:mm:ss.msZ cpu5:2097412)WARNING: nenic: _vnic_dev_cmd2:278: 0000:62:00.1: wq is full while issuing devcmd2 command 4, fetch index: 1, posted index: 2281779200
yyyy-mm-ddThh:mm:ss.msZ cpu5:2097412)WARNING: nenic: _vnic_dev_cmd2:278: 0000:62:00.1: wq is full while issuing devcmd2 command 4, fetch index: 1, posted index: 2281779200
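
To confirm a host is affected, check whether its uplinks use the nenic driver and search the live vmkernel log for the warning signature. A minimal sketch from the ESXi shell, where vmnic0 is a placeholder for the suspected Cisco VIC uplink:

# List uplinks and the driver each one uses (look for "nenic")
esxcli network nic list

# Show driver and firmware details for the suspected VIC uplink
esxcli network nic get -n vmnic0

# Search the live vmkernel log for the devcmd2 "wq is full" warnings
grep "_vnic_dev_cmd2" /var/run/log/vmkernel.log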

Cause

This issue is caused by a Cisco VIC adapter misconfiguration or a possible VIC adapter hardware issue.

Resolution

It is recommended to change the Cisco UCS VIC adapter configuration to use 8 receive (RX) queues and 8 transmit (TX) queues, each with a ring size of 4096.
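
Before changing the policy, the queue and ring configuration currently reported to ESXi can be checked from the shell. A minimal sketch, again assuming vmnic0 is one of the Cisco VIC uplinks:

# Current RX/TX ring sizes for the uplink
esxcli network nic ring current get -n vmnic0

# Maximum ring sizes the adapter supports (should allow 4096)
esxcli network nic ring preset get -n vmnic0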

NOTE: Contact the Cisco TAC support team before applying these recommendations and to have the Cisco VIC adapter hardware investigated.

Follow the steps below to change the configuration.

• Log in to UCS Manager.
• Navigate to Servers > Policies > Adapter Policies and select "Eth Adapter Policy VMware".
• In the main window, expand Resources and adjust the following parameters:
o Transmit Queues: 8
o Transmit Queue Ring Size: 4096
o Receive Queues: 8
o Receive Queue Ring Size: 4096
o Completion Queues:
o Interrupts:
• Expand Options and adjust the following parameter:
o Receive Side Scaling (RSS): Enabled
• Click Save Changes.
• Click Yes to apply the policy to all ESXi hosts.
• On a per-host basis, per cluster, place the host into maintenance mode and apply the pending activities in UCS Manager for that host, then verify the result as shown below.
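
After each host exits maintenance mode, the new configuration can be verified from the ESXi shell. A minimal sketch, with vmnic0 again standing in for a Cisco VIC uplink:

# Ring sizes should now report 4096 for both RX and TX
esxcli network nic ring current get -n vmnic0

# At link-up the driver logs the negotiated vNIC config; rss should now read "yes"
grep "enic_get_vnic_config" /var/run/log/vmkernel.log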