A few NICs are down with the error "[vmnicX : 0x45021b04c000] Failed to bring up link"



Article ID: 318658


Products

VMware vSphere ESXi

Issue/Introduction

This article provides information for troubleshooting an issue where the hardware reports no faults, yet some NICs remain down.
  • A few NICs are down with the error "[vmnicX : 0x45021b04c000] Failed to bring up link" (a quick check to identify the affected uplinks is shown after this list)
  • Log entries similar to the following appear in vmkernel.log:
    • 2019-12-12T07:20:16.859Z cpu9:2101844 opID=76c0261e)Uplink: 16283: Setting speed/duplex to (25000 FULL) on vmnic4.
      2019-12-12T07:20:16.859Z cpu3:2097412)bnxtnet: bnxtnet_linkstatus_set:2288: [vmnic4 : 0x45021b04c000] Bringing up link
      2019-12-12T07:20:16.972Z cpu3:2097412)WARNING: bnxtnet: alloc_rx_buffers:1441: [vmnic4 : 0x45021b04c000] Failed to allocate all, init'ed rx ring 12 with 2973/3069 pages only
      2019-12-12T07:20:16.976Z cpu3:2097412)WARNING: bnxtnet: bnxtnet_create_defq_group:279: [vmnic4 : 0x45021b04c000] failed to allocate queue group resource for defq
      2019-12-12T07:20:16.978Z cpu3:2097412)WARNING: bnxtnet: bnxtnet_uplink_activate_dev:2050: [vmnic4 : 0x45021b04c000] failed to create default queue (Out of memory)
      2019-12-12T07:20:16.978Z cpu3:2097412)WARNING: bnxtnet: bnxtnet_linkstatus_set:2291: [vmnic4 : 0x45021b04c000] Failed to bring up link
      2019-12-12T07:20:16.978Z cpu9:2101844 opID=76c0261e)Uplink: 16302: Wait for device vmnic4 async call failed.
      2019-12-12T07:24:33.719Z cpu31:2101845 opID=608326ea)Uplink: 16277: Setting link down on physical adapter vmnic4.
      2019-12-12T07:24:33.719Z cpu45:2097412)bnxtnet: bnxtnet_linkstatus_set:2259: [vmnic4 : 0x45021b04c000] Taking down link
      2019-12-12T07:24:41.921Z cpu50:2102458 opID=d2b86853)Uplink: 16283: Setting speed/duplex to (0 AUTO) on vmnic4.
      2019-12-12T07:24:41.921Z cpu37:2097412)bnxtnet: bnxtnet_linkstatus_set:2288: [vmnic4 : 0x45021b04c000] Bringing up link
      2019-12-12T07:24:42.017Z cpu66:2097412)WARNING: bnxtnet: alloc_rx_buffers:1441: [vmnic4 : 0x45021b04c000] Failed to allocate all, init'ed rx ring 12 with 2973/3069 pages only
      2019-12-12T07:24:42.021Z cpu66:2097412)WARNING: bnxtnet: bnxtnet_create_defq_group:279: [vmnic4 : 0x45021b04c000] failed to allocate queue group resource for defq
      2019-12-12T07:24:42.023Z cpu66:2097412)WARNING: bnxtnet: bnxtnet_uplink_activate_dev:2050: [vmnic4 : 0x45021b04c000] failed to create default queue (Out of memory)
      2019-12-12T07:24:42.023Z cpu66:2097412)WARNING: bnxtnet: bnxtnet_linkstatus_set:2291: [vmnic4 : 0x45021b04c000] Failed to bring up link
      2019-12-12T07:24:42.023Z cpu50:2102458 opID=d2b86853)Uplink: 16302: Wait for device vmnic4 async call failed.
  • In vSAN environments, the following symptoms may also be seen:
    • vSAN nodes become network partitioned
    • Object storage policy status remains out of compliance due to delays in resync traffic completion
    • vSAN nodes become unresponsive due to storage I/O processing delays
    • VM production is affected due to storage I/O processing delays
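To identify the affected uplinks, the link state of each physical NIC can be checked from the ESXi shell. A minimal check (vmnic4 below is a placeholder for the affected adapter):

esxcli network nic list
esxcli network nic get -n vmnic4

The first command shows the Link Status and MTU of every vmnic; the second shows the driver in use (for example, bnxtnet) along with its driver and firmware versions for a specific adapter.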
 


Environment

VMware vSphere ESXi 6.7
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.0

Cause

The NICs are down because the ESXi host enforces a default cap on the packet page pool from which RX queue buffers are allocated, which effectively limits the host to 32 RX queues.

If the NICs use MTU 9000 and are high-bandwidth adapters, a small number of NICs can exhaust this page pool; any NIC brought up after that fails to allocate its RX queues and the driver marks it as down.

Prior to ESXi 7.0 U1, the default value of netPagePoolLimitCap is 98304 pages.

Each jumbo-frame RX ring requires 3069 pages (as seen in the log messages above), and 98304 / 3069 ≈ 32. That is to say, a host using the bnxtnet driver (as an example) can support at most 32 RX queues globally when jumbo frames are enabled, no matter how many NIC adapters are installed.
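To see how the configured cap relates to the per-ring page requirement, the kernel setting can be listed directly. The arithmetic below assumes the 3069 pages per RX ring reported by the bnxtnet driver in the log above:

esxcli system settings kernel list -o netPagePoolLimitCap

With the pre-7.0 U1 default of 98304 pages, only 98304 / 3069 ≈ 32 jumbo-frame RX queues can be backed globally; raising the cap to 1048576 pages (as in the workaround below) allows roughly 1048576 / 3069 ≈ 341.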

Resolution

The default value of netPagePoolLimitCap was increased in ESXi 7.0 U1.
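To check whether a host is already on a release that carries the higher default, the installed version and update level can be confirmed from the ESXi shell:

vmware -vl

Hosts running releases earlier than ESXi 7.0 U1 need the workaround below or an upgrade.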

Workaround:

Since this is a configurable default, you can increase the page pool cap by running the command below:


esxcli system settings kernel set -s netPagePoolLimitCap -v 1048576

The following command can be used to verify the value:

esxcli system settings kernel list -o netPagePoolLimitCap
Name                 Type    Configured  Runtime  Default  Description
-------------------  ------  ----------  -------  -------  -----------
netPagePoolLimitCap  uint32  1048576     1048576  98304    Maximum number of pages period for the packet page pool.


Reboot the ESXi host for the change to take effect.

Alternatively, you can reduce the MTU to 1500 if increasing the kernel setting is not an option.
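For example, on a standard vSwitch the MTU can be lowered from the ESXi shell (vSwitch0 and vmk1 below are placeholder names; VMkernel interfaces carry their own MTU setting, and MTU for a distributed switch is changed in vCenter instead):

esxcli network vswitch standard set -v vSwitch0 -m 1500
esxcli network ip interface set -i vmk1 -m 1500
esxcli network vswitch standard list

The last command confirms the new MTU on the vSwitch.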


Additional Information

Impact/Risks:
The affected NICs remain marked as down until the page pool cap is increased or the MTU is reduced.