ESX/ESXi host loses network connectivity with a Broadcom bnx2 driver FTQ dump

Article ID: 330413

Products

VMware vSphere ESXi

Issue/Introduction

This article describes a specific condition. If you observe an FTQ Dump without the Tx Ring Full error also being logged, the workaround and fix described may not be applicable.
If you observe a loss of network connectivity to an ESX/ESXi host without these symptoms, see Troubleshooting an unresponsive host and multiple Disconnected virtual machines (1019082).


Symptoms:

When running VMware ESX/ESXi 4.x and 5.x, you experience one or more of these symptoms:

  • There is a loss of network connectivity to an ESX/ESXi host.
  • The ESX/ESXi host remains up and responsive to console access.
  • VMware High Availability may have relocated virtual machines to alternate ESX/ESXi hosts per Isolation Response.

    Note: For more information, see VMware High Availability host isolation response types (1030320).

  • The VMkernel logs contain NETDEV WATCHDOG timeout messages:

    cpu2:4237)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out

  • The VMkernel logs contain an FTQ (flow-through queue) dump for the bnx2 driver:

    cpu2:4237)<3>bnx2: <--- start FTQ dump on vmnic1 --->
    cpu2:4237)<3>bnx2: vmnic1: BNX2_RV2P_PFTQ_CTL ffffffff
    cpu2:4237)<3>bnx2: vmnic1: BNX2_RV2P_TFTQ_CTL ffffffff
    cpu2:4237)<3>bnx2: vmnic1: BNX2_RV2P_MFTQ_CTL ffffffff
    ...
    cpu8:4238)<3>bnx2: vmnic0: TPAT mode ffffffff state ffffffff evt_mask ffffffff pc ffffffff pc ffffffff instr ffffffff
    cpu8:4238)<3>bnx2: vmnic0: RXP mode ffffffff state ffffffff evt_mask ffffffff pc ffffffff pc ffffffff instr ffffffff
    cpu8:4238)<3>bnx2: vmnic0: COM mode ffffffff state ffffffff evt_mask ffffffff pc ffffffff pc ffffffff instr ffffffff
    cpu8:4238)<3>bnx2: vmnic0: CP mode ffffffff state ffffffff evt_mask ffffffff pc ffffffff pc ffffffff instr ffffffff
    cpu8:4238)<3>bnx2: <--- end FTQ dump on vmnic0 --->
    cpu8:4238)<3>bnx2: vmnic0 DEBUG: intr_sem[0]
    cpu8:4238)<3>bnx2: vmnic0 DEBUG: EMAC_TX_STATUS[ffffffff] RPM_MGMT_PKT_CTRL[ffffffff]
    cpu8:4238)<3>bnx2: vmnic0 DEBUG: MCP_STATE_P0[ffffffff] MCP_STATE_P1[ffffffff]
    cpu8:4238)<3>bnx2: vmnic0 DEBUG: HC_STATS_INTERRUPT_STATUS[ffffffff]
    cpu8:4238)<3>bnx2: vmnic0 DEBUG: PBA[ffffffff]


  • The VMkernel logs show a Tx Ring Full error. This is the key entry:

    cpu8:4233)<3>bnx2: Chip not in correct endian mode
    cpu8:4233)<3>bnx2: vmnic0: BUG! Tx ring full when queue awake!

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
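
Because this article applies only when the FTQ dump and the Tx Ring Full error are both logged, it can help to search a log file for all of the signatures at once. The following is a minimal sketch, not a VMware-supplied tool: the function name check_bnx2_signature is hypothetical, and the search strings are taken from the example excerpts above, so they may need adjusting if your log lines differ.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): counts the log signatures from this
# article in the file given as $1 and reports whether the key pair is present.
check_bnx2_signature() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read log file: $log" >&2
        return 2
    fi
    # grep -c prints the number of matching lines; "|| true" keeps a zero
    # count from being treated as a failure under "set -e".
    watchdog=$(grep -c 'NETDEV WATCHDOG: vmnic[0-9]*: transmit timed out' "$log" || true)
    ftq=$(grep -c 'bnx2: <--- start FTQ dump on vmnic' "$log" || true)
    txfull=$(grep -c 'BUG! Tx ring full when queue awake!' "$log" || true)
    echo "watchdog=$watchdog ftq_dump=$ftq tx_ring_full=$txfull"
    # The article applies only when the FTQ dump and Tx ring full both appear.
    if [ "$ftq" -gt 0 ] && [ "$txfull" -gt 0 ]; then
        echo "match"
        return 0
    fi
    echo "no match"
    return 1
}
```

A possible invocation is check_bnx2_signature /var/log/vmkernel.log. That path is an assumption that typically holds for ESXi 5.x; on ESX 4.x the VMkernel log is usually /var/log/vmkernel, and on ESXi 4.x it is usually /var/log/messages.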


Environment

VMware vSphere ESXi 5.1

Resolution

This issue occurs when the IRQ balancer disables the Message Signaled Interrupt vector (MSI-X) during a chip reset.

The MSI-X vector gets remapped at the beginning of the Base Address Register (BAR). The driver attempts to disable the MSI, but the memory access bit is disabled instead.

The Broadcom bnx2 driver does not complete a chip reset correctly after certain conditions (for example, a transmission timeout). This corrupts the PCI configuration space, which can cause invalid address references (such as 0xffffffff), as seen in the FTQ dump and the logs.

This issue is observed in bnx2 driver version 2.0.7c.
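
To see which bnx2 driver version a host is currently running, the driver information reported for a NIC can be inspected. The sketch below assumes that ethtool is available in the host's shell and that ethtool -i prints a line beginning with "version:"; the function name bnx2_version is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `ethtool -i <vmnic>` output on stdin and prints
# only the driver version. The pattern is anchored at the start of the line,
# so the "firmware-version:" line is not matched.
bnx2_version() {
    sed -n 's/^version:[[:space:]]*//p'
}

# Example use on a host (vmnic0 is an example NIC name):
#   ethtool -i vmnic0 | bnx2_version
```

If the printed value is 2.0.7c, the host is running the driver version in which this issue is observed.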

This issue is resolved in the following asynchronous Broadcom driver releases:
  • ESX/ESXi 4.0 – Broadcom driver version 2.1.5d.v40.1
  • ESX/ESXi 4.1 – Broadcom driver version 2.1.5d.v40.1

To resolve this issue, ensure that your ESX/ESXi host has one of these driver versions installed. To download the latest Broadcom NetXtreme II Ethernet Network Controller driver version, see VMware Downloads.

To work around this issue, disable MSI support in the Broadcom bnx2 driver. This causes the driver to fall back to the pin-based IRQ assertion method of raising an interrupt.

To disable MSI:

  1. Log in to the ESX/ESXi host's terminal directly or through SSH. For additional information, see Connecting to an ESX host using an SSH client (1019852).
  2. Reconfigure the driver module using this command:

    esxcfg-module -s 'disable_msi=1' bnx2

  3. Reboot the server. The changes take effect the next time the module loads.
  4. After the ESX/ESXi host has finished booting, verify that disable_msi is set by running the command:

    esxcfg-module -g bnx2

For more information on setting parameters for loadable modules, see Configuring advanced driver module parameters in ESX/ESXi (1017588).
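
The verification in step 4 can also be scripted. The following is a minimal sketch that assumes esxcfg-module -g bnx2 prints the configured option string on one line (for example, something like bnx2 enabled = 1 options = 'disable_msi=1'); the exact output format may differ between releases, and the function name msi_disabled is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `esxcfg-module -g bnx2` output on stdin and
# succeeds only if the option string contains disable_msi=1.
# The trailing group rejects values such as disable_msi=10.
msi_disabled() {
    grep -Eq 'disable_msi=1([^0-9]|$)'
}

# Example use on a host:
#   if esxcfg-module -g bnx2 | msi_disabled; then
#       echo "MSI is disabled for bnx2"
#   else
#       echo "disable_msi=1 is not set for bnx2"
#   fi
```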


Additional Information

Message-Signaled Interrupts (MSIs) were introduced in the PCI 2.2 (Peripheral Component Interconnect) specification as an alternative to line-based interrupts. With line-based interrupts, the device has an interrupt pin that it asserts when it needs to interrupt the CPU. Devices that use MSIs instead trigger an interrupt by writing a small data value to a designated memory address. PCI 3.0 defines an extended form of MSI, called MSI-X, that enables greater programmability.

MSI increases the number of interrupts available per card. With MSI-X, a device can have up to 2048 interrupts, unlike the pin-based method, where a device is limited to 4 interrupts.

Configuring advanced driver module parameters in ESX/ESXi
ESX/ESXi hosts do not respond and is grayed out
Connecting to an ESX host using an SSH client
VMware High Availability host isolation response types