When running VMware ESX/ESXi 4.x and 5.x, you experience one or more of these symptoms:
NETDEV WATCHDOG timeout messages:
cpu2:4237)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out
cpu2:4237)<3>bnx2: <--- start FTQ dump on vmnic1 --->
cpu2:4237)<3>bnx2: vmnic1: BNX2_RV2P_PFTQ_CTL ffffffff
cpu2:4237)<3>bnx2: vmnic1: BNX2_RV2P_TFTQ_CTL ffffffff
cpu2:4237)<3>bnx2: vmnic1: BNX2_RV2P_MFTQ_CTL ffffffff
...
cpu8:4238)<3>bnx2: vmnic0: TPAT mode ffffffff state ffffffff evt_mask ffffffff pc ffffffff pc ffffffff instr ffffffff
cpu8:4238)<3>bnx2: vmnic0: RXP mode ffffffff state ffffffff evt_mask ffffffff pc ffffffff pc ffffffff instr ffffffff
cpu8:4238)<3>bnx2: vmnic0: COM mode ffffffff state ffffffff evt_mask ffffffff pc ffffffff pc ffffffff instr ffffffff
cpu8:4238)<3>bnx2: vmnic0: CP mode ffffffff state ffffffff evt_mask ffffffff pc ffffffff pc ffffffff instr ffffffff
cpu8:4238)<3>bnx2: <--- end FTQ dump on vmnic0 --->
cpu8:4238)<3>bnx2: vmnic0 DEBUG: intr_sem[0]
cpu8:4238)<3>bnx2: vmnic0 DEBUG: EMAC_TX_STATUS[ffffffff] RPM_MGMT_PKT_CTRL[ffffffff]
cpu8:4238)<3>bnx2: vmnic0 DEBUG: MCP_STATE_P0[ffffffff] MCP_STATE_P1[ffffffff]
cpu8:4238)<3>bnx2: vmnic0 DEBUG: HC_STATS_INTERRUPT_STATUS[ffffffff]
cpu8:4238)<3>bnx2: vmnic0 DEBUG: PBA[ffffffff]
Tx Ring Full error. This is the key entry:
This issue occurs when the IRQ balancer disables the MSI-X (extended Message Signaled Interrupts) vector during a chip reset.
The MSI-X vector is remapped to the beginning of the Base Address Register (BAR). When the driver attempts to disable MSI, it disables the memory access bit instead.
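To make this failure mode concrete, the following Python sketch models a PCI function whose Memory Space Enable bit (bit 1 of the standard PCI Command register) is cleared by mistake. The class, BAR offsets, and register values are hypothetical and do not reproduce bnx2 code. The sketch assumes the common behavior that a read to a memory range the device no longer decodes completes with all ones, which is consistent with the ffffffff values in the dumps.

```python
# Hypothetical model of a PCI function's Command register and BAR registers.
# Bit positions follow the PCI specification; everything else is illustrative.

PCI_COMMAND_MEMORY = 1 << 1   # Memory Space Enable
PCI_COMMAND_MASTER = 1 << 2   # Bus Master Enable
ALL_ONES = 0xFFFFFFFF         # typical result of a read nothing decodes


class SimulatedPciFunction:
    def __init__(self) -> None:
        # Start with memory decoding and bus mastering enabled.
        self.command = PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER
        # Hypothetical register contents keyed by offset within the BAR.
        self.bar_registers = {0x0000: 0x00000001, 0x0404: 0x00000040}

    def mmio_read(self, offset: int) -> int:
        # With Memory Space Enable cleared, the device stops claiming its
        # BAR range, so the read comes back as all ones.
        if not self.command & PCI_COMMAND_MEMORY:
            return ALL_ONES
        return self.bar_registers.get(offset, 0)

    def clear_command_bits(self, mask: int) -> None:
        # The Command register is 16 bits wide.
        self.command &= ~mask & 0xFFFF


dev = SimulatedPciFunction()
print(hex(dev.mmio_read(0x0404)))  # 0x40
# Faulty path: the write intended to disable MSI clears the memory access bit.
dev.clear_command_bits(PCI_COMMAND_MEMORY)
print(hex(dev.mmio_read(0x0404)))  # 0xffffffff
```

Bus mastering stays enabled in this model; only register reads are affected, which mirrors the article's description of the memory access bit being the one that is disabled.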
The Broadcom bnx2 driver does not complete a chip reset correctly after certain conditions, such as a transmit timeout. The resulting corruption of the PCI configuration space can cause invalid values such as 0xffffffff, which also appear in the dump and log excerpts above.
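When checking many hosts, a small filter over VMkernel log lines can flag both symptoms at once. The sketch below is hypothetical: the `triage` helper and its regular expressions are assumptions based only on the log lines quoted in this article, and it treats a NIC as suspect when every 8-digit hex value on its bnx2 dump lines is ffffffff.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Patterns derived from the log excerpts in this article (assumed format).
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (vmnic\d+): transmit timed out")
BNX2_NIC_RE = re.compile(r"bnx2: (vmnic\d+)")
HEX32_RE = re.compile(r"\b[0-9a-fA-F]{8}\b")


def triage(lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: return (NICs with transmit timeouts,
    NICs whose bnx2 dump values are all 0xffffffff)."""
    timeouts: Set[str] = set()
    dump_values: Dict[str, List[str]] = {}
    for line in lines:
        watchdog = WATCHDOG_RE.search(line)
        if watchdog:
            timeouts.add(watchdog.group(1))
            continue
        nic = BNX2_NIC_RE.search(line)
        if nic:
            # Collect every 32-bit hex value printed on this NIC's dump line.
            dump_values.setdefault(nic.group(1), []).extend(HEX32_RE.findall(line))
    all_ones = {
        name for name, values in dump_values.items()
        if values and all(v.lower() == "ffffffff" for v in values)
    }
    return timeouts, all_ones


sample = [
    "cpu2:4237)WARNING: LinNet: netdev_watchdog: NETDEV WATCHDOG: vmnic1: transmit timed out",
    "cpu2:4237)<3>bnx2: vmnic1: BNX2_RV2P_PFTQ_CTL ffffffff",
    "cpu8:4238)<3>bnx2: vmnic0 DEBUG: PBA[ffffffff]",
]
timed_out, suspect = triage(sample)
print(sorted(timed_out), sorted(suspect))  # ['vmnic1'] ['vmnic0', 'vmnic1']
```

Lines that carry no hex values, such as the `<--- start FTQ dump --->` markers or `intr_sem[0]`, are simply ignored, so a NIC is only flagged on the strength of values it actually printed.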
This issue is observed in bnx2 driver version 2.0.7c.
To disable MSI:
Message Signaled Interrupts (MSIs) were introduced in the PCI (Peripheral Component Interconnect) 2.2 specification, and are supported in later revisions, as an alternative to line-based interrupts. With line-based interrupts, the device has an interrupt pin that it asserts when it needs to interrupt the CPU. Devices that use MSIs instead trigger an interrupt by writing a data value to a particular memory address, which the interrupt controller decodes. PCI 3.0 defines an extended form of MSI, called MSI-X, that enables greater programmability.
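As a rough illustration, the sketch below models an MSI as nothing more than a memory write that a hypothetical interrupt controller decodes, and an MSI-X table as 16-byte entries (Message Address, Message Upper Address, Message Data, Vector Control), the entry layout the PCI specification defines. The controller class, the address, and the data values are illustrative assumptions; real platforms choose their own address and data encoding. A masked vector is simply skipped here; per the specification the device records it as pending instead, which this sketch does not model.

```python
import struct
from dataclasses import dataclass, field
from typing import List

MSIX_ENTRY_SIZE = 16       # bytes per MSI-X table entry
MSIX_VECTOR_MASKED = 0x1   # Vector Control bit 0: per-vector mask


@dataclass
class InterruptController:
    """Hypothetical controller: an MSI is just a memory write it decodes."""
    address: int
    delivered: List[int] = field(default_factory=list)

    def memory_write(self, address: int, data: int) -> None:
        # Only writes to the controller's address count as interrupts;
        # the data value identifies the vector.
        if address == self.address:
            self.delivered.append(data)


def pack_msix_entry(address: int, data: int, masked: bool) -> bytes:
    # Lower address, upper address, message data, vector control (little-endian).
    return struct.pack("<IIII", address & 0xFFFFFFFF, address >> 32, data,
                       MSIX_VECTOR_MASKED if masked else 0)


def signal_msix(table: bytes, index: int, ctrl: InterruptController) -> None:
    lo, hi, data, vec_ctrl = struct.unpack_from("<IIII", table, index * MSIX_ENTRY_SIZE)
    if vec_ctrl & MSIX_VECTOR_MASKED:
        return  # masked: no write is sent (pending state not modeled)
    ctrl.memory_write((hi << 32) | lo, data)


ctrl = InterruptController(address=0xFEE00000)
table = (pack_msix_entry(0xFEE00000, 0x41, masked=False)
         + pack_msix_entry(0xFEE00000, 0x42, masked=True))
signal_msix(table, 0, ctrl)
signal_msix(table, 1, ctrl)
print(ctrl.delivered)  # [65]
```

Because each MSI-X entry carries its own address, data, and mask, software can steer and mask vectors individually, which is the extra programmability MSI-X adds over plain MSI.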
MSI increases the number of interrupts available per card: with MSI a device can use up to 32 interrupt vectors, and with MSI-X up to 2048, whereas the pin-based method limits a card to 4 interrupt lines (INTA# through INTD#).
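These per-card limits follow from the width of fields in each capability's Message Control register: MSI encodes the requested vector count as a power of two in a 3-bit field (bits 3:1, valid values giving 1 to 32 vectors), and MSI-X encodes its table size as N - 1 in an 11-bit field (bits 10:0, up to 2048 entries). The helper names below are hypothetical; only the bit positions come from the PCI specification.

```python
def msi_max_vectors(message_control: int) -> int:
    """Vectors an MSI function can request (hypothetical helper)."""
    # Multiple Message Capable, bits 3:1, holds log2 of the vector count.
    mmc = (message_control >> 1) & 0x7
    if mmc > 5:
        raise ValueError("reserved Multiple Message Capable encoding")
    return 1 << mmc


def msix_table_size(message_control: int) -> int:
    """Number of MSI-X table entries (hypothetical helper)."""
    # Table Size, bits 10:0, is encoded as N - 1.
    return (message_control & 0x7FF) + 1


LEGACY_INTX_PINS = ("INTA#", "INTB#", "INTC#", "INTD#")

print(msi_max_vectors(0b1010))   # 32: largest valid MSI encoding
print(msix_table_size(0x7FF))    # 2048: largest MSI-X table
print(len(LEGACY_INTX_PINS))     # 4: pin-based limit
```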