ESXi 5.5/6.0.x host with Broadcom 10 Gb NICs and the bnx2x driver loses network connectivity under heavy VXLAN traffic

Article ID: 340036

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • The ESXi host loses network connectivity on all NICs that use the bnx2x driver
  • The ESXi host eventually becomes unresponsive
  • In the /var/log/vmkernel.log file, you see entries similar to:
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_attn_int_deasserted3:4821(vmnic2)]driver assert
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_panic_dump:1139(vmnic2)]begin crash dump -----------------
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_panic_dump:1149(vmnic2)]def_idx(0xf588) def_att_idx(0x4) attn_state(0x1) spq_prod_idx(0x8a) next_stats_cnt(0xf581)
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_panic_dump:1154(vmnic2)]DSB: attn bits(0x0) ack(0x1) id(0x0) idx(0x4)
<3>bnx2x: [bnx2x_panic_dump:1155(vmnic2)] def (0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xff87 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0) <YYYY-MM-DD>T<time>
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_panic_dump:1206(vmnic2)]fp0: rx_bd_prod(0xe741) rx_bd_cons(0x342) rx_comp_prod(0x23eb) rx_comp_cons(0x1fe0) *rx_cons_sb(0x1fe0)
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_panic_dump:1209(vmnic2)] rx_sge_prod(0x0) last_max_sge(0x0) fp_hc_idx(0x1aed)
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_panic_dump:1226(vmnic2)]fp0: tx_pkt_prod(0x0) tx_pkt_cons(0x0) tx_bd_prod(0x0) tx_bd_cons(0x0) *tx_cons_sb(0x0)
<3>bnx2x: [bnx2x_panic_dump:1237(vmnic2)] run indexes (0x1aed 0x0)<3>bnx2x: [bnx2x_panic_dump:1243(vmnic2)] indexes (0x0 0x1fe0 0x0 0x0 0x0 0x0 0x0 0x0)
<YYYY-MM-DD>T<time> cpu5:33507)pf_id(0x2) vf_id(0xff) vf_valid(0x0) vnic_id(0x1) same_igu_sb_1b(0x1) state(0x1)

<YYYY-MM-DD>T<time> cpu5:33507)SM[0] __flags (0x0) igu_sb_id (0x25) igu_seg_id(0x0) time_to_expire (0x18b7ec) timer_value(0xff)
<YYYY-MM-DD>T<time> cpu5:33507)SM[1] __flags (0x0) igu_sb_id (0x25) igu_seg_id(0x0) time_to_expire (0xffffffff) timer_value(0xff)
<YYYY-MM-DD>T<time> cpu5:33507)INDEX[0] flags (0x0) timeout (0x0)
<YYYY-MM-DD>T<time> cpu5:33507)INDEX[1] flags (0x2) timeout (0x6)
<YYYY-MM-DD>T<time> cpu5:33507)INDEX[2] flags (0x0) timeout (0x0)
<YYYY-MM-DD>T<time> cpu5:33507)INDEX[3] flags (0x0) timeout (0x0)
<YYYY-MM-DD>T<time> cpu5:33507)INDEX[4] flags (0x1) timeout (0x0)
<YYYY-MM-DD>T<time> cpu5:33507)INDEX[5] flags (0x3) timeout (0xc)
<YYYY-MM-DD>T<time> cpu5:33507)INDEX[6] flags (0x3) timeout (0xc)
<YYYY-MM-DD>T<time> cpu5:33507)INDEX[7] flags (0x3) timeout (0xc)
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x 0000:01:00.2: vmnic2: bc 7.10.11
<YYYY-MM-DD>T<time> cpu5:33507)<3>begin fw dump (mark 0x3c6610)

<YYYY-MM-DD>T<time> cpu5:33507)<3>end of fw dump
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_panic_dump:1402(vmnic2)]Idle check (1st round) ----------
<5>[bnx2x_self_test_log:148(vmnic2)]INFO CFC: AC is neither 0 nor 2 on connType 0 (ETH). Values are 0x0 0x16
<5>[bnx2x_self_test_log:148(vmnic2)]INFO XCM: XX protection CAM is not empty.Value is 0x2
<5>[bnx2x_self_test_log:148(vmnic2)]INFO BRB1: BRB is not empty.Value is 0x14
<5>[bnx2x_self_test_log:151(vmnic2)]WARNING XCM: FIC0_INIT_CRD is not 64.Value is 0x2e
<5>[bnx2x_self_test_log:148(vmnic2)]INFO TCM: FIC0_INIT_CRD is not 64.Value is 0x22
<5>[bnx2x_self_test_log:148(vmnic2)]INFO PRS: TCM current credit is not 0.Value is 0x1e
<5>[bnx2x_self_test_log:151(vmnic2)]WARNING MISC: pcie_rst_b was asserted without perst assertion.Value is 0x1
<5>[bnx2x_self_test_log:151(vmnic2)]WARNING TSEM: interrupt 0 is active.Value is 0x10010000
<5>[bnx2x_self_test_log:151(vmnic2)]WARNING USEM: interrupt 0 is active.Value is 0x10000000
<5>[bnx2x_self_test_log:151(vmnic2)]WARNING XSEM: interrupt 0 is active.Value is 0x10010000
<5>[bnx2x_self_test_log:148(vmnic2)]INFO QM: VOQ_0, VOQ credit is not equal to initial credit. Values are 0x226 0x2ce
<5>[bnx2x_self_test_log:148(vmnic2)]INFO QM: Byte credit 0 is not equal to initial credit. Values are 0x3438 0x8000
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_panic_dump:1404(vmnic2)]Idle check (2nd round) ----------
<5>[bnx2x_self_test_log:148(vmnic2)]INFO CFC: AC is neither 0 nor 2 on connType 0 (ETH). Values are 0x0 0x16
<5>[bnx2x_self_test_log:148(vmnic2)]INFO XCM: XX protection CAM is not empty.Value is 0x2
<5>[bnx2x_self_test_log:148(vmnic2)]INFO BRB1: BRB is not empty.Value is 0x16
<5>[bnx2x_self_test_log:151(vmnic2)]WARNING XCM: FIC0_INIT_CRD is not 64.Value is 0x2e
<5>[bnx2x_self_test_log:148(vmnic2)]INFO TCM: FIC0_INIT_CRD is not 64.Value is 0x22
<5>[bnx2x_self_test_log:148(vmnic2)]INFO PRS: TCM current credit is not 0.Value is 0x1e
<5>[bnx2x_self_test_log:151(vmnic2)]WARNING MISC: pcie_rst_b was asserted without perst assertion.Value is 0x1
<5>[bnx2x_self_test_log:151(vmnic2)]WARNING TSEM: interrupt 0 is active.Value is 0x10010000
<5>[bnx2x_self_test_log:151(vmnic2)]WARNING USEM: interrupt 0 is active.Value is 0x10000000
<5>[bnx2x_self_test_log:151(vmnic2)]WARNING XSEM: interrupt 0 is active.Value is 0x10010000
<5>[bnx2x_self_test_log:148(vmnic2)]INFO QM: VOQ_0, VOQ credit is not equal to initial credit. Values are 0x226 0x2ce
<5>[bnx2x_self_test_log:148(vmnic2)]INFO QM: Byte credit 0 is not equal to initial credit. Values are 0x3438 0x8000
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_mc_assert:936(vmnic2)]XSTORM_ASSERT_LIST_INDEX 0x2
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_mc_assert:950(vmnic2)]XSTORM_ASSERT_INDEX 0x0 = 0x00020000 0x00010014 0x052305a8 0x00010053
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_mc_assert:964(vmnic2)]Chip Revision: everest3, FW Version: 7_10_51
<YYYY-MM-DD>T<time> cpu5:33507)<3>bnx2x: [bnx2x_panic_dump:1409(vmnic2)]end crash dump -----------------
<YYYY-MM-DD>T<time> cpu5:33507)<5>bnx2x: [bnx2x_attn_int_deasserted:5652(vmnic2)]about to mask 0xfffffffe at IGU addr 0x442d10
<YYYY-MM-DD>T<time> cpu5:33507)<5>bnx2x: [bnx2x_attn_int_deasserted:5665(vmnic2)]aeu_mask 1f6 newly deasserted 1
<YYYY-MM-DD>T<time> cpu5:33507)<5>bnx2x: [bnx2x_attn_int_deasserted:5667(vmnic2)]new mask 1f7
<YYYY-MM-DD>T<time> cpu5:33507)<5>bnx2x: [bnx2x_attn_int_deasserted:5672(vmnic2)]attn_state 1
<YYYY-MM-DD>T<time> cpu5:33507)<5>bnx2x: [bnx2x_attn_int_deasserted:5674(vmnic2)]new state 0
<YYYY-MM-DD>T<time> cpu17:33475)<3>bnx2x: [bnx2x_stats_update:1282(vmnic4)]storm stats were not updated for 3 times
<YYYY-MM-DD>T<time> cpu17:33475)<3>bnx2x: [bnx2x_stats_update:1283(vmnic4)]driver assert
<YYYY-MM-DD>T<time> cpu17:33475)<3>bnx2x: [bnx2x_panic_dump:1139(vmnic4)]begin crash dump -----------------
<YYYY-MM-DD>T<time> cpu17:33475)<3>bnx2x: [bnx2x_panic_dump:1149(vmnic4)]def_idx(0xf57a) def_att_idx(0x2) attn_state(0x0) spq_prod_idx(0x7d) next_stats_cnt(0xf574)

Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values vary depending on your environment.
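
To check whether a host is showing these symptoms, the log can be searched for the two signatures that appear in the excerpt above: the "driver assert" message and the "begin crash dump" marker. The following is a minimal sketch, not an official diagnostic tool. The function names are illustrative, and the default log path /var/log/vmkernel.log is an assumption that may differ for rotated or redirected logs.

```shell
#!/bin/sh
# Illustrative helpers; the patterns are copied from the example log excerpt.

# Print the number of bnx2x "driver assert" lines in a vmkernel log.
bnx2x_assert_count() {
    logfile="${1:-/var/log/vmkernel.log}"   # assumed default location
    grep -c 'bnx2x: \[.*]driver assert' "$logfile"
}

# Print each vmnic that started a bnx2x crash dump, one per line, without duplicates.
bnx2x_panic_nics() {
    logfile="${1:-/var/log/vmkernel.log}"   # assumed default location
    grep 'bnx2x_panic_dump:[0-9]*(vmnic[0-9]*)]begin crash dump' "$logfile" |
        sed 's/.*(\(vmnic[0-9]*\)).*/\1/' |
        sort -u
}
```

For example, running `bnx2x_panic_nics /var/log/vmkernel.log` on an affected host lists the interfaces whose driver instance asserted. An empty result only means that the signature is absent from that particular log file.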


Environment

VMware vSphere ESXi 5.5
VMware vSphere ESXi 5.1
VMware vSphere ESXi 6.0

Cause

This issue occurs when the guest virtual machine sends invalid metadata for TSO packets: the packet length is less than the Maximum Segment Size (MSS), but the TSO bit is set. This causes the adapter and driver to go into a non-operational state.

Note: This issue occurs only with VXLAN configured and when there is heavy VXLAN traffic.
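
The invalid combination described above can be expressed as a simple predicate. This is only an illustration of the condition, not code from the bnx2x driver. The function name and its three arguments (packet length in bytes, MSS in bytes, and a 0/1 TSO flag) are hypothetical.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the metadata matches the invalid case:
# TSO is requested although the packet is smaller than the MSS.
# Arguments (hypothetical): packet length, MSS, TSO flag (1 = set, 0 = clear).
is_invalid_tso() {
    pkt_len=$1
    mss=$2
    tso_flag=$3
    [ "$tso_flag" -eq 1 ] && [ "$pkt_len" -lt "$mss" ]
}
```

With an MSS of 1460, a 900-byte packet with the TSO bit set matches the invalid case, while a 3000-byte packet with the TSO bit set does not.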

Resolution

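To resolve this issue, upgrade the bnx2x driver. If you are unable to upgrade the driver, work around the issue by running this command, which sets the enable_vxlan_ofld parameter of the bnx2x module to 0, and then reboot the host for the change to apply: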

# esxcfg-module -s "enable_vxlan_ofld=0" bnx2x
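
After the reboot, the configured option string can be read back with `esxcfg-module -g bnx2x`. The sketch below checks that output for the workaround setting. The function name is illustrative, and the sample output format in the comment is an assumption that may differ between ESXi releases. Also note that, as far as is known, `-s` replaces the module's entire option string, so it is worth reviewing any existing bnx2x options with `-g` before applying the workaround.

```shell
#!/bin/sh
# Reads the output of `esxcfg-module -g bnx2x` on standard input and
# succeeds if enable_vxlan_ofld is set to 0.
# Assumed sample input: bnx2x enabled = 1 options = 'enable_vxlan_ofld=0'
vxlan_ofld_disabled() {
    # Require a non-digit or end of line after the 0 so that a value
    # such as 01 is not accepted.
    grep -Eq 'enable_vxlan_ofld=0([^0-9]|$)'
}
```

On the host, this could be used as `esxcfg-module -g bnx2x | vxlan_ofld_disabled && echo "workaround configured"`.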


Additional Information

See also: Poor network performance on Windows 2008 Server virtual machine