Restarting NEC Baseboard Management Controller (BMC) fails with a purple diagnostic screen or ESXi/ESX host becomes unresponsive

Article ID: 328805

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
During a restart of the Baseboard Management Controller (BMC) on an NEC Express5800/A1080a or A1040a server, you may experience any of these symptoms:
  • On an ESX 4.1 host, a purple diagnostic screen similar to:

    @BlueScreen: #GP Exception 13 in world 4376:usb @ 0x418029c15c43
    53:22:52:05.325 cpu35:4376)Code start: 0x418029c00000 VMK uptime: 53:22:52:05.325
    53:22:52:05.325 cpu35:4376)0x417f808c7c30:[0x418029c15c43]DLM_free@vmkernel:nover+0x10a stack: 0x4100a3802010
    53:22:52:05.326 cpu35:4376)0x417f808c7c60:[0x418029c2524b]Heap_Free@vmkernel:nover+0x7a stack: 0x4100a3802010
    53:22:52:05.326 cpu35:4376)0x417f808c7c80:[0x41802a0bb36d]hid_free_buffers@esx:nover+0x20 stack: 0x417f808c7cc0
    53:22:52:05.326 cpu35:4376)0x417f808c7cc0:[0x41802a0bb4c6]hid_disconnect@esx:nover+0x10d stack: 0x417f808c7cf0
    53:22:52:05.327 cpu35:4376)0x417f808c7cf0:[0x41802a09ccc3]usb_unbind_interface@esx:nover+0x4a stack: 0x418000000007
    53:22:52:05.327 cpu35:4376)0x417f808c7d40:[0x41802a037e34]__device_release_driver@esx:nover+0xeb stack: 0x4100bdc1b750
    53:22:52:05.327 cpu35:4376)0x417f808c7d60:[0x41802a038322]device_release_driver@esx:nover+0x41 stack: 0x4100bdc1b750
    53:22:52:05.328 cpu35:4376)0x417f808c7d80:[0x41802a0379e5]bus_remove_device@esx:nover+0x30 stack: 0x4100bdc07830
    53:22:52:05.328 cpu35:4376)0x417f808c7db0:[0x41802a035808]device_del@esx:nover+0x143 stack: 0xbdc035b0
    53:22:52:05.329 cpu35:4376)0x417f808c7de0:[0x41802a0a382f]usb_disable_device@esx:nover+0x8e stack: 0x417f808c7e20
    53:22:52:05.329 cpu35:4376)0x417f808c7e20:[0x41802a0a1347]usb_disconnect@esx:nover+0xae stack: 0x4100bdc039f0
    53:22:52:05.329 cpu35:4376)0x417f808c7f30:[0x41802a0a25ad]hub_thread@esx:nover+0x2c0 stack: 0x4100bf7ff810
    53:22:52:05.330 cpu35:4376)0x417f808c7f60:[0x41802a070a02]kthread@esx:nover+0x79 stack: 0x417f00000004
    53:22:52:05.330 cpu35:4376)0x417f808c7fa0:[0x41802a06e382]LinuxStartFunc@esx:nover+0x51 stack: 0x4
    53:22:52:05.331 cpu35:4376)0x417f808c7ff0:[0x418029c87ca7]vmkWorldFunc@vmkernel:nover+0x52 stack: 0x0
    53:22:52:05.331 cpu35:4376)0x417f808c7ff8:[0x0]<unknown> stack: 0x0
    53:22:52:05.340 cpu35:4376)FSbase:0x0 GSbase:0x418048c00000 kernelGSbase:0x0
  • On an ESXi host, a purple diagnostic screen similar to:

    Machine Check Exception; Bus and Interconnect; Originated Level 2 I/O Bus DataRead Timeout error. PCPU52 in world 16993:vmklinux_9:i
    System has encountered a Hardware Error - Please contact the hardware vendor


  • The host is unresponsive at the console and must be restarted
  • The host is disconnected in vCenter Server
  • Virtual machines running on the affected host(s) are unresponsive and not available on the network
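If you have the text of a purple diagnostic screen from an affected host (for example, transcribed from the console), a small script can flag which of the two example signatures above it resembles. The following is a minimal Python sketch under stated assumptions: `classify_psod` and its return labels are hypothetical names, and the matching rules (the ESX 4.1 backtrace frames `hid_free_buffers`, `hid_disconnect`, `usb_disconnect`, and `hub_thread` together with "#GP Exception 13", and the ESXi phrases "Machine Check Exception" and "DataRead Timeout") are a heuristic taken from the examples in this article, not a VMware or NEC detection method.

```python
import re

# Hypothetical helper: match the text of a purple diagnostic screen against
# the two example signatures in this article. This is a heuristic sketch,
# not a VMware or NEC detection tool.

# Backtrace frames from the ESX 4.1 example: the USB hub thread disconnects
# a device, and the #GP fault occurs in the heap free path called from
# hid_free_buffers.
ESX41_FRAMES = ("hid_free_buffers", "hid_disconnect", "usb_disconnect", "hub_thread")

# Phrases from the ESXi machine check example.
ESXI_MCE_PHRASES = ("Machine Check Exception", "DataRead Timeout")

# A frame looks like "[0x41802a0bb4c6]hid_disconnect@esx:nover+0x10d";
# capture the function name between "]" and "@".
FRAME_RE = re.compile(r"\]([A-Za-z_]\w*)@")


def classify_psod(text: str) -> str:
    """Return 'esx41-usb-hid', 'esxi-mce-io-timeout', or 'unknown'."""
    frames = set(FRAME_RE.findall(text))
    if "#GP Exception 13" in text and all(f in frames for f in ESX41_FRAMES):
        return "esx41-usb-hid"
    if all(p in text for p in ESXI_MCE_PHRASES):
        return "esxi-mce-io-timeout"
    return "unknown"


if __name__ == "__main__":
    mce = ("Machine Check Exception; Bus and Interconnect; Originated Level 2 "
           "I/O Bus DataRead Timeout error.")
    print(classify_psod(mce))  # prints esxi-mce-io-timeout
```

A match only means the screen resembles the examples in this article; confirm that a BMC restart or one of the BMC operations listed under Resolution took place at the time of the failure.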


Environment

VMware ESX 4.0.x
VMware ESX 4.1.x
VMware ESXi 4.0.x Embedded
VMware ESXi 4.0.x Installable
VMware ESXi 4.1.x Embedded
VMware ESXi 4.1.x Installable
VMware vSphere ESXi 5.0
VMware vSphere ESXi 5.1
VMware vSphere ESXi 5.5

Resolution

This is a known issue that occurs when the NEC Baseboard Management Controller (BMC) is restarted while the server is running ESXi/ESX.

To prevent this issue, NEC documentation recommends not restarting the BMC on a live production system that is running ESXi/ESX. Per the NEC documentation, avoid these BMC operations while the VMware hypervisor is running:
  • SP Reset
  • Removal of USB devices
  • Updating Management Firmware
  • Reconfiguration of BMC requiring a restart
  • Management Firmware Reset
The BMC firmware also restarts automatically if network flooding or a broadcast storm is detected while a network address is being configured for the Web Console.
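Where BMC maintenance is scripted or gated by a change-management checklist, the guidance above could be encoded as a simple pre-check. The following Python sketch is illustrative only: `check_bmc_operation`, the `Verdict` values, and the operation label `"Configure Web Console network address"` are hypothetical; the other operation names are copied from NEC's list above. How the caller determines whether the hypervisor is running is left open and is an assumption of this sketch.

```python
from enum import Enum


class Verdict(Enum):
    OK = "hypervisor not running; this issue does not apply"
    DEFER = "defer until the VMware hypervisor is not running"
    CAUTION = "may trigger an unexpected BMC reboot"
    NOT_LISTED = "not covered by the NEC list in this article"


# BMC operations that NEC documentation says to avoid while the VMware
# hypervisor is running (names copied from the list above).
UNSAFE_WHILE_RUNNING = frozenset({
    "SP Reset",
    "Removal of USB devices",
    "Updating Management Firmware",
    "Reconfiguration of BMC requiring a restart",
    "Management Firmware Reset",
})

# Hypothetical label: configuring the Web Console network address can cause
# the BMC firmware to reboot if flooding or a broadcast storm is detected.
MAY_REBOOT_BMC = frozenset({"Configure Web Console network address"})


def check_bmc_operation(operation: str, hypervisor_running: bool) -> Verdict:
    """Hypothetical pre-check: decide whether a planned BMC operation is safe now."""
    if not hypervisor_running:
        # The failure described here affects a host that is running ESXi/ESX.
        return Verdict.OK
    if operation in UNSAFE_WHILE_RUNNING:
        return Verdict.DEFER
    if operation in MAY_REBOOT_BMC:
        return Verdict.CAUTION
    return Verdict.NOT_LISTED


if __name__ == "__main__":
    print(check_bmc_operation("SP Reset", hypervisor_running=True).name)  # prints DEFER
```

Operations outside NEC's list return `NOT_LISTED` rather than `OK`, because this article does not say whether they are safe on a running host.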

For more information, see pages 275, 508-509, 573-574, and 713 of the NEC documentation, NEC Express5800/A1080a, A1040a User's Guide.

Note: The preceding link was correct as of January 31, 2014. If you find the link is broken, provide feedback and a VMware employee will update the link.

Additional Information



Restarting the BMC (Baseboard Management Controller) on an NEC server fails with a purple diagnostic screen, or the ESXi/ESX host becomes unresponsive