VMware ESXi hosts running on HP ProLiant Servers with outdated iLO firmware fail with #PF or #GP purple diagnostic screens

Article ID: 343421

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

When using an HP ProLiant server with iLO 4 firmware earlier than 2.10 (March 2015) or iLO 3 firmware earlier than 1.82 (January 2015), you experience these symptoms:

  • The iLO Event Log shows the entry Power restored to iLO after approximately 30-40 days of iLO uptime. This entry coincides with a purple diagnostic screen on the host.

  • Your VMware ESXi host fails with a purple diagnostic screen that contains stack traces relating to memory operations and reports Exception 13 (#GP) or Exception 14 (#PF). Although the failure is memory-related, it can occur even when hardware diagnostics pass and no hardware failures, errors, or events are logged.

  • The #GP or #PF purple diagnostic screen may contain entries similar to, but not limited to, the following:

    @BlueScreen: #PF Exception 14 in world 33256:memMap-58 IP 0x418020830794 addr 0x4109b006e84c
    PTEs:0x101000023;0x8000008051ba3063;0x0;
    0x412447a1d5b0:[0x418020830794]BuddyRemoveFreeBuf@vmkernel#nover+0xc8 stack: 0x1
    0x412447a1d610:[0x41802083152f]BuddyBufFreeInt@vmkernel#nover+0x247 stack: 0xb3655d80001fffff
    0x412447a1d680:[0x418020831c00]BuddyFreeWithRetireLocked@vmkernel#nover+0x274 stack: 0x41000640aa80
    0x412447a1d6e0:[0x418020833015]Buddy_FreeAllWithRetire@vmkernel#nover+0x9d stack: 0x3
    0x412447a1d740:[0x41802087d5e2]MemMapFreeAndAccountPages@vmkernel#nover+0x22e stack: 0x0
    0x412447a1dfd0:[0x41802081be65]PageCacheAdjustSize@vmkernel#nover+0x3a1 stack: 0x0
    0x412447a1dff0:[0x418020a55692]CpuSched_StartWorld@vmkernel#nover+0xfa stack: 0x

    or

    @BlueScreen: #GP Exception 13 in world 8457579:vmm0:DLPXDBD @ 0x418037703158
    0x412475addf18:[0x418037703158]LPage_SelectLPageToRemap@vmkernel#nover+0x1e8 stack: 0x100412475addf
    0x412475addf98:[0x418037746084]VmMemRemap_BatchPickup@vmkernel#nover+0x304 stack: 0x418000000000
    0x412475addfe8:[0x4180376ceb40]VMMVMKCall_Call@vmkernel#nover+0x48c stack: 0x0
    0x4180376ce164:[0xfffffffffc223baa]__param_heap_initial@<None>#<None>+0xc3bfd282 stack: 0x0

    Note: As the issue can occur in multiple memory regions, stack trace contents can vary greatly.
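
The header lines in the two samples above share a common shape, so a collected purple diagnostic screen can be screened for #PF or #GP with a small script. The following is a minimal sketch in Python, not a VMware tool: the function name classify_psod_header and the pattern are assumptions based only on the two samples in this article, and other ESXi builds may format the header differently.

```python
import re

# Pattern for the first line of a purple diagnostic screen. It is derived only
# from the two samples in this article (an assumption, not a documented format).
HEADER_RE = re.compile(
    r"@BlueScreen: #(?P<mnemonic>PF|GP) Exception (?P<vector>\d+) "
    r"in world (?P<world_id>\d+):(?P<world_name>\S+)"
)


def classify_psod_header(line):
    """Return exception details for a #PF/#GP header line, or None if no match."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return {
        "mnemonic": match.group("mnemonic"),      # "PF" or "GP"
        "vector": int(match.group("vector")),     # 14 for #PF, 13 for #GP
        "world_id": int(match.group("world_id")),
        "world_name": match.group("world_name"),
    }
```

A match only shows that the screen is a #PF or #GP event. It does not confirm the iLO firmware as the cause, because the stack trace contents vary and other faults raise the same exceptions.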


Environment

VMware vSphere ESXi 5.0
VMware ESXi 4.1.x Embedded
VMware vSphere ESXi 5.5
VMware vSphere ESXi 5.1
VMware ESXi 4.1.x Installable
VMware vSphere ESXi 6.0

Cause

This issue occurs because the iLO kernel crashes after approximately 40 or more days of uptime. When the crash occurs, it can affect MMU contents, which can cause a wide variety of purple diagnostic screen events.

Note: HP iLO 3 and iLO 4 are affected.
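
The firmware ranges given under Symptoms can be expressed as a simple version check, which may help when auditing many hosts. The following is a minimal sketch in Python. The names AFFECTED_BELOW, parse_version, and is_affected are hypothetical, and the sketch assumes that firmware versions are plain major.minor strings such as 2.10. How the version string is obtained from the iLO is outside its scope.

```python
# First firmware version per iLO generation that this article does not list
# as affected (iLO 3: 1.82, iLO 4: 2.10), taken from the Symptoms section.
AFFECTED_BELOW = {3: (1, 82), 4: (2, 10)}


def parse_version(text):
    """Turn a 'major.minor' string such as '2.10' into a tuple of ints."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_affected(ilo_generation, firmware_version):
    """Return True if the firmware is earlier than the version in this article.

    Returns None when the article does not cover the given iLO generation.
    """
    threshold = AFFECTED_BELOW.get(ilo_generation)
    if threshold is None:
        return None
    # Tuples compare element by element, so (2, 3) < (2, 10) as intended.
    return parse_version(firmware_version) < threshold
```

Comparing tuples of integers rather than raw strings or floats avoids treating 2.10 as smaller than 2.3.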

Resolution

This is not a VMware issue. VMware recommends following HP’s advisories and upgrade recommendations to resolve this issue.

To work around the issue, reboot the HP iLO every two weeks to avoid the iLO kernel crash.
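
If the workaround is tracked by a script, the two-week interval can be computed as shown in the following minimal Python sketch. The names RESET_INTERVAL, next_reset_due, and reset_overdue are hypothetical. The sketch only does the date arithmetic and does not reset the iLO, which is left to whatever HP-supported method you use.

```python
from datetime import datetime, timedelta

# "Every two weeks", as stated in the workaround above.
RESET_INTERVAL = timedelta(days=14)


def next_reset_due(last_reset):
    """Return the time by which the iLO should next be rebooted."""
    return last_reset + RESET_INTERVAL


def reset_overdue(last_reset, now):
    """Return True once two weeks or more have passed since the last reboot."""
    return now >= next_reset_due(last_reset)
```

For example, reset_overdue(datetime(2015, 12, 1), datetime.now()) reports whether an iLO last rebooted on December 1, 2015 is due for another reboot.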

Note: The preceding links were correct as of December 7, 2015. If you find a link is broken, provide feedback and a VMware employee will update the link.