ESXi 5.0 host experiences a purple diagnostic screen with the errors "Failed to ack TLB invalidate" or "no heartbeat" on HP servers with PCC support

Article ID: 320051

Products

VMware

Issue/Introduction


Symptoms:
  • ESXi 5.0 host fails with a purple diagnostic screen
  • The purple diagnostic screen or core dump contains messages similar to:

    • PCPU 39 locked up. Failed to ack TLB invalidate (total of 1 locked up, PCPU(s): 39).
      0x41228efc7b88:[0x41800646cd62]Panic@vmkernel#nover+0xa9 stack: 0x41228efe5000
      0x41228efc7cb8:[0x4180064989af]TLBDoInvalidate@vmkernel#nover+0x45a stack: 0x41228efc7ce8


    • @BlueScreen: PCPU 0: no heartbeat, IPIs received (0/1).
      ...
      0x4122c27c7a68:[0x41800966cd62]Panic@vmkernel#nover+0xa9 stack: 0x4122c27c7a98
      0x4122c27c7ad8:[0x4180098d80ec]Heartbeat_DetectCPULockups@vmkernel#nover+0x2d3 stack: 0x0
      ...
      NMI: 1943: NMI IPI received. Was eip(base):ebp:cs [0x7eb2e(0x418009600000):0x4122c2307688:0x4010](Src 0x1, CPU140)
      Heartbeat: 618: PCPU 140 didn't have a heartbeat for 8 seconds. *may* be locked up
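
The messages above can be checked for mechanically when reviewing saved purple screen text or core dump logs. The following is a minimal sketch, not a VMware tool: the function name scan_psod_text and the signature labels are hypothetical, and the patterns are derived only from the example messages shown in this article, so real output may vary in wording.

```python
import re

# Patterns derived from the example messages in this article (assumption:
# real logs use the same wording).
TLB_RE = re.compile(r"PCPU (\d+) locked up\. Failed to ack TLB invalidate")
NO_HB_RE = re.compile(r"PCPU (\d+): no heartbeat")
HB_WARN_RE = re.compile(r"PCPU (\d+) didn't have a heartbeat for (\d+) seconds")


def scan_psod_text(text):
    """Return (signature, pcpu) tuples for each matching line, in order."""
    findings = []
    for line in text.splitlines():
        match = TLB_RE.search(line)
        if match:
            findings.append(("tlb_invalidate", int(match.group(1))))
            continue
        match = NO_HB_RE.search(line)
        if match:
            findings.append(("no_heartbeat", int(match.group(1))))
            continue
        match = HB_WARN_RE.search(line)
        if match:
            findings.append(("heartbeat_warning", int(match.group(1))))
    return findings
```

A match indicates only that the text contains one of these signatures. It does not by itself confirm that PCC is the cause.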


Cause

Some HP servers experience a situation where the PCC (Processor Clocking Control or Collaborative Power Control) communication between the VMware ESXi kernel (VMkernel) and the server BIOS does not function correctly.

As a result, one or more PCPUs may remain in SMM (System Management Mode) for many seconds. When the VMkernel notices a PCPU is not available for an extended period of time, a purple diagnostic screen occurs.

Resolution

This issue is resolved in ESXi 5.0 Update 2, in which PCC is disabled by default. For more information, see VMware ESXi 5.0, Patch ESXi500-Update02: VMware ESXi 5.0 Complete Update 2 (2033751) and the ESXi 5.0 Update 2 Release Notes.


To work around this issue in versions prior to ESXi 5.0 U2, disable PCC manually.

To disable PCC:
  1. Connect to the ESXi host using the vSphere Client.
  2. Click the Configuration tab.
  3. In the Software menu, click Advanced Settings.
  4. Select vmkernel.
  5. Deselect the vmkernel.boot.usePCC option.
  6. Restart the host for the change to take effect.
For more information, see Configuring advanced options for ESXi/ESX (1038578).
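
The choice between the fix and the workaround described in this section can be summarized as a small decision helper. This is an illustrative sketch only: the function pcc_action and its integer arguments are hypothetical, and it does not query a host. It encodes only what this article states, namely that ESXi 5.0 Update 2 disables PCC by default and that earlier 5.0 releases need the manual workaround.

```python
def pcc_action(major, minor, update):
    """Suggest an action for an ESXi host on an affected HP server.

    Arguments are the release numbers as integers, for example
    (5, 0, 1) for ESXi 5.0 Update 1. How they are obtained is left
    to the caller (assumption).
    """
    if (major, minor) != (5, 0):
        # This article covers ESXi 5.0 only.
        return "not covered by this article"
    if update >= 2:
        return "resolved: PCC is disabled by default"
    return "workaround: deselect vmkernel.boot.usePCC and restart the host"
```

For example, pcc_action(5, 0, 1) returns the workaround message, while pcc_action(5, 0, 2) reports the issue as resolved.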

Additional Information

For more information, see the HP Customer Advisory article c03543898.

Note: This is a specific case of a "Failed to ack TLB invalidate" purple diagnostic screen. For more information about general cases, see Understanding a "Failed to ack TLB invalidate" purple diagnostic screen.

If the logs and a Knowledge Base search do not reveal any additional error messages that explain the outage, or if the error is not documented in the Knowledge Base, collect diagnostic information from the VMware ESXi host and submit a Support Request. For more information, see Collecting diagnostic information for VMware products and How to file a Support Request in Customer Connect.

For more information, see ESXi hosts that use HP CRU driver fail with a purple diagnostic screen when ECC events occur (2001207).

Interpreting an ESX/ESXi host purple diagnostic screen
Collecting diagnostic information for VMware products
Understanding a "Failed to ack TLB invalidate" purple diagnostic screen
Configuring advanced options for ESXi/ESX
ESXi hosts that use HP CRU driver fail with a purple diagnostic screen when ECC events occur
How to file a Support Request in Customer Connect
VMware ESXi 5.0, Patch ESXi500-Update02: VMware ESXi 5.0 Complete Update 2
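ESXi 5.0 host displays a purple diagnostic screen with the errors "Failed to ack TLB invalidate" or "no heartbeat" on HP servers with PCC support (Japanese version)
ESXi 5.0 host displays a purple diagnostic screen with the errors "Failed to ack TLB invalidate" or "no heartbeat" on HP servers with PCC support (Simplified Chinese version)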