Understanding a "Failed to ack TLB invalidate" purple diagnostic screen

Article ID: 324947

Products

VMware vSphere ESXi

Issue/Introduction

The host displays a purple diagnostic screen that reports information similar to:
  • PCPU 3 locked up. Failed to ack TLB invalidate.
    @BlueScreen: PCPU 3 locked up. Failed to ack TLB invalidate.

  • cpu34:9213)VMware ESXi 5.0.0 [Releasebuild-702118 x86_64] PCPU 18 locked up. Failed to ack TLB invalidate (total of 5 locked up, PCPU(s): 0,10,11,16,18).
    cpu34:9213)cr0=0x80010031 cr2=0x29bbd000 cr3=0x47aa000 cr4=0x2768
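
When triaging these messages across many hosts, it can help to extract the PCPU numbers programmatically. The following Python sketch assumes the message text matches the two samples above exactly. The regular expression and the helper name `parse_lockup` are illustrative assumptions and are not part of any VMware tool. Other ESXi builds may word the message differently.

```python
import re
from typing import List, Optional, Tuple

# Matches "PCPU <n> locked up. Failed to ack TLB invalidate" and, when present,
# the "(total of <k> locked up, PCPU(s): a,b,c)" suffix seen on newer builds.
# Assumption: the wording is exactly as in the sample messages.
LOCKUP_RE = re.compile(
    r"PCPU (?P<pcpu>\d+) locked up\. Failed to ack TLB invalidate"
    r"(?: \(total of \d+ locked up, PCPU\(s\): (?P<pcpus>[\d,]+)\))?"
)


def parse_lockup(message: str) -> Optional[Tuple[int, List[int]]]:
    """Return (PCPU named in the message, all locked-up PCPUs), or None if no match."""
    m = LOCKUP_RE.search(message)
    if m is None:
        return None
    pcpu = int(m.group("pcpu"))
    if m.group("pcpus"):
        # Newer format lists every locked-up PCPU explicitly.
        locked = [int(x) for x in m.group("pcpus").split(",")]
    else:
        # Older format names only a single PCPU.
        locked = [pcpu]
    return pcpu, locked
```

For the ESXi 5.0 sample, this yields PCPU 18 together with the full list of five locked-up PCPUs. For the older single-PCPU format, the list contains only the PCPU that the message names.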

Resolution

Overview

  • Context – A context is the collection of CPU-specific information that pertains to a specific process. It includes the values of the CPU registers and memory management information.
  • Context switch – When an interrupt occurs, the system performs a context switch: it saves the current context and restores the context of another process.
  • Translation Look-aside Buffer (TLB) – The TLB is a table of keys and values (virtual page numbers and the physical addresses they map to) that speeds up virtual memory address translation. It is part of the memory management information included in the context.
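
The definitions above can be illustrated with a toy model. The following Python sketch does not reflect how ESXi or any real processor implements these structures. The hypothetical `Context` class holds register values and a page table. The hypothetical `CPU` class keeps its TLB as a plain dictionary that maps virtual page numbers to physical frame numbers. The model assumes 4 KiB pages and ignores tagged TLBs, which real processors use to avoid some flushes. It shows why a TLB that is not flushed on a context switch gives the incoming process stale translations.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass
class Context:
    """Toy process context: register values plus memory management information."""
    name: str
    registers: dict = field(default_factory=dict)
    page_table: dict = field(default_factory=dict)  # virtual page number -> physical frame number


class CPU:
    """Toy CPU whose TLB is a key/value cache of page translations."""

    def __init__(self, ctx: Context):
        self.ctx = ctx
        self.tlb = {}  # key: virtual page number, value: physical frame number
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.hits += 1  # fast path: translation already cached
        else:
            self.misses += 1  # slow path: consult the page table, then cache the result
            self.tlb[vpn] = self.ctx.page_table[vpn]
        return self.tlb[vpn] * PAGE_SIZE + offset

    def context_switch(self, new_ctx: Context, flush: bool = True) -> None:
        # Saving the outgoing registers is omitted. Flushing the TLB prevents the
        # incoming context from reusing the previous context's translations.
        if flush:
            self.tlb.clear()
        self.ctx = new_ctx
```

If two processes both map virtual page 0 to different frames, a switch with `flush=False` makes the second process read through the first process's cached entry. After a flushing switch, the resulting TLB miss forces a fresh lookup in the second process's own page table.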
When an interrupt occurs, a context switch must be performed. Before the new context and its TLB entries are loaded, the current TLB must be flushed (invalidated). This type of purple diagnostic screen occurs when a physical CPU does not perform this flush, and acknowledge it, for a prolonged period of time.
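
The failure can be sketched as a timed wait. One CPU asks the others to invalidate their TLBs and waits for each to acknowledge. Any CPU that never acknowledges is reported as locked up. This Python sketch simulates PCPUs with threads. The function `request_tlb_invalidate`, the timeout value, and the threading model are assumptions for illustration only and do not describe the actual ESXi implementation.

```python
import threading
import time


def request_tlb_invalidate(pcpus, timeout_s: float):
    """Ask every simulated PCPU to flush its TLB and wait for each acknowledgement.

    `pcpus` maps a PCPU number to a callable that performs the flush (it may hang).
    Returns the sorted PCPU numbers that did not acknowledge within `timeout_s`.
    """
    acks = {n: threading.Event() for n in pcpus}

    def worker(n, do_flush):
        do_flush()     # the remote CPU performs the invalidation...
        acks[n].set()  # ...and only then acknowledges it

    for n, do_flush in pcpus.items():
        # Daemon threads, so a hung simulated CPU does not block interpreter exit.
        threading.Thread(target=worker, args=(n, do_flush), daemon=True).start()

    deadline = time.monotonic() + timeout_s
    locked_up = []
    for n in sorted(acks):
        remaining = max(0.0, deadline - time.monotonic())
        if not acks[n].wait(remaining):  # False means no ack before the deadline
            locked_up.append(n)
    return locked_up


if __name__ == "__main__":
    healthy = lambda: None         # flushes and acknowledges immediately
    hung = lambda: time.sleep(60)  # never finishes its flush within the timeout
    print(request_tlb_invalidate({0: healthy, 3: hung, 5: healthy}, timeout_s=0.5))
```

In this sketch, a PCPU that hangs inside its flush routine never sets its event, so it appears in the returned list. In the real failure, the equivalent path ends in the purple diagnostic screen. The backtrace in the Diagnostic Information section shows `TLBDoInvalidate` calling `TLBInvalidateFailed`.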


Diagnostic Information

This is an example of the diagnostic information that is included in the purple diagnostic screen:
VMware ESX Server [Releasebuild-52542]
PCPU 3 locked up. Failed to ack TLB invalidate.
gate=0x0 frame=0x343bd78 eip=0x61fafc cr2=0x0 cr3=0x13a83000 cr4=0x16c
eax=0x0 ebx=0x0 ecx=0x0 edx=0x0 es=0x0 ds=0x0
fs=0x0 gs=0x0 ebp=0x343bed4 esi=0x0 edi=0x0 err=0 ef=0x0
cpu 0 2673 vmm0:keys: cpu 1 2372 mks:dc02: CPU 2 1038 helper1-3: cpu 3 3012 vmm0:erpt:
cpu 4 3019 vmm0:keys: cpu 5 2652 vmm0:erpt: cpu 6 2832 vmm0:time: cpu 7 2394 vmm0:addc:
@BlueScreen: PCPU 3 locked up. Failed to ack TLB invalidate.
0x343bed4:[0x61fafc]_vLog+0x0(0x78cb60, 0x343bef0, 0x343bf10)
0x343bee4:[0x61fafc]_vLog+0x0(0x78cb60, 0x3, 0x1)
0x343bf10:[0x63fd00]TLBInvalidateFailed+0x90(0x1, 0xffffffff, 0x0)
0x343bf38:[0x640012]TLBDoInvalidate+0x27a(0xffffffff, 0xffffffff, 0x343bf74)
0x343bf48:[0x63fbb5]TLB_Flush+0x35(0x0, 0x0, 0x400)
0x343bf74:[0x65d878]XMapFlushDelayedUnmaps+0x70(0x0, 0x12130b4, 0x0)
0x343bfac:[0x6463e3]helpFunc+0x1ff(0x1, 0xc9256c, 0x0)
0x343bffc:[0x702bb8]CpuSched_StartWorld+0x11c(0x0, 0x0, 0x0)
0x343c000:[0x0](0x0, 0x0, 0x0)
VMK uptime: 210:15:14:32.718 TSC: 47315535316217757
cpu5:2602)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 3781 seconds. *may* be locked up
cpu5:2659)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 7621 seconds. *may* be locked up
cpu5:2644)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 15301 seconds. *may* be locked up
Starting coredump to disk Starting coredump to disk Dumping using slot 1 of 1... using slot 1 of 1... log
From the preceding example:
  • Identify the physical CPU that is misbehaving. In this example, it is physical CPU 3:

    PCPU 3 locked up.

  • Determine how long the system waited for the PCPU to invalidate the TLB:

    cpu5:2602)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 3781 seconds. *may* be locked up
    cpu5:2659)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 7621 seconds. *may* be locked up
    cpu5:2644)Heartbeat: 469: PCPU 3 didn't have a heartbeat for 15301 seconds. *may* be locked up
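
The heartbeat lines can also be summarized with a short script, for example to see how long each PCPU went without a heartbeat. This Python sketch assumes the exact `PCPU N didn't have a heartbeat for S seconds` wording shown above. `heartbeat_gaps` is a hypothetical helper name.

```python
import re

# Assumption: heartbeat warnings use exactly the wording shown in the example.
HEARTBEAT_RE = re.compile(
    r"PCPU (?P<pcpu>\d+) didn't have a heartbeat for (?P<secs>\d+) seconds"
)


def heartbeat_gaps(lines):
    """Map each PCPU number to the heartbeat gaps (in seconds) reported for it, in log order."""
    gaps = {}
    for line in lines:
        m = HEARTBEAT_RE.search(line)
        if m:
            gaps.setdefault(int(m.group("pcpu")), []).append(int(m.group("secs")))
    return gaps
```

For the three sample lines, the function returns `{3: [3781, 7621, 15301]}`. The largest gap, 15301 seconds, is about 4 hours 15 minutes.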
Extract the ESXi host logs that led up to the purple diagnostic screen and examine them for a potential cause. For instructions, see Extracting the log file after an ESX or ESXi host fails with a purple screen error (1006796).
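
When you examine the logs, the backtrace in the purple screen output shows which code path was waiting on the TLB invalidation. The following Python sketch turns backtrace lines like those in the example into a call chain, listed from the outermost caller inward. The frame format (`frame:[eip]Function+offset(args)`) is inferred from the single example in this article. `call_chain` is a hypothetical helper. Other builds may format frames differently.

```python
import re

# Assumed frame format, inferred from the example: "0x343bf48:[0x63fbb5]TLB_Flush+0x35(args)".
FRAME_RE = re.compile(
    r"^(?P<frame>0x[0-9a-fA-F]+):\[(?P<eip>0x[0-9a-fA-F]+)\]"
    r"(?P<func>\w+)\+(?P<offset>0x[0-9a-fA-F]+)\("
)


def call_chain(lines):
    """Return function names from backtrace lines, outermost caller first.

    Backtraces list the innermost frame first, so the matches are reversed.
    Frames without a symbol name (e.g. "0x343c000:[0x0](...)") are skipped.
    """
    funcs = []
    for line in lines:
        m = FRAME_RE.match(line.strip())
        if m:
            funcs.append(m.group("func"))
    return list(reversed(funcs))
```

For the example backtrace, the chain runs from `CpuSched_StartWorld` through `helpFunc`, `XMapFlushDelayedUnmaps`, `TLB_Flush`, and `TLBDoInvalidate` to `TLBInvalidateFailed`, followed by the `_vLog` logging frames.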
 
The "Failed to ack TLB invalidate" error is caused by either a hardware issue or a software issue.