ESXi PSOD (purple screen of death) #PF Exception 14 in world tq tcpip4 tcp_timer_keep

Article ID: 380520


Products

VMware vSphere ESXi

Issue/Introduction

Symptom:
ESXi 8.0.3 may encounter a Purple Screen of Death (PSOD) in the tcpip4 world (tq:tcpip4).

This issue has also been seen on 8.0 versions prior to Update 3.


A host running ESXi 8.0.3 build 24022510 may encounter a PSOD with the following backtrace.

The backtrace can sometimes be found in /var/run/log/LogEFI.log:

YYYY-MM-DDTHH:MM:SS.888Z In(14) LogEFI: cpu14:2098398)VMware ESXi 8.0.3 [Releasebuild-24022510 x86_64]
YYYY-MM-DDTHH:MM:SS.953Z In(14) LogEFI[2098777]: #PF Exception 14 in world 2098398:tq:tcpip4 IP 0x420021b84d25 addr 0x134
YYYY-MM-DDTHH:MM:SS.953Z In(14) LogEFI[2098777]: PTEs:0x806a8bf023;0x806a8fb023;0x806a8dc023;0x0;
YYYY-MM-DDTHH:MM:SS.953Z In(14) LogEFI[2098777]:
YYYY-MM-DDTHH:MM:SS.953Z In(14) LogEFI[2098777]: Module(s) involved in panic: [tcpip4 Built on: Jun 11 2024]
YYYY-MM-DDTHH:MM:SS.889Z In(14) LogEFI: cpu14:2098398)cr0=0x8001003d cr2=0x134 cr3=0x60d000 cr4=0x14216c
YYYY-MM-DDTHH:MM:SS.889Z In(14) LogEFI: cpu14:2098398)FMS=06/6a/6 uCode=0xd0003e7
YYYY-MM-DDTHH:MM:SS.889Z In(14) LogEFI: cpu14:2098398)frame=0x4539a429bde0 ip=0x420021b84d25 err=0x2 rflags=0x10206
YYYY-MM-DDTHH:MM:SS.889Z In(14) LogEFI: cpu14:2098398)rax=0x0 rbx=0x41ffd440b860 rcx=0x0
YYYY-MM-DDTHH:MM:SS.889Z In(14) LogEFI: cpu14:2098398)rdx=0xf rbp=0x134 rsi=0x43162da120d0
YYYY-MM-DDTHH:MM:SS.890Z In(14) LogEFI: cpu14:2098398)rdi=0x134 r8=0x1 r9=0xffffffffffffffff
YYYY-MM-DDTHH:MM:SS.890Z In(14) LogEFI: cpu14:2098398)r10=0x0 r11=0xffffffffffffffff r12=0x43162dfd6f40
YYYY-MM-DDTHH:MM:SS.890Z In(14) LogEFI: cpu14:2098398)r13=0x43162da32aa0 r14=0x4 r15=0x134
YYYY-MM-DDTHH:MM:SS.953Z In(14) LogEFI[2098777]: *PCPU14:2098398/tq:tcpip4
YYYY-MM-DDTHH:MM:SS.953Z In(14) LogEFI[2098777]: PCPU 0: SVSVUVUVVVUSVSSSSSVSSSUVVUSSUVVVUVUVUVVVUVUVVVVVVVVVVVUVVVVUVSVV
YYYY-MM-DDTHH:MM:SS.890Z In(14) LogEFI: cpu14:2098398)Code start: 0x420021a00000 VMK uptime: 1:21:42:44.869
YYYY-MM-DDTHH:MM:SS.891Z In(14) LogEFI: cpu14:2098398)0x4539a429bea8:[0x420021b84d25]SPTryLockWork@vmkernel#nover+0x15 stack: 0x42002320955a
YYYY-MM-DDTHH:MM:SS.891Z In(14) LogEFI: cpu14:2098398)0x4539a429beb0:[0x4200231989e8]rw_try_wlock@(tcpip4)#<None>+0x39 stack: 0x4539a429bec0
YYYY-MM-DDTHH:MM:SS.891Z In(14) LogEFI: cpu14:2098398)0x4539a429bec0:[0x420023209559]tcp_timer_keep@(tcpip4)#<None>+0xba stack: 0xffff
YYYY-MM-DDTHH:MM:SS.891Z In(14) LogEFI: cpu14:2098398)0x4539a429bf10:[0x42002319185b]callout_timer@(tcpip4)#<None>+0x1a0 stack: 0x43162dc8cbd8
YYYY-MM-DDTHH:MM:SS.892Z In(14) LogEFI: cpu14:2098398)0x4539a429bf60:[0x420021a3a956]VmkTimerQueueWorldFunc@vmkernel#nover+0x38f stack: 0xffffffffffffffff
YYYY-MM-DDTHH:MM:SS.892Z In(14) LogEFI: cpu14:2098398)0x4539a429bfe0:[0x4200220d67b2]CpuSched_StartWorld@vmkernel#nover+0xbf stack: 0x0
YYYY-MM-DDTHH:MM:SS.892Z In(14) LogEFI: cpu14:2098398)0x4539a429c000:[0x420021b44c6f]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
YYYY-MM-DDTHH:MM:SS.898Z In(14) LogEFI: cpu14:2098398)base fs=0x0 gs=0x420043800000 Kgs=0x0
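You can check a saved copy of LogEFI.log for this signature with a short script. The script looks for three things: the #PF line, the err= code in the register dump, and a tcp_timer_keep frame. It then decodes the page-fault error code using the standard x86 bit layout (bit 0: protection violation rather than non-present page, bit 1: write, bit 2: user mode).

This is a minimal sketch, not a VMware tool. The regular expressions assume the line format shown above, and scan_logefi and decode_pf_error are hypothetical names. For the excerpt above, the script reports a kernel-mode write to a non-present page at 0x134. That address lies inside the first 4 KiB page, which typically indicates that a field was accessed through a NULL pointer.

```python
import re
from typing import Iterable, Optional

# Patterns assume the LogEFI.log line format shown in the excerpt above.
PF_RE = re.compile(
    r"#PF Exception 14 in world (?P<world>\d+):(?P<name>\S+) "
    r"IP (?P<ip>0x[0-9a-fA-F]+) addr (?P<addr>0x[0-9a-fA-F]+)"
)
ERR_RE = re.compile(r"\berr=(?P<err>0x[0-9a-fA-F]+)")
SIGNATURE_FRAME = "tcp_timer_keep@(tcpip4)"


def decode_pf_error(err: int) -> dict:
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(err & 0x1),  # 0 = page not present
        "write": bool(err & 0x2),                 # 0 = read access
        "user_mode": bool(err & 0x4),             # 0 = supervisor (kernel) mode
        "reserved_bit": bool(err & 0x8),          # reserved bit set in a paging entry
        "instruction_fetch": bool(err & 0x10),    # fault on an instruction fetch
    }


def scan_logefi(lines: Iterable[str]) -> Optional[dict]:
    """Summarize the fault if the lines contain the tcp_timer_keep #PF signature."""
    pf = None
    err = None
    saw_keepalive_frame = False
    for line in lines:
        m = PF_RE.search(line)
        if m and pf is None:
            pf = m
        m = ERR_RE.search(line)
        if m and err is None:
            err = int(m.group("err"), 16)
        if SIGNATURE_FRAME in line:
            saw_keepalive_frame = True
    if pf is None or not saw_keepalive_frame:
        return None
    addr = int(pf.group("addr"), 16)
    return {
        "world": f'{pf.group("world")}:{pf.group("name")}',
        "ip": pf.group("ip"),
        "fault_addr": addr,
        # Addresses inside the first 4 KiB page usually mean a field was
        # accessed through a NULL pointer.
        "near_null": addr < 0x1000,
        "error_code": decode_pf_error(err) if err is not None else None,
    }
```

Usage on a copy of the log: `with open("LogEFI.log", errors="replace") as f: print(scan_logefi(f))`. A result of None means the signature was not found. Because the backtrace is only sometimes written to this file, None does not rule the issue out.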

 
 

Environment

vSphere ESXi 8.0.x

Cause

Description: A rare race condition between the TCP control path, during a disconnect operation, and the keepalive timer might cause ESXi 8.0 Update 3 hosts to fail with a purple diagnostic screen. The issue occurs under specific network conditions when the keepalive timer and the disconnect logic overlap. This is consistent with the backtrace above, in which tcp_timer_keep, called from callout_timer, faults inside rw_try_wlock.
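This article does not describe how the race was fixed. The hypothetical Python model below only illustrates the general shape of this kind of race. The keepalive timer has already been dispatched when a disconnect tears down the connection's protocol state, so the timer body then tries to lock state that no longer exists. In the backtrace, the kernel equivalent is tcp_timer_keep faulting in rw_try_wlock.

Pcb, Conn, keepalive_unsafe, keepalive_safe and replay_race are invented names, not ESXi tcpip4 types. The "safe" variant rechecks state under a lock that outlives the torn-down structure. That is one common pattern, and not necessarily what 8.0U3e does.

```python
import threading
from typing import Callable, Optional

# Hypothetical model: Pcb and Conn are invented names, not ESXi tcpip4 types.


class Pcb:
    """Per-connection protocol state that a disconnect tears down."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.probes_sent = 0


class Conn:
    """Connection handle; its lock is assumed to outlive the Pcb."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.pcb: Optional[Pcb] = Pcb()
        self.dropped = False


def disconnect(conn: Conn) -> None:
    """Control path: mark the connection dropped and tear down its state."""
    with conn.lock:
        conn.dropped = True
        conn.pcb = None


def keepalive_unsafe(conn: Conn) -> str:
    """Timer body that assumes the state still exists because the timer fired."""
    pcb = conn.pcb  # None if disconnect() already ran
    if not pcb.lock.acquire(blocking=False):  # AttributeError when pcb is None
        return "busy, retry later"
    try:
        pcb.probes_sent += 1
        return "probe sent"
    finally:
        pcb.lock.release()


def keepalive_safe(conn: Conn) -> str:
    """Timer body that rechecks connection state under the long-lived lock."""
    with conn.lock:
        pcb = conn.pcb
        if conn.dropped or pcb is None:
            return "connection gone, nothing to do"
        if not pcb.lock.acquire(blocking=False):
            return "busy, retry later"
        try:
            pcb.probes_sent += 1
            return "probe sent"
        finally:
            pcb.lock.release()


def replay_race(timer_body: Callable[[Conn], str]) -> str:
    """Replay one bad ordering: timer dispatched, disconnect runs, timer body runs."""
    conn = Conn()
    timer_dispatched = conn.pcb is not None  # timer decides to run while state exists
    disconnect(conn)  # disconnect overlaps with the already-dispatched timer
    if not timer_dispatched:
        return "timer not run"
    try:
        return timer_body(conn)
    except AttributeError:
        # Python's stand-in for the kernel's near-NULL page fault.
        return "fault: touched torn-down state"
```

Replaying one fixed interleaving, rather than starting real threads, makes the failure deterministic. In a live system, that ordering occurs only occasionally, which fits the description of the race as rare.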

Resolution

This issue is fixed in ESXi 8.0 Update 3e (8.0U3e).

For details, see the release notes:

https://techdocs.broadcom.com/us/en/vmware-cis/vsphere/vsphere/8-0/release-notes/esxi-update-and-patch-release-notes/vsphere-esxi-80u3e-release-notes.html 

PR 3466538: A race condition between the TCP control path and the keep alive timer might cause an ESXi host to fail with a purple diagnostic screen
A rare race condition between the TCP control path during a disconnect operation and the keep alive timer might cause ESXi hosts of version 8.0 Update 3 to fail with a purple diagnostic screen. The issue occurs under specific network conditions when the timer and disconnect logic overlap.
This issue is resolved in this release. The fix prevents the purple diagnostic screen failure.
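To decide whether a given 8.0 release predates the fix, you can compare its release label against 8.0U3e. The sketch below assumes labels of the form 8.0, 8.0U2b or 8.0U3e. It does not handle express patches or raw build numbers, and parse_label and includes_fix are hypothetical names.

Hosts on Update 3 and its patch releases typically all report version 8.0.3, as in the log above. Map the host's build number to its release label using the release notes before running the check.

```python
import re

# Fixed release per this article: ESXi 8.0 Update 3e (8.0U3e).
FIXED_IN = (3, "e")

LABEL_RE = re.compile(r"(\d+)\.(\d+)(?:\s*U(\d+)([a-z]?))?", re.IGNORECASE)


def parse_label(label: str) -> tuple:
    """Parse '8.0', '8.0U2b' or '8.0 U3e' into (major, minor, update, patch_letter)."""
    m = LABEL_RE.fullmatch(label.strip())
    if m is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    update = int(m.group(3)) if m.group(3) else 0
    letter = (m.group(4) or "").lower()
    return major, minor, update, letter


def includes_fix(label: str) -> bool:
    """True if an 8.0 release label is at or after 8.0U3e (PR 3466538 fix)."""
    major, minor, update, letter = parse_label(label)
    if (major, minor) != (8, 0):
        raise ValueError("this check only covers ESXi 8.0.x releases")
    # Tuple comparison: update number first, then patch letter ('' < 'a' < ... < 'e').
    return (update, letter) >= FIXED_IN
```

For example, includes_fix("8.0U3") and includes_fix("8.0U2b") return False, while includes_fix("8.0U3e") returns True. Labels outside 8.0.x raise an error rather than guessing, because this article only covers 8.0.x.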

Additional Information