ESXi host crashes with PSOD (purple screen of death) "Unexpected runqueue state encountered" on same PCPUs

Article ID: 368557

Products

VMware vSphere ESXi

Issue/Introduction

  • The ESXi host crashes with a PSOD (purple screen of death) reporting "Unexpected runqueue state encountered", repeatedly on the same PCPUs.
  • The PSOD stack contains one or more of the following strings:
    CpuSched_VcpuRunStateChange@vmkernel
    CpuSchedVcpuMakeReady@vmkernel
    CpuSchedWakeupWorld@vmkernel
    CpuSchedWakeupWorldInt@vmkernel
    CpuSchedActionNotifyTraditionalVcpuid@vmkernel
    CpuSched_ActionNotifyTraditionalVCPUSubset@vmkernel
    CpuSchedActionNotifyHierarchical@vmkernel
    CpuSched_ActionNotifyVCPUs@vmkernel
    VMMVMKCall_Call@vmkernel
    VMKVMM_ArchEnterVMKernel@vmkernel#
    cpu81:4001794)CpuSched: 5595: Unexpected runqueue state encountered!
    cpu80:2098638)CpuSched: 5595: Unexpected runqueue state encountered!
  • Before the crash, the vmkernel repeatedly logs errors similar to the following in /var/run/log/vmkernel.*:
    [YYYY-MM-DDTHH:MM:SS] cpuXX:4001794)ALERT: CpuSched: 5595: Unexpected runqueue state encountered!
    [YYYY-MM-DDTHH:MM:SS] cpuXX:2098638)ALERT: CpuSched: 5595: Unexpected runqueue state encountered!
    [YYYY-MM-DDTHH:MM:SS] cpuYY:4001794)ALERT: CpuSched: 5595: Unexpected runqueue state encountered!
    [YYYY-MM-DDTHH:MM:SS] cpuYY:4001794)CpuSched: 5595: Unexpected runqueue state encountered!
    [YYYY-MM-DDTHH:MM:SS] cpuXX:2098638)CpuSched: 5595: Unexpected runqueue state encountered!
    [YYYY-MM-DDTHH:MM:SS] cpuXX:4001794)CpuSched: 5595: Unexpected runqueue state encountered!
    [YYYY-MM-DDTHH:MM:SS] cpuYY:2098638)CpuSched: 5595: Unexpected runqueue state encountered!
    [YYYY-MM-DDTHH:MM:SS] cpuYY:2105937)CpuSched: 5595: Unexpected runqueue state encountered!
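
To check whether the errors cluster on the same PCPUs, the matching log lines can be tallied per PCPU. The following is a minimal Python sketch, not a VMware tool: the regular expression is derived only from the sample lines shown above, and the function name count_runqueue_errors is hypothetical.

```python
import re
from collections import Counter

# Pattern based on the sample lines in this article: "cpu<PCPU>:<world ID>)",
# optionally followed by "ALERT: ", then the CpuSched message.
# The source line number (5595 in the samples) is matched loosely as digits.
RUNQUEUE_RE = re.compile(
    r"cpu(\d+):(\d+)\)(?:ALERT: )?CpuSched: \d+: "
    r"Unexpected runqueue state encountered!"
)

def count_runqueue_errors(lines):
    """Return a Counter mapping PCPU number to the number of matching lines."""
    counts = Counter()
    for line in lines:
        match = RUNQUEUE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

# Illustrative input only; the last line is an unrelated message that is ignored.
sample = [
    "cpu81:4001794)CpuSched: 5595: Unexpected runqueue state encountered!",
    "cpu80:2098638)CpuSched: 5595: Unexpected runqueue state encountered!",
    "cpu81:2098638)ALERT: CpuSched: 5595: Unexpected runqueue state encountered!",
    "cpu3:1000)Some unrelated message",
]

for pcpu, hits in count_runqueue_errors(sample).most_common():
    print(f"PCPU {pcpu}: {hits} occurrence(s)")
```

The function accepts any iterable of lines, so an open log file copied off the host can be passed in place of the sample list. A small set of PCPUs accounting for all occurrences is consistent with the symptom described in this article.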

 



Environment

VMware vSphere ESXi 7.x

VMware vSphere ESXi 8.x

Cause

  • This is a hardware issue, usually caused by a faulty physical CPU or CPU socket.

 

Resolution

  • To confirm that a specific physical CPU is faulty, swap the processors between sockets and check whether the crash recurs on the same physical CPU after the swap.
  • As a reference, on a host with 2 physical processors (packages), each with 10 cores (assuming hyper-threading is disabled), CPU 0-9 belong to the first processor and CPU 10-19 belong to the second.
  • Contact the hardware vendor and run complete hardware diagnostics.
  • If the crash recurs on the same physical CPU, this confirms that the physical CPU is at fault. Replace the physical CPU.
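
The numbering example above can be expressed as a small helper. This is a sketch under stated assumptions: PCPUs are numbered contiguously per processor and each core exposes exactly one PCPU (as in the 2 x 10 example). The function name pcpu_to_package is hypothetical, and the returned index is zero-based, so 0 is the first processor. Verify the actual topology on the host before acting on the result.

```python
def pcpu_to_package(pcpu, cores_per_package, packages):
    """Map a PCPU number to a zero-based processor (package) index.

    Assumes contiguous numbering and one PCPU per core; this is an
    illustration of the example in this article, not an ESXi API.
    """
    total = cores_per_package * packages
    if not 0 <= pcpu < total:
        raise ValueError(f"PCPU {pcpu} is outside the range 0-{total - 1}")
    return pcpu // cores_per_package

# Example from this article: 2 processors with 10 cores each.
print(pcpu_to_package(9, cores_per_package=10, packages=2))   # first processor
print(pcpu_to_package(10, cores_per_package=10, packages=2))  # second processor
```

Raising an error for out-of-range PCPU numbers is a deliberate choice: a PCPU number larger than expected usually means the assumed core count or threading configuration does not match the host.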