ESXi host unresponsiveness or lockup due to high System Management Interrupt (SMI) rates on HPE Gen11

Article ID: 431850

Products

VMware vSphere ESXi

Issue/Introduction

  • HPE Gen11 servers running VMware ESXi may experience intermittent hard lockups in which the hypervisor becomes slow or completely unresponsive.
  • In some scenarios, the iLO GUI also becomes unresponsive.
  • Hardware telemetry may show processor utilization spiking from 10% to 100% and remaining elevated until the host is power cycled.
  • vmksummary.log shows a significant and rapid increase in the numSMI counter before the failure. /var/run/log/vmksummary.log may show events similar to:

YYYY-MM-DDTHH:MM:SS.188Z In(14) heartbeat[7161093]: up 41d6h33m9s, 95 VMs; [[2114271 vmx 31191716kB] [6128350 vmx 33982964kB] [6128219 vmx 59612572kB]] [[7161090 sh 0%max]], [numSMI 4011]
YYYY-MM-DDTHH:MM:SS.257Z In(14) heartbeat[7166451]: up 41d7h33m9s, 94 VMs; [[2114271 vmx 31075356kB] [6128350 vmx 38065640kB] [6128219 vmx 59006908kB]] [[7166457 sh 0%max]], [numSMI 4027]
YYYY-MM-DDTHH:MM:SS.449Z In(14) heartbeat[7171166]: up 41d8h33m9s, 94 VMs; [[2114271 vmx 30888280kB] [6128350 vmx 38285208kB] [6128219 vmx 59301856kB]] [[7171165 sh 0%max]], [numSMI 4061]
YYYY-MM-DDTHH:MM:SS.555Z In(14) heartbeat[7175570]: up 41d9h33m9s, 94 VMs; [[2114271 vmx 31070688kB] [6128350 vmx 38591728kB] [6128219 vmx 59011576kB]] [[7175572 sh 0%max]], [numSMI 4099]
YYYY-MM-DDTHH:MM:SS.633Z In(14) heartbeat[7180120]: up 41d10h33m9s, 94 VMs; [[2114271 vmx 31393928kB] [6128350 vmx 38817868kB] [6128219 vmx 59446612kB]] [[7180123 sh 0%max]], [numSMI 4119]
YYYY-MM-DDTHH:MM:SS.800Z In(14) heartbeat[7185608]: up 41d11h33m9s, 94 VMs; [[2114271 vmx 31033876kB] [6128350 vmx 38090000kB] [6128219 vmx 56594536kB]] [[7185611 sh 14%max]], [numSMI 4135]
YYYY-MM-DDTHH:MM:SS.906Z In(14) heartbeat[7189703]: up 41d12h33m9s, 94 VMs; [[2114271 vmx 30927880kB] [6128350 vmx 37472524kB] [6128219 vmx 54781504kB]] [[7189704 sh 14%max]], [numSMI 4149]
YYYY-MM-DDTHH:MM:SS.059Z In(14) heartbeat[7194151]: up 41d13h33m9s, 96 VMs; [[2115271 vmx 31029976kB] [6128350 vmx 37334960kB] [6128219 vmx 56300084kB]] [[7194153 sh 0%max]], [numSMI 4179]
YYYY-MM-DDTHH:MM:SS.206Z In(14) heartbeat[7198317]: up 41d14h33m8s, 96 VMs; [[2114271 vmx 30954524kB] [6128350 vmx 36924752kB] [6128219 vmx 59270612kB]] [[7198322 sh 0%max]], [numSMI 4197]
YYYY-MM-DDTHH:MM:SS.282Z In(14) heartbeat[7203026]: up 41d15h33m8s, 89 VMs; [[2115271 vmx 31033740kB] [6128350 vmx 36786916kB] [6128219 vmx 57441872kB]] [[7203029 sh 0%max]], [numSMI 9021]       ### <-- !! SMIs started increasing.
YYYY-MM-DDTHH:MM:SS.438Z In(14) heartbeat[7206757]: up 41d16h33m8s, 89 VMs; [[2114271 vmx 31052356kB] [6128350 vmx 36097284kB] [6128219 vmx 58924152kB]] [[7206758 sh 0%max]], [numSMI 21033]
YYYY-MM-DDTHH:MM:SS.600Z In(14) heartbeat[7210448]: up 41d17h33m9s, 89 VMs; [[2115271 vmx 31039440kB] [6128350 vmx 36413124kB] [6128219 vmx 58509272kB]] [[7210450 sh 0%max]], [numSMI 30643]
YYYY-MM-DDTHH:MM:SS.749Z In(14) heartbeat[7214353]: up 41d18h33m9s, 91 VMs; [[2114271 vmx 30972060kB] [6128350 vmx 36718264kB] [6128219 vmx 57954724kB]] [[7214357 sh 0%max]], [numSMI 35453]
YYYY-MM-DDTHH:MM:SS.954Z In(14) heartbeat[7218501]: up 41d19h33m9s, 91 VMs; [[2114271 vmx 31296804kB] [6128350 vmx 36433496kB] [6128219 vmx 57714444kB]] [[7218496 sh 0%max]], [numSMI 44086]
YYYY-MM-DDTHH:MM:SS.115Z In(14) heartbeat[7223583]: up 41d20h33m9s, 94 VMs; [[2114271 vmx 31101220kB] [6128350 vmx 35991488kB] [6128219 vmx 59032608kB]] [[7223581 sh 0%max]], [numSMI 53025]
YYYY-MM-DDTHH:MM:SS.263Z In(14) heartbeat[7228308]: up 41d21h33m8s, 94 VMs; [[2114271 vmx 30991888kB] [6128350 vmx 35586728kB] [6128219 vmx 59749880kB]] [[7228303 sh 14%max]], [numSMI 66690]
YYYY-MM-DDTHH:MM:SS.377Z In(14) heartbeat[7233203]: up 41d22h33m8s, 94 VMs; [[2114271 vmx 31249972kB] [6128350 vmx 35153724kB] [6128219 vmx 58978056kB]] [[7233206 sh 3%max]], [numSMI 78702]
YYYY-MM-DDTHH:MM:SS.534Z In(14) heartbeat[7238461]: up 41d23h33m8s, 94 VMs; [[2114271 vmx 31431480kB] [6128350 vmx 34255496kB] [6128219 vmx 57305236kB]] [[7238462 sh 0%max]], [numSMI 88311]
YYYY-MM-DDTHH:MM:SS.646Z In(14) heartbeat[7243934]: up 42d0h33m8s, 92 VMs; [[2114271 vmx 31546940kB] [6128350 vmx 34501376kB] [6128219 vmx 59806644kB]] [[7243936 sh 0%max]], [numSMI 100622]
YYYY-MM-DDTHH:MM:SS.793Z In(14) heartbeat[7248976]: up 42d1h33m8s, 91 VMs; [[2114271 vmx 31363452kB] [6128350 vmx 33769472kB] [6128219 vmx 59546432kB]] [[7248959 sh 0%max]], [numSMI 112340]       ### <-- !! SMIs increased.
YYYY-MM-DDTHH:MM:SS.514Z No(13) bootstop[2104880]: Host has booted
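
The numSMI trend in the heartbeat lines above can also be checked with a short script instead of by eye. The following is a minimal sketch, not a VMware or HPE tool: it assumes every heartbeat line ends with a "[numSMI <count>]" field as in the sample, and the function names and the threshold of 1000 SMIs per heartbeat interval are hypothetical choices made for illustration. It also assumes that a counter lower than the previous reading means the counter restarted (for example after a reboot).

```python
import re

# Matches the "[numSMI <count>]" field seen in the sample heartbeat lines.
NUM_SMI = re.compile(r"\[numSMI (\d+)\]")


def smi_counts(lines):
    """Extract the numSMI counter from each heartbeat line, in file order."""
    counts = []
    for line in lines:
        if "heartbeat[" not in line:
            continue  # skip bootstop and other non-heartbeat events
        match = NUM_SMI.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def smi_deltas(counts):
    """Return the increase in numSMI between consecutive heartbeats.

    Assumption: a counter lower than the previous reading means the
    counter restarted, so the new reading is used as the delta.
    """
    deltas = []
    for prev, cur in zip(counts, counts[1:]):
        deltas.append(cur - prev if cur >= prev else cur)
    return deltas


def find_spikes(counts, threshold=1000):
    """Return (heartbeat index, delta) pairs whose delta exceeds threshold.

    The default threshold is an illustrative value, not a vendor limit.
    """
    return [
        (index + 1, delta)
        for index, delta in enumerate(smi_deltas(counts))
        if delta > threshold
    ]
```

To use it, read the lines of a copy of /var/run/log/vmksummary.log and pass them to smi_counts, then pass the result to find_spikes. In the sample above, the counter rises by 18 between the readings 4179 and 4197, then by 4824 between 4197 and 9021, which is the point marked "SMIs started increasing".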

 

Environment

VMware ESXi 8.X on HPE ProLiant Gen11 servers.

Cause

The cause is a firmware-level issue in the system BIOS/ROM. The system receives System Management Interrupts (SMIs) at an excessively high rate, specifically for SPD DIMM service requests. Because SMIs are handled at a higher priority than the hypervisor, the processor remains at this high-priority interrupt level and ESXi cannot make progress.

Resolution

Coordinate with the hardware vendor (HPE) to investigate further.

Additional Information

The HPE article below may resolve the issue. However, it is best to engage the hardware vendor to investigate further.


https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-a00156729en_us