ESXi host hangs and the VMs automatically power off

Article ID: 388234

Products

VMware vSphere ESXi

Issue/Introduction

  • The ESXi host is in a not responding state.
  • The logs show a signature like the following.
    In(14) heartbeat[2105113]: up 0d0h21m2s, 68 VMs; [[2102714 vmx 37177592kB] [2100774 vmx 37682128kB] [2100772 vmx 41918908kB]] [], [numSMI 488]
    No(13) bootstop[2100223]: Host has booted
    In(14) heartbeat[2114144]: up 0d0h54m9s, 0 VMs; [[2097931 vmsyslogd 25084kB] [2099226 vpxa 53024kB] [2099076 hostd 120704kB]] [], [numSMI 984]
  • No logs were captured at the time of the issue.
  • Loss of connectivity, as in the snippet below.
    In(182) vmkernel: cpu22:2097581)<NMLX_INF> nmlx5_core: vmnic0: nmlx5_en_L2TableIndexAdd - (nmlx5_core_en_main.c:8779) Add 0:50:56:8:C6:9 to L2 table
    In(182) vmkernel: VMB: 65: Reserved 4 MPNs starting @ 0x4c4
  • Hardware errors in the system event log (SEL), as seen below.
    Record Id  When                 Calculate Days ago  Event Type     SEL Type          Sensor Number  Message
    64         20xx-xx-xxTxx:xx:xx  0                   111 (Unknown)  2 (System Event)  xxx            Assert + Memory Uncorrectable ECC
    120        20xx-xx-xxTxx:xx:xx  0                   111 (Unknown)  2 (System Event)  xxx            Assert + Processor IERR
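
The heartbeat signature above can be checked for mechanically. The following is a minimal Python sketch, not a VMware tool. It assumes the log has been exported as plain text and that heartbeat lines follow the format of the samples quoted in this article. The names `parse_heartbeats` and `vm_count_drops` are hypothetical helpers introduced for illustration.

```python
import re

# Pattern inferred only from the heartbeat samples quoted in this article;
# other builds may format these lines differently (assumption).
HEARTBEAT_RE = re.compile(
    r"heartbeat\[\d+\]: up (\d+)d(\d+)h(\d+)m(\d+)s, (\d+) VMs"
)


def parse_heartbeats(lines):
    """Return a list of (uptime_seconds, vm_count), one per heartbeat line."""
    samples = []
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            days, hours, minutes, seconds, vms = map(int, match.groups())
            uptime = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
            samples.append((uptime, vms))
    return samples


def vm_count_drops(samples):
    """Return (before, after) pairs where the VM count fell from non-zero to zero."""
    return [
        (before, after)
        for before, after in zip(samples, samples[1:])
        if before[1] > 0 and after[1] == 0
    ]
```

Passing the log lines to `parse_heartbeats` and the result to `vm_count_drops` flags consecutive heartbeats where a host that was running VMs reports none. This matches the pattern in the excerpt above (68 VMs, then 0 VMs). It only points to where to look and does not identify the cause.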

Environment

VMware vSphere ESXi 8.x

Cause

An IERR is a catastrophic error reported by the processor, but it is generally caused by devices outside the processor core (for example, memory or PCIe). In this case it is probably due to the Memory Uncorrectable ECC error recorded in the same event log. Your hardware vendor can provide more detail.
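
To confirm that both events are present before contacting the vendor, the event log text can be scanned for the two messages. This is a rough Python sketch under stated assumptions: the log has been saved as plain text in the column layout shown in this article, the record ID is the first field on each line, and asserted events contain the literal text `Assert +`. The function name `find_hardware_events` is hypothetical.

```python
# Message substrings taken from the event log excerpt in this article.
SIGNATURES = ("Memory Uncorrectable ECC", "Processor IERR")


def find_hardware_events(sel_lines):
    """Return (record_id, message) for asserted events matching SIGNATURES."""
    hits = []
    for line in sel_lines:
        # Skip the header row and any line without an asserted event.
        if "Assert +" not in line:
            continue
        record_id = line.split()[0]  # assumed: first column is the record ID
        message = line.split("Assert +", 1)[1].strip()
        if any(signature in message for signature in SIGNATURES):
            hits.append((record_id, message))
    return hits
```

If both a Memory Uncorrectable ECC record and a Processor IERR record are returned, that is consistent with the cause described here. The record IDs and timestamps are useful details to include when opening a case with the hardware vendor.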

Resolution

Please engage with your hardware vendor to have this investigated.