ESXi PSODs after host initialization due to Mellanox firmware error "synd 0x1: firmware internal error" and "extSynd 0x0554"

Article ID: 439187


Products

VMware vSphere ESXi

Issue/Introduction

  • The ESXi host fails with a purple diagnostic screen (PSOD) showing the following backtrace:

#0  FastSlabDrainFreshObjsToBuffer (freshObjs=freshObjs@entry=0x43033df03190, buffer=buffer@entry=0x45da9d800000) at bora/vmkernel/main/fastslab.c:1948
#1  0x0000420032935049 in FastSlabFreeObjToBuffer (slab=slab@entry=0x43033df02f40, node=node@entry=2, obj=<optimized out>, obj@entry=0x45da9d9ffa00)
    at bora/vmkernel/main/fastslab.c:2017
#2  0x0000420032935659 in FastSlabReleaseMagazineToBuffer (slab=slab@entry=0x43033df02f40, magazine=0x0, initialized=1 '\001') at bora/vmkernel/main/fastslab.c:2082
#3  0x00004200329357b1 in FastSlabNodeFlush (slab=slab@entry=0x43033df02f40, memoryIsCritical=memoryIsCritical@entry=0 '\000') at bora/vmkernel/main/fastslab.c:4621
#4  0x0000420032935881 in FastSlabNodeFlushWorld (dummy=<optimized out>) at bora/vmkernel/main/fastslab.c:4771
#5  0x0000420032edc88f in CpuSched_StartWorld (destWorld=<optimized out>, previous=<optimized out>) at bora/vmkernel/sched/cpusched.c:15324
#6  0x00004200329453b0 in ?? () at bora/vmkernel/main/debug.c:4125
#7  0x0000000000000000 in ?? ()

  • Before the PSOD, /var/run/log/vmkernel.log on the host reports errors similar to the following:

YYYY:MM:DDTHH:MM:SS.Z cpu27:2098012)<NMLX_INF> synd 0x1: firmware internal error
YYYY:MM:DDTHH:MM:SS.Z cpu27:2098012)<NMLX_INF> extSynd 0x0554
YYYY:MM:DDTHH:MM:SS.Z cpu35:2098046)<NMLX_INF> synd 0x1: firmware internal error
YYYY:MM:DDTHH:MM:SS.Z cpu35:2098046)<NMLX_INF> extSynd 0x0554
YYYY:MM:DDTHH:MM:SS.Z cpu10:2097389)<NMLX_ERR> nmlx5_core: vmnic1: nmlx5_en_UplinkMTUSet - (nmlx5_core_en_uplink.c:5002) done  status: IO was aborted
YYYY:MM:DDTHH:MM:SS.Z cpu10:2097389)<NMLX_ERR> nmlx5_core: vmnic0: nmlx5_en_UplinkMTUSet

  • The NIC firmware and driver versions have been validated as compatible according to the Broadcom Compatibility Guide (BCG).
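To check whether another host's log shows the same signature, you can script the pairing of the two NMLX_INF lines above. The following is a minimal Python sketch. It is a hypothetical helper and not a VMware or NVIDIA tool. The names find_fw_errors, matches_this_issue and scan_file are illustrative. The sketch assumes the line format shown above, where each "synd" line is later followed by an "extSynd" line carrying the same cpuN:worldID prefix. Run it against a copy of the log.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical patterns for the two NMLX_INF lines shown in vmkernel.log above.
# Group 1 = CPU number, group 2 = world ID, group 3 = syndrome value.
SYND_RE = re.compile(
    r"cpu(\d+):(\d+)\)<NMLX_INF> synd (0x[0-9a-fA-F]+): firmware internal error")
EXT_RE = re.compile(r"cpu(\d+):(\d+)\)<NMLX_INF> extSynd (0x[0-9a-fA-F]+)")


def find_fw_errors(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (world_id, synd, ext_synd) for each synd line that is later
    followed by an extSynd line logged by the same world."""
    pending = {}  # world ID -> synd value still waiting for its extSynd line
    hits = []
    for line in lines:
        m = SYND_RE.search(line)
        if m:
            pending[m.group(2)] = m.group(3)
            continue
        m = EXT_RE.search(line)
        if m and m.group(2) in pending:
            world = m.group(2)
            hits.append((world, pending.pop(world), m.group(3)))
    return hits


def matches_this_issue(hits: List[Tuple[str, str, str]]) -> bool:
    """True if any pair carries the synd 0x1 / extSynd 0x0554 signature."""
    return any(s.lower() == "0x1" and e.lower() == "0x0554" for _, s, e in hits)


def scan_file(path: str) -> List[Tuple[str, str, str]]:
    # errors="replace" keeps scanning even if the log contains non-UTF-8 bytes
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_fw_errors(fh)
```

For example, matches_this_issue(scan_file("vmkernel.log")) should return True for a copy of a log that contains the excerpt above. The sketch pairs lines by world ID rather than by adjacency, so other messages can appear between the two lines. This relies on the assumption that the driver logs both lines from the same world, as it does in the excerpt.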

Environment

ESXi 8.0.3 

Cause

The Mellanox NIC encounters an internal firmware error (extended syndrome extSynd 0x0554) during initialization. The vmkernel then aborts uplink initialization (nmlx5_en_UplinkMTUSet fails with "IO was aborted"), and a kernel exception is triggered in the FastSlab allocator.


Resolution

This issue is caused by a failure at the hardware/firmware layer.

  1. Contact the server hardware vendor and provide the following details of the failure:
    • Error codes: synd 0x1 and extSynd 0x0554
  2. Request a firmware update or a hardware replacement plan based on the OEM's root cause analysis of extended syndrome 0x0554.

Additional Information

The Mellanox card affected by this issue:

driver      driver version  firmware version  MAC address        VID   DID   SVID  SDID  name
------      --------------  ----------------  -----------        ---   ---   ----  ----  -----------------------------------
nmlx5_core  4.23.6.5        14.32.1010        ##:##:##:##:##:#  15b3  1015  1590  00d3  Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
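The Resolution steps ask for details about the failure. You can combine the adapter data in the table above with the error codes into a one-line summary for the vendor case. The following is a minimal Python sketch with hypothetical helper names (parse_nic_table, vendor_summary). It assumes that columns are separated by two or more spaces and that no field is empty. That holds for the row shown here but may not hold for every output of this kind. The sketch splits on whitespace runs rather than slicing at the dashed separator's column positions. This is because values can be wider than their dashes: the MAC address column, for example, is wider than its separator.

```python
import re
from typing import Dict, List


def parse_nic_table(text: str) -> List[Dict[str, str]]:
    """Parse a header row, a dashed separator row, and data rows into dicts.

    Assumes columns are separated by runs of two or more spaces; single
    spaces inside a value (such as the adapter name) are preserved.
    """
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for line in lines[2:]:  # skip the header and the "------" separator row
        # maxsplit lets the last column (name) keep any extra spacing intact
        fields = re.split(r"\s{2,}", line.strip(), maxsplit=len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


def vendor_summary(nic: Dict[str, str], synd: str = "0x1",
                   ext_synd: str = "0x0554") -> str:
    """Format adapter identity, versions, and error codes as one line."""
    pci_ids = ":".join(nic[k] for k in ("VID", "DID", "SVID", "SDID"))
    return (f"{nic['name']} (PCI {pci_ids}), driver {nic['driver']} "
            f"{nic['driver version']}, firmware {nic['firmware version']}: "
            f"synd {synd}, extSynd {ext_synd}")
```

Applied to the row above, vendor_summary(parse_nic_table(table_text)[0]) returns a line that names the MT27710 adapter. It includes PCI IDs 15b3:1015:1590:00d3, driver nmlx5_core 4.23.6.5, firmware 14.32.1010, and the synd and extSynd codes.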