ESXi host reporting PSOD with "OSAL_HW_ERROR_OCCURRED@(qedentv)"

Article ID: 395761

Updated On:

Products

VMware vSphere ESXi
VMware vSphere ESXi 8.0

Issue/Introduction

  • The ESXi host intermittently fails with a PSOD that reports OSAL_HW_ERROR_OCCURRED in the qedentv driver
  • A reboot recovers the host only temporarily
  • /var/run/log/LogEfi.log (ESXi host):

Panic from another CPU (cpu 47, world xxxxxxxx): ip=0x4xxxxxxxxx randomOff=0x2xxxxxxxxx:qedentv: Panic [Error: 2]
Halting PCPU 47.
YYYY-MM-DDTHH:MM:SS cpu44:xxxxxxxx)ESC[45mESC[33;1mVMware ESXi 7.0.3 [Releasebuild-<build number> x86_64]ESC[0m
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)qedentv: Panic [Error: 2]
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)cr0=0x8xxxxxxxxx cr2=0xaxxxxxxxxx cr3=0xaxxxxxxxxx cr4=0x1xxxxxxxxx
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)FMS=17/31/0 uCode=0x8xxxxxxxxx

YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)Code start: 0x4xxxxxxxx VMK uptime: HH:MM:SS
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)0x4xxxxxxxx:[0x4xxxxxxxx]PanicvPanicInt@vmkernel#nover+0x327 stack: 0x4xxxxxxxxx
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)0x4xxxxxxxx:[0x4xxxxxxxx]Panic_vPanic@vmkernel#nover+0x23 stack: 0x4xxxxxxxxx
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)0x4xxxxxxxx:[0x4xxxxxxxx]vmk_PanicWithModuleID@vmkernel#nover+0x41 stack: 0x4xxxxxxxxx
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)0x4xxxxxxxx:[0x4xxxxxxxx]OSAL_HW_ERROR_OCCURRED@(qedentv)#<None>+0x111 stack: 0x4xxxxxxxxx
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)0x4xxxxxxxx:[0x4xxxxxxxx]ecore_hw_err_notify@(qedentv)#<None>+0x4f stack: 0x40
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)0x4xxxxxxxx:[0x4xxxxxxxx]ecore_int_deassertion@(qedentv)#<None>+0x4cd stack: 0x5a30xxxxxxxxxxxx
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)0x4xxxxxxxx:[0x4xxxxxxxx]ecore_int_sp_dpc@(qedentv)#<None>+0x536 stack: 0x1

  • The VMkernel log may also contain temperature warnings for the NIC, but the PSOD can occur without them
  • /var/run/log/vmkernel.log:

YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)[qedentv_monitor_temperature:3960(vmnic2)]Temperature Warning !!!
 [sensor 0] sensor_location 1, threshold_high 0, critical 0, current_temp 75
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)[qedentv_monitor_temperature:3960(vmnic2)]Temperature Warning !!!
 [sensor 0] sensor_location 1, threshold_high 0, critical 0, current_temp 75
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)[qedentv_monitor_temperature:3960(vmnic2)]Temperature Warning !!!
 [sensor 0] sensor_location 1, threshold_high 0, critical 0, current_temp 75
YYYY-MM-DDTHH:MM:SS cpuxx:xxxxxxxx)[qedentv_monitor_temperature:3960(vmnic2)]Temperature Warning !!!
 [sensor 0] sensor_location 1, threshold_high 0, critical 0, current_temp 75
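
The log signatures above can be checked for programmatically when triaging a support bundle. The following is a minimal Python sketch, not a VMware tool: the function names `has_qedentv_psod_signature` and `parse_temperature_warnings` are hypothetical, and the regular expressions assume the log lines look exactly like the excerpts in this article (a warning line followed by a sensor detail line).

```python
import re

# Backtrace frame taken verbatim from the LogEfi.log excerpt in this article.
PSOD_FRAME = "OSAL_HW_ERROR_OCCURRED@(qedentv)"

# Assumption: the warning header and the sensor details follow the layout
# shown in the vmkernel.log excerpt (header line, then a "[sensor N]" line).
WARNING_RE = re.compile(
    r"\[qedentv_monitor_temperature:\d+\((?P<nic>vmnic\d+)\)\]Temperature Warning"
)
SENSOR_RE = re.compile(
    r"\[sensor (?P<sensor>\d+)\]\s+sensor_location (?P<location>\d+), "
    r"threshold_high (?P<threshold_high>\d+), critical (?P<critical>\d+), "
    r"current_temp (?P<current_temp>\d+)"
)


def has_qedentv_psod_signature(log_text):
    """Return True if the text contains the qedentv hardware-error frame."""
    return PSOD_FRAME in log_text


def parse_temperature_warnings(log_text):
    """Return one dict per temperature warning found in the text.

    Each dict holds the NIC name plus the integer sensor fields.
    """
    warnings = []
    pending_nic = None  # NIC named by the most recent warning header
    for line in log_text.splitlines():
        header = WARNING_RE.search(line)
        if header:
            pending_nic = header.group("nic")
        # The sensor details may sit on the header line or on the next one.
        detail = SENSOR_RE.search(line)
        if detail and pending_nic is not None:
            record = {key: int(value) for key, value in detail.groupdict().items()}
            record["nic"] = pending_nic
            warnings.append(record)
            pending_nic = None
    return warnings
```

Calling `parse_temperature_warnings` on the contents of vmkernel.log yields the affected vmnic and the reported `current_temp` for each warning. Because the PSOD can occur without any temperature warning, an empty result does not rule out this issue.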

Cause

When a fan failure or over-temperature condition is detected, the driver puts the adapter into a low-power state. This handling is part of the driver code: during uplink operations, the vmkernel check returns an error if the condition is true.

Resolution

Engage the hardware vendor's support team to upgrade the qedentv driver for the network card.

This issue is resolved in the following driver versions:

  • vSphere 7.0 - qedentv 3.70.7.0 or higher
  • vSphere 8.0 - qedentv 3.71.7.0 or higher
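
To decide whether an installed driver already meets these minimums, the version must be compared numerically, field by field, rather than as a string. The sketch below is illustrative only: the helper names `parse_driver_version` and `is_fixed_version` are hypothetical, and it assumes any build suffix is separated from the dotted version by a hyphen. The installed version can typically be read from the driver information reported by `esxcli network nic get -n <vmnic>`.

```python
# Minimum fixed qedentv versions, taken from the list in this article.
MIN_FIXED_VERSION = {
    "7.0": (3, 70, 7, 0),
    "8.0": (3, 71, 7, 0),
}


def parse_driver_version(version):
    """Convert a dotted version such as '3.70.7.0' into a tuple of ints.

    Assumption: anything after the first '-' is a build suffix and is ignored.
    """
    dotted = version.strip().split("-", 1)[0]
    return tuple(int(part) for part in dotted.split("."))


def is_fixed_version(vsphere_release, driver_version):
    """Return True if the driver is at or above the fixed version."""
    minimum = MIN_FIXED_VERSION.get(vsphere_release)
    if minimum is None:
        raise ValueError(f"no fixed version listed for vSphere {vsphere_release}")
    # Tuples compare element by element, so 3.9.x sorts below 3.70.x.
    return parse_driver_version(driver_version) >= minimum
```

For example, `is_fixed_version("8.0", "3.70.7.0")` is false because the 8.0 minimum is 3.71.7.0, while the same driver version satisfies the 7.0 minimum.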

Additional Information