ESXi host goes into Not Responding state when doing IPMI related operations, logs show "IpmiIfcSdrReadRecordId: retry expired"

Article ID: 317650

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • The ESXi host enters the Not Responding state during IPMI-related operations.
  • The hostd service may crash.
  • hostd.log may show entries similar to the following:

[YYYY-MM-DDTHH:MM:SS] hostd [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: summary.runtime, ha-root-pool. Sent notification immediately.
IpmiIfcSdrReadRecordId: data length mismatch req=19,resp=8
[YYYY-MM-DDTHH:MM:SS] hostd [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: guest.disk, 1. Sent notification immediately.
IpmiIfcSdrReadRecordId: retry expired.
IpmiIfcSdrReadRecordId: sensor not found, record id: 2219

  • vmkernel.log may show entries similar to the following:

[YYYY-MM-DDTHH:MM:SS] cpu19:1166661)UserDump: 3024: hostd-worker: Dumping cartel 1166400 (from world 1166661) to file /var/core/hostd-worker-zdump.000 ...


Note: The preceding log excerpts are only examples. Date, time and environmental variables may vary depending on the environment.
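
To check a host for these signatures, the log excerpts above can be turned into simple grep patterns. The following is a minimal sketch, not an official diagnostic: the function names are hypothetical, and the log paths in the usage comments are assumptions that may differ if logs are redirected.

```shell
# Patterns are taken from the log excerpts in this article.
# Returns 0 if the given log file contains any IpmiIfcSdrReadRecordId error.
has_ipmi_sdr_errors() {
    grep -q -e 'IpmiIfcSdrReadRecordId: retry expired' \
            -e 'IpmiIfcSdrReadRecordId: data length mismatch' \
            -e 'IpmiIfcSdrReadRecordId: sensor not found' "$1"
}

# Returns 0 if the given vmkernel log records a hostd-worker core dump.
has_hostd_dump() {
    grep -q 'UserDump: .*hostd-worker: Dumping cartel' "$1"
}

# On a host (paths are assumptions; adjust to your log location):
# has_ipmi_sdr_errors /var/log/hostd.log && echo "IPMI SDR errors found"
# has_hostd_dump /var/log/vmkernel.log && echo "hostd-worker core dump found"
```

A match in both logs is consistent with the symptoms described here, but it does not by itself prove that this issue is the cause.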

Environment

VMware vSphere ESXi 6.5.x
VMware vSphere ESXi 6.7.x
VMware vSphere ESXi 7.x

Cause

Every 90 seconds, the host receives sensor data from the IPMI system for a hardware health check. If another IPMI-specific operation runs at the same time, a race condition may occur, leading to the symptoms described above.

Resolution

This issue is resolved in:

  • VMware vSphere ESXi 6.5 P06 ESXi650-202102001
  • VMware vSphere ESXi 6.7 P03 ESXi670-202008001
  • VMware vSphere ESXi 7.0.1 Update 1
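
To find which of the fixed releases applies to a host, the version string can be mapped to the list above. This is a rough sketch: the function name is hypothetical, and it assumes the input looks like the output of `vmware -v` (for example "VMware ESXi 6.7.0 build-<number>"). It only identifies the release family; it does not tell you whether the installed build already contains the fix.

```shell
# Map an ESXi version string to the fixed release named in this article.
# Assumed input format: "VMware ESXi 6.7.0 build-<number>", as printed by `vmware -v`.
fixed_release_for() {
    case "$1" in
        *"ESXi 6.5."*) echo "ESXi 6.5 P06 ESXi650-202102001" ;;
        *"ESXi 6.7."*) echo "ESXi 6.7 P03 ESXi670-202008001" ;;
        *"ESXi 7."*)   echo "ESXi 7.0.1 Update 1" ;;
        *)             echo "not listed in this article" ;;
    esac
}

# On a host:
# fixed_release_for "$(vmware -v)"
```

Compare the result with the installed patch level to decide whether an upgrade is needed.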


Workaround:
1. Stop the hostd service:
$ /etc/init.d/hostd stop

2. Edit /etc/vmware/hostd/config.xml and locate the cimsvc section:
     <cimsvc>
        <path>libcimsvc.so</path>
        <enabled>true</enabled>
     </cimsvc>

   Change <enabled>true</enabled> to <enabled>false</enabled>.

3. Start the hostd service:
$ /etc/init.d/hostd start
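
The edit in step 2 can also be scripted. The following is a minimal sketch, assuming the cimsvc section is laid out as shown above with each tag on its own line. The function name and the .bak backup suffix are hypothetical choices, not part of the documented procedure.

```shell
# Set <enabled> to false inside the <cimsvc> block only;
# <enabled> tags in other sections are left untouched.
# A backup copy is kept next to the original (the .bak suffix is arbitrary).
disable_cimsvc() {
    cfg="$1"
    cp "$cfg" "$cfg.bak" || return 1
    sed '/<cimsvc>/,/<\/cimsvc>/ s|<enabled>true</enabled>|<enabled>false</enabled>|' \
        "$cfg.bak" > "$cfg"
}

# On a host, between the stop and start commands:
# disable_cimsvc /etc/vmware/hostd/config.xml
```

Review the modified file before starting hostd again. To revert, restore the backup copy and restart hostd.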

Alternatively, avoid the following operations:

  • Using third-party tools to run IPMI operations such as fru get, fru list, sdr get, sdr list, sel clear, sel list, and sel get
  • Running ESXCLI IPMI commands from scripts, such as esxcli hardware ipmi sel get

Additional Information

VMware Skyline Health Diagnostics for vSphere - FAQ (345059)

Impact/Risks of disabling cimsvc:
1. Hardware health monitoring will stop working.
2. Sensor and SEL (System Event Log) data will not be available in the vSphere Client or the MOB (Managed Object Browser).
3. esxcli commands that retrieve sensor data will not work.