Excessive hardware alarms may be triggered when sensors are reset to an unknown state.



Article ID: 338058



Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
On ESXi 6.5 and later hosts, you may see events like the following when wbem is unable to retrieve the hardware state.

The hostd log fills with events such as the following:

2020-04-02T07:20:52.877Z error hostd[2099863] [Originator@6876 sub=Default opID=4d3fd9c8-85-2e67 user=vpxuser:management] IpmiIfcSdrReadRecordId:record id: 3D, error 192. Try again... 
2020-04-02T07:20:52.877Z error hostd[2099863] [Originator@6876 sub=Default opID=4d3fd9c8-85-2e67 user=vpxuser:management] IpmiIfcSdrReadRecordId: retry expired. 
2020-04-02T07:20:53.021Z warning hostd[2099863] [Originator@6876 sub=Default opID=4d3fd9c8-85-2e67 user=vpxuser:management] IpmiIfcSensorGetReading: Sensor Number 0x87, failed send cc = 0xc0   
2020-04-02T07:20:53.021Z warning hostd[2099863] [Originator@6876 sub=Cimsvc opID=4d3fd9c8-85-2e67 user=vpxuser:management] Retrieve Health status failed, sensors reset to unknown state ==> All sensors get reset to Unknown. 
2020-04-02T07:20:53.021Z verbose hostd[2099863] [Originator@6876 sub=Default opID=4d3fd9c8-85-2e67 user=vpxuser:management] count_events: starting communication with bmc over ipmi driver ==> Loading SEL data from IPMI.
2020-04-02T07:20:53.021Z error hostd[2099863] [Originator@6876 sub=Default opID=4d3fd9c8-85-2e67 user=vpxuser:management] count_events: ipmi returned invalid data block: data_len: 1 ccode 192  ==> Failure due to IPMI node busy. 
2020-04-02T07:20:53.021Z error hostd[2099863] [Originator@6876 sub=Default opID=4d3fd9c8-85-2e67 user=vpxuser:management] sync_device_eventlog: communicate with bmc failed, no hardware sel data.
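
To gauge how often a host hits this condition, the messages above can be counted in the hostd log. The following is a minimal sketch, not part of the original article: the function name `count_sensor_resets` is hypothetical, and the default path assumes the usual hostd log location, /var/log/hostd.log.

```shell
# Hypothetical helper: count hostd log lines reporting that the sensors
# were reset to the unknown state.
# Usage: count_sensor_resets [logfile]   (defaults to /var/log/hostd.log)
count_sensor_resets() {
    log="${1:-/var/log/hostd.log}"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'sensors reset to unknown state' "$log"
}
```

A count that keeps growing between runs suggests the host is still affected.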


Environment

VMware vSphere ESXi 6.7
VMware vSphere ESXi 6.5

Cause

The events are generated because IPMI returns an error code indicating that the queries failed because the IPMI node was busy. When cimsvc fails to fetch the data, it resets all sensors to the Unknown state.
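
The log excerpts report the same completion code in two notations: "error 192" and "ccode 192" in decimal, and "cc = 0xc0" in hexadecimal. In the IPMI specification, completion code C0h means "Node Busy". A small sketch of the conversion, with `cc_to_decimal` as a hypothetical helper name:

```shell
# Hypothetical helper: print an IPMI completion code in decimal.
# printf accepts C-style hexadecimal constants for %d.
cc_to_decimal() {
    printf '%d\n' "$1"
}

cc_to_decimal 0xc0   # prints 192
```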

Resolution

This issue is resolved in VMware ESXi 6.7, Patch Release ESXi670-202008001.

Workaround:
To work around this issue, apply one of the following options.
  • Disable wbem using the following command.
# esxcli system wbem set -e 0
 
If wbem is disabled, numeric sensor data is refreshed via LoadStatusFromIPMI(), which does not reset the sensor states (e.g. to unknown) and therefore does not cause excessive false alarms for numeric sensors. However, with wbem disabled, queries to sfcbd (such as getClass, getInstance, enumInstances) no longer work.

VMware vSphere 6.5 can report IPMI data with or without wbem services running. On a new install, wbem services are off by default.

How to disable or enable the CIM agent on the ESX/ESXi host (1025757)
https://kb.vmware.com/s/article/1025757
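
As a sketch of the wbem option, the current state can be checked before and after the change with `esxcli system wbem get`. The wrapper function `disable_wbem` and the `ESXCLI` variable below are hypothetical conveniences, not part of the article. The revert command in the comment uses the `--enable true` form and should be confirmed against the article linked above.

```shell
# Hypothetical wrapper; run in the ESXi host shell.
# ESXCLI can be overridden for a dry run, e.g. ESXCLI=echo.
ESXCLI="${ESXCLI:-esxcli}"

disable_wbem() {
    "$ESXCLI" system wbem get        # show the current wbem configuration
    "$ESXCLI" system wbem set -e 0   # disable wbem (command from this article)
    "$ESXCLI" system wbem get        # confirm the change took effect
}

# To re-enable wbem later (assumed form, verify before use):
#   esxcli system wbem set --enable true
```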
  • Disable the CIMSVC plug-in in hostd.
Disabling CIMSVC prevents the plug-in from polling IPMI for hardware health information, so health status data is not monitored or reported for any of the sensors. However, because wbem (sfcb) is still running, queries such as getClass, getInstance, enumInstances continue to work.

Ensure the host is in maintenance mode before proceeding.
  • SSH to the ESXi host. 
  • Run the following command to stop the hostd process.
# /etc/init.d/hostd stop
  • Take a backup of the /etc/vmware/hostd/config.xml file. 
  • Edit the file and, in the following block, change the <enabled> value from true to false.
<cimsvc>
   <path>libcimsvc.so</path>
   <enabled>true</enabled>
</cimsvc>
  • Start the hostd process again.
# /etc/init.d/hostd start
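
The backup and edit steps above can also be scripted rather than done by hand. This is a minimal sketch under stated assumptions: the function name `disable_cimsvc` is hypothetical, the `.bak` backup name is arbitrary, and the sed expression assumes `<cimsvc>`, `<enabled>true</enabled>` and `</cimsvc>` each appear on their own line, as in the excerpt above. Editing the file manually remains the documented method.

```shell
# Hypothetical helper: back up the hostd config and set <enabled> to false
# inside the <cimsvc> element only.
# Usage: disable_cimsvc /etc/vmware/hostd/config.xml
disable_cimsvc() {
    cfg="$1"
    # Keep a backup next to the original; stop if the copy fails.
    cp "$cfg" "$cfg.bak" || return 1
    # Restrict the substitution to the lines between <cimsvc> and </cimsvc>
    # so that other plug-ins' <enabled> settings are left untouched.
    sed '/<cimsvc>/,/<\/cimsvc>/ s|<enabled>true</enabled>|<enabled>false</enabled>|' \
        "$cfg.bak" > "$cfg"
}

# Intended sequence on the host (in maintenance mode):
#   /etc/init.d/hostd stop
#   disable_cimsvc /etc/vmware/hostd/config.xml
#   /etc/init.d/hostd start
```

To roll back, stop hostd, copy the `.bak` file over config.xml, and start hostd again.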


Additional Information

Impact/Risks:
Turning CIMSVC off will stop the plugin from polling for IPMI data and prevent reporting of hardware health information.