An ESXi host might report one or more of the following states when a generic hardware fault occurs.
Each state listed below is severe enough to require action to resolve. An example log entry follows the list.
Processor IERR
Processor Thermal Trip
Processor Configuration Error
Processor Machine Check Exception
Processor Correctable Machine Check
Memory Configuration Error
Memory Uncorrectable ECC
Memory Transition to Critical
Memory Critical Overtemperature
Drive Slot In Critical Array
Drive Slot In Failed Array
Drive Bay in Critical Array
Drive Bay in Failed Array
Drive Slot Drive Fault
PCI PERR
PCI SERR
Bus Correctable Error
Bus Uncorrectable Error
Bus Fatal Error
Add-in Card Install Error
Cable/Interconnect Transition to Critical from less severe
Slot/Connector Transition to Critical
Slot/Connector Transition to Non-critical
Fan Transition to Critical from less severe
Fan Transition to Off Line
Temperature Lower Critical going low
Temperature Transition to Critical from less severe
Temperature Transition to Non-recoverable from less severe
Temperature Upper Critical going high
Voltage Limit Exceeded
Voltage Transition to Critical from less severe
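As a rough illustration of how the list above might be used, the following Python sketch checks whether a log entry description names one of these states. The function name `actionable_condition` is hypothetical, and the "Assert + " prefix handling is an assumption based only on the example entry in this article; other prefixes or formats may exist.

```python
# States copied verbatim from the list above.
ACTIONABLE_CONDITIONS = {
    "Processor IERR", "Processor Thermal Trip", "Processor Configuration Error",
    "Processor Machine Check Exception", "Processor Correctable Machine Check",
    "Memory Configuration Error", "Memory Uncorrectable ECC",
    "Memory Transition to Critical", "Memory Critical Overtemperature",
    "Drive Slot In Critical Array", "Drive Slot In Failed Array",
    "Drive Bay in Critical Array", "Drive Bay in Failed Array",
    "Drive Slot Drive Fault", "PCI PERR", "PCI SERR",
    "Bus Correctable Error", "Bus Uncorrectable Error", "Bus Fatal Error",
    "Add-in Card Install Error",
    "Cable/Interconnect Transition to Critical from less severe",
    "Slot/Connector Transition to Critical",
    "Slot/Connector Transition to Non-critical",
    "Fan Transition to Critical from less severe", "Fan Transition to Off Line",
    "Temperature Lower Critical going low",
    "Temperature Transition to Critical from less severe",
    "Temperature Transition to Non-recoverable from less severe",
    "Temperature Upper Critical going high",
    "Voltage Limit Exceeded", "Voltage Transition to Critical from less severe",
}


def actionable_condition(description):
    """Return the listed state named in a description, or None if not listed."""
    # Assumption: descriptions look like "<prefix> + <state>", as in the
    # example entry. The prefix itself is ignored here.
    _, _, condition = description.rpartition(" + ")
    condition = condition.strip()
    return condition if condition in ACTIONABLE_CONDITIONS else None


print(actionable_condition("Assert + Voltage Transition to Critical from less severe"))
```

A description that does not match any listed state returns `None`, which does not by itself mean the entry is harmless.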
The following is an example of what the CIM diagnostic log might display:
OMC_IpmiLogRecord.CreationClassName="OMC_IpmiLogRecord",LogCreationClassName="OMC_IpmiRecordLog",LogName="IPMI SEL",MessageTimestamp="20121205114249.000000+000",RecordID="1"
RecordID = 1
MessageTimestamp = (NULL)
LogName = IPMI SEL
LogCreationClassName = OMC_IpmiRecordLog
CreationClassName = OMC_IpmiLogRecord
RecordFormat = *string CIM_Sensor.DeviceID*uint8[2] IPMI_RecordID*uint8 IPMI_RecordType*uint8[4] IPMI_Timestamp*uint8[2] IPMI_GeneratorID*uint8 IPMI_EvMRev*uint8 IPMI_SensorType*uint8 IPMI_SensorNumber*boolean IPMI_AssertionEvent*uint8 IPMI_EventType*uint8 IPMI_EventData1*uint8 IPMI_EventData2*uint8 IPMI_EventData3*uint32 IANA*
RecordData = *114.0.32*1 0*2*57 51 191 80*32 0*4*16*114*false*111*2*255*255*1*
ElementName = IPMI SEL
Description = Assert + Voltage Transition to Critical from less severe
Caption = Assert + Voltage Transition to Critical from less severe
PerceivedSeverity = (NULL)
Locale = (NULL)
InstanceID = (NULL)
DataFormat = (NULL)
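The RecordFormat and RecordData values in the example above appear to be parallel, asterisk-delimited lists, so the two can be paired field by field. The Python sketch below is one way to do that. The helper names `parse_record` and `sel_timestamp` are hypothetical, and the timestamp decoding assumes the four bytes are least-significant-byte-first seconds since the Unix epoch, as IPMI SEL timestamps generally are.

```python
from datetime import datetime, timezone

# Values copied from the example record above.
RECORD_FORMAT = (
    "*string CIM_Sensor.DeviceID*uint8[2] IPMI_RecordID*uint8 IPMI_RecordType"
    "*uint8[4] IPMI_Timestamp*uint8[2] IPMI_GeneratorID*uint8 IPMI_EvMRev"
    "*uint8 IPMI_SensorType*uint8 IPMI_SensorNumber*boolean IPMI_AssertionEvent"
    "*uint8 IPMI_EventType*uint8 IPMI_EventData1*uint8 IPMI_EventData2"
    "*uint8 IPMI_EventData3*uint32 IANA*"
)
RECORD_DATA = "*114.0.32*1 0*2*57 51 191 80*32 0*4*16*114*false*111*2*255*255*1*"


def parse_record(record_format, record_data):
    """Pair each field name in RecordFormat with its raw RecordData string."""
    # Each format entry is "<type> <name>"; keep only the name.
    names = [entry.split()[-1] for entry in record_format.strip("*").split("*")]
    values = record_data.strip("*").split("*")
    if len(names) != len(values):
        raise ValueError("RecordFormat and RecordData have different field counts")
    return dict(zip(names, values))


def sel_timestamp(field):
    """Decode a space-separated byte list into a UTC datetime.

    Assumption: little-endian seconds since 1970-01-01 00:00:00 UTC.
    """
    raw = bytes(int(byte) for byte in field.split())
    return datetime.fromtimestamp(int.from_bytes(raw, "little"), tz=timezone.utc)


record = parse_record(RECORD_FORMAT, RECORD_DATA)
when = sel_timestamp(record["IPMI_Timestamp"])
print(record["IPMI_SensorNumber"], when.isoformat())
```

For this record, the decoded time is 2012-12-05 11:42:49 UTC, which agrees with the MessageTimestamp value in the instance path of the example. All other values are left as raw strings, since interpreting sensor and event codes depends on the hardware vendor's implementation.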
6.x, 7.x, 8.x
Contact the hardware vendor for support if further troubleshooting or assistance is needed.
The Intelligent Platform Management Interface (IPMI) defines standards for monitoring and controlling system subsystems. These standards cover elements such as temperatures, voltages, fans, bus errors, and memory. IPMI provides a variety of alarm mechanisms that are triggered when a system exceeds its tolerance levels.
For example, a processor error might be displayed only while the error is active. The logging mechanism makes it possible to determine whether an error occurred in the past, which can indicate that the host is still experiencing fault conditions even though it is not currently reporting them.
This generally warrants more detailed investigation with the hardware vendor.