An ESXi host might experience the following behavior when a generic hardware fault occurs:
The following event categories indicate severity states that require action to resolve. Examples of the corresponding log entries are shown below.
Processor IERR
Processor Thermal Trip
Processor Configuration Error
Processor Machine Check Exception
Processor Correctable Machine Check
Memory Configuration Error
Memory Uncorrectable ECC
Memory Transition to Critical
Memory Critical Overtemperature
Drive Slot In Critical Array
Drive Slot In Failed Array
Drive Bay in Critical Array
Drive Bay in Failed Array
Drive Slot Drive Fault
PCI PERR
PCI SERR
Bus Correctable Error
Bus Uncorrectable Error
Bus Fatal Error
Add-in Card Install Error
Cable/Interconnect Transition to Critical from less severe
Slot/Connector Transition to Critical
Slot/Connector Transition to Non-critical
Fan Transition to Critical from less severe
Fan Transition to Off Line
Temperature Lower Critical going low
Temperature Transition to Critical from less severe
Temperature Transition to Non-recoverable from less severe
Temperature Upper Critical going high
Voltage Limit Exceeded
Voltage Transition to Critical from less severe

The following is an example of what the CIM diagnostic log might display:

OMC_IpmiLogRecord.CreationClassName="OMC_IpmiLogRecord",LogCreationClassName="OMC_IpmiRecordLog",LogName="IPMI SEL",MessageTimestamp="20121205114249.000000+000",RecordID="1"
RecordID = 1
MessageTimestamp = (NULL)
LogName = IPMI SEL
LogCreationClassName = OMC_IpmiRecordLog
CreationClassName = OMC_IpmiLogRecord
RecordFormat = *string CIM_Sensor.DeviceID*uint8[2] IPMI_RecordID*uint8 IPMI_RecordType*uint8[4] IPMI_Timestamp*uint8[2] IPMI_GeneratorID*uint8 IPMI_EvMRev*uint8 IPMI_SensorType*uint8 IPMI_SensorNumber*boolean IPMI_AssertionEvent*uint8 IPMI_EventType*uint8 IPMI_EventData1*uint8 IPMI_EventData2*uint8 IPMI_EventData3*uint32 IANA*
RecordData = *114.0.32*1 0*2*57 51 191 80*32 0*4*16*114*false*111*2*255*255*1*
ElementName = IPMI SEL
Description = Assert + Voltage Transition to Critical from less severe
Caption = Assert + Voltage Transition to Critical from less severe
PerceivedSeverity = (NULL)
Locale = (NULL)
InstanceID = (NULL)
DataFormat = (NULL)
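In the log example above, RecordFormat and RecordData are both asterisk-delimited, so the values can be paired with their field names for easier reading. The following is a minimal Python sketch of that pairing, not a VMware-provided tool. The function names parse_record and sel_timestamp are hypothetical, and the timestamp decoding assumes the four bytes are a little-endian count of seconds since 1970-01-01 UTC, as in IPMI SEL records.

```python
from datetime import datetime, timezone

# Strings copied from the RecordFormat and RecordData properties in the example log.
RECORD_FORMAT = (
    "*string CIM_Sensor.DeviceID*uint8[2] IPMI_RecordID*uint8 IPMI_RecordType"
    "*uint8[4] IPMI_Timestamp*uint8[2] IPMI_GeneratorID*uint8 IPMI_EvMRev"
    "*uint8 IPMI_SensorType*uint8 IPMI_SensorNumber*boolean IPMI_AssertionEvent"
    "*uint8 IPMI_EventType*uint8 IPMI_EventData1*uint8 IPMI_EventData2"
    "*uint8 IPMI_EventData3*uint32 IANA*"
)
RECORD_DATA = "*114.0.32*1 0*2*57 51 191 80*32 0*4*16*114*false*111*2*255*255*1*"


def parse_record(record_format, record_data):
    """Pair each field name in RecordFormat with its raw value in RecordData."""
    # Each format entry is "<type> <name>"; keep only the name.
    names = [entry.split()[-1] for entry in record_format.strip("*").split("*")]
    values = record_data.strip("*").split("*")
    if len(names) != len(values):
        raise ValueError("RecordFormat and RecordData have different field counts")
    return dict(zip(names, values))


def sel_timestamp(field):
    """Decode a space-separated uint8[4] value as little-endian epoch seconds (assumption)."""
    raw = bytes(int(part) for part in field.split())
    return datetime.fromtimestamp(int.from_bytes(raw, "little"), tz=timezone.utc)


record = parse_record(RECORD_FORMAT, RECORD_DATA)
for name, value in record.items():
    print(f"{name} = {value}")
print("Decoded timestamp:", sel_timestamp(record["IPMI_Timestamp"]).isoformat())
```

For the sample record, the decoded timestamp is 2012-12-05T11:42:49+00:00, which agrees with the MessageTimestamp value in the object path even though the MessageTimestamp property itself is shown as (NULL).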
6.x, 7.x, 8.x
Contact the hardware vendor for support if further troubleshooting and assistance is needed.
The Intelligent Platform Management Interface (IPMI) defines standards for monitoring and controlling system subsystems. These standards cover the monitoring of elements such as temperatures, voltages, fans, bus errors, and memory. IPMI provides a variety of alarm mechanisms that trigger when a system exceeds its tolerance levels.
For example, a processor error might be displayed only while the error is active. The purpose of the logging mechanism is to show whether an error occurred in the past, which can indicate that the host is still experiencing fault conditions that it is not currently reporting.
This generally warrants more detailed investigation with the hardware vendor.