An ESXi host on a Lenovo ThinkSystem server reboots unexpectedly.
No Purple Diagnostic Screen (PSOD) is observed at the console.
The pre-allocated kernel dump file on the host carries a modification time from long before the event, confirming no PSOD dump was written for the event.
In /var/log/vmkernel.log and /var/log/hostd.log, log entries stop abruptly without any shutdown sequence, PSOD stack trace, or graceful exit messages. After a gap of several minutes, the next entries are the new boot session, similar to:
YYYY-MM-DDTHH:MM:SS.SSSZ vmkernel: VMB: 65 Reserved 4 MPNs starting @ 0x4c4
...
YYYY-MM-DDTHH:MM:SS.SSSZ bootstop: Host has booted
In /var/log/vmksummary.log, only a "Host has booted" line appears for the event, with no preceding "Host is halting" or "Host is rebooting" line.
The IPMI System Event Log (SEL) is clean for the event window. There is no Power Unit Power-off/down Assert/Deassert pair, no Power Supply AC-lost event, and no NMI, MCE, or watchdog entry.
In the Lenovo XCC (XClarity Controller) audit log, the following events are present within the event window, similar to:
FQXSPPP4042I Management Controller SN# <bmc-serial> reset was initiated due to Power-On-Reset.
FQXSPPP4029I The server was automatically powered on because the power restore policy is set to restore previous power state.
The XCC audit log contains no FQXSPPP4035I (server powered off by chassis control command) or FQXSPPP4023I (server restarted by chassis control command) entries for the event window.
A hidden log forensic snapshot file is present in the XCC FFDC at pstorage/hiddenlog/hiddenlog_YYYY-MM-DD_HH:MM:SS.tgz, with a filename timestamp inside the OS silent window and shortly before the FQXSPPP4042I event.
The operating system goes silent BEFORE the BMC's own Power-On-Reset, not after.
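The ordering check described above can be sketched in a few lines of Python. This is a minimal illustration, not a supported tool: it assumes each relevant log line begins with a timestamp in the YYYY-MM-DDTHH:MM:SS.SSSZ form shown in the sample entries, and the function names are hypothetical. The BMC reset time is supplied by the reader from the FQXSPPP4042I audit-log entry.

```python
from datetime import datetime, timezone


def parse_ts(line):
    """Return the leading UTC timestamp of a log line, or None if absent."""
    token = line.split(" ", 1)[0]
    if not token.endswith("Z"):
        return None
    try:
        parsed = datetime.strptime(token, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)


def find_silent_window(lines):
    """Return (last_entry_before_gap, first_entry_after_gap) for the largest gap."""
    stamps = [ts for ts in map(parse_ts, lines) if ts is not None]
    if len(stamps) < 2:
        return None
    return max(zip(stamps, stamps[1:]), key=lambda pair: pair[1] - pair[0])


def os_silent_before_bmc_reset(window, bmc_reset_utc):
    """True when the last OS log entry precedes the BMC Power-On-Reset."""
    last_os_entry, _first_boot_entry = window
    return last_os_entry < bmc_reset_utc
```

Feeding the lines of vmkernel.log to find_silent_window() yields the silent window. Passing that window and the FQXSPPP4042I time to os_silent_before_bmc_reset() answers whether the OS went quiet first.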
This signature indicates a hardware-level event on the chassis that affects both the host CPU/IO complex and the management controller (BMC) standby power rail.
The FQXSPPP4042I event with reason "Power-On-Reset" means the BMC chip itself experienced a hardware-level reset because power was applied to it. This is distinct from a software reset, a watchdog reset, or a user-initiated reset, all of which would carry a different reason code in the same FQXSPPP4042I message. "Power-On-Reset" implies the BMC standby power rail dropped and came back, or the BMC was held in reset and released by a hardware mechanism.
The FQXSPPP4029I event immediately after says the chassis was automatically powered on because the power restore policy is set to "restore previous power state". After the BMC came up from its Power-On-Reset, it observed that the chassis was not in the running state it expected, and it applied the configured restore policy to power the chassis back on. This is normal BMC behavior; what is abnormal is that it had to do it.
The absence of any Power Unit Power-off/down event pair in the IPMI SEL rules out a normal chassis power cycle. The absence of Power Supply AC-lost events rules out a utility power loss to the chassis. The absence of NMI, MCE, or watchdog SEL entries rules out the hardware fault categories that the BMC can normally classify on its own.
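The elimination logic above can be expressed as a small keyword scan. This sketch treats the SEL export as plain text lines and matches on the event wording quoted in this article. The exact sel.log layout is not documented here, so the patterns and the function name are assumptions that may need adjusting to the real file.

```python
import re

# Each cause is ruled out only when no SEL line matches its pattern.
# The patterns are assumptions derived from the event names in this article.
SEL_PATTERNS = {
    "normal chassis power cycle": r"power unit.*power off/down",
    "utility power loss": r"power supply.*ac[ -](lost|out-of-range)",
    "BMC-classifiable hardware fault": r"\b(nmi|mce|watchdog)\b",
}


def sel_findings(lines):
    """Map each cause to the SEL lines that would support it (empty = ruled out)."""
    findings = {cause: [] for cause in SEL_PATTERNS}
    for line in lines:
        for cause, pattern in SEL_PATTERNS.items():
            if re.search(pattern, line, re.IGNORECASE):
                findings[cause].append(line)
    return findings
```

Pass only the lines that fall inside the event window. A result in which every list is empty corresponds to the clean SEL described in this article.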
The hidden log snapshot in the FFDC at pstorage/hiddenlog/ is created when the BMC fires an internal forensic-trigger condition. The snapshot file's name itself is the trigger time. A snapshot timestamp inside the OS silent window and just before the BMC's own Power-On-Reset indicates that a BMC-internal trigger condition fired in the seconds before the BMC reset.
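Because the trigger time is encoded in the filename, it can be extracted and compared against the other timestamps without opening the archive. The sketch below assumes the hiddenlog_YYYY-MM-DD_HH:MM:SS.tgz naming shown earlier and treats the time as UTC. The function names are illustrative, and the reader supplies the last OS log entry and the BMC reset time.

```python
import re
from datetime import datetime, timezone

_SNAPSHOT_NAME = re.compile(r"hiddenlog_(\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})\.tgz$")


def snapshot_trigger_time(filename):
    """Return the trigger time encoded in a hidden log filename, or None."""
    match = _SNAPSHOT_NAME.search(filename)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%d_%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def fired_before_bmc_reset(trigger, last_os_entry, bmc_reset):
    """True when the trigger falls after the last OS entry and before the BMC reset."""
    return last_os_entry < trigger < bmc_reset
```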
The OS going silent BEFORE the BMC reset (rather than after it) means the OS-side outage is not a consequence of the BMC reset. The host and the BMC going down close together, with the BMC firing a forensic snapshot in between, is consistent with a shared underlying hardware fault rather than a software bug. The fault is typically on the planar (motherboard), the power distribution board, or the BMC standby power circuitry.
The exact mechanism inside the BMC and the planar cannot be decoded without Lenovo-internal tooling. The BMC component activity log and the hidden log trigger conditions are proprietary to Lenovo.
This is a hardware-related issue on the Lenovo platform. There is no ESXi or vCenter software action that addresses it. Engage Lenovo support and provide the XCC FFDC bundle.
Collect a Lenovo XCC FFDC bundle from the affected host through the XCC web UI: Service → Service Data → Download Now.
On the ESXi host, verify the modification time of the pre-allocated kernel dump file and capture the output for reference:
esxcli system coredump file list
ls -la /vmfs/volumes/<datastore-uuid>/vmkdump/
A modification time older than the reboot event confirms no PSOD dump was written.
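The same comparison can be scripted where Python is available. This is a hedged sketch: the function name is hypothetical, the dump file path must be taken from the esxcli output, and the event time is whatever the reader has established as the start of the outage (for example, the last log entry before the gap).

```python
import os
from datetime import datetime, timezone


def dump_predates_event(dump_path, event_utc):
    """True when the dump file was last modified before the reboot event."""
    mtime = datetime.fromtimestamp(os.path.getmtime(dump_path), tz=timezone.utc)
    return mtime < event_utc
```

A True result matches the condition described above: the dump file was not touched during the event.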
Inside the FFDC archive, review the IPMI SEL at tmp/ffdc_live_dbg/sel.log and confirm:
No Power Unit ... Power off/down Asserted and Power Unit ... Power off/down Deasserted pair exists for the event date.
No Power Supply ... AC lost or ... AC out-of-range entries exist for the event date.
No NMI, MCE, or Watchdog entries exist for the event date.
Decode the XCC audit log at var/volatile/log/cem_auditlog. The audit log uses Unix-epoch timestamps in hexadecimal. Convert each hex timestamp to UTC and look for the following message IDs around the event:
FQXSPPP4042I with reason "Power-On-Reset"
FQXSPPP4029I referencing the power restore policy
The absence of FQXSPPP4035I and FQXSPPP4023I entries for the event window confirms no user-initiated or automation-initiated chassis power command was issued.
List the hidden log snapshot files in the FFDC at pstorage/hiddenlog/. The filename of each snapshot is the trigger time in UTC, in YYYY-MM-DD_HH:MM:SS format. A snapshot timestamp inside the OS silent window indicates a BMC forensic trigger fired during the event. The contents of these .tgz files are not decodable outside Lenovo.
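The hex-timestamp conversion for the audit log can be sketched as follows. The on-disk layout of cem_auditlog is not described in this article, so the example assumes the reader has already extracted (hex timestamp, message text) pairs. The function names and that pair structure are illustrative assumptions.

```python
from datetime import datetime, timezone

EVENT_IDS = ("FQXSPPP4042I", "FQXSPPP4029I", "FQXSPPP4035I", "FQXSPPP4023I")


def hex_epoch_to_utc(hex_ts):
    """Convert a hexadecimal Unix-epoch string to an aware UTC datetime."""
    return datetime.fromtimestamp(int(hex_ts, 16), tz=timezone.utc)


def ids_in_window(entries, start, end):
    """Map each message ID of interest to the UTC times it was logged in the window.

    entries is an iterable of (hex_timestamp, message_text) pairs.
    """
    found = {}
    for hex_ts, text in entries:
        when = hex_epoch_to_utc(hex_ts)
        if start <= when <= end:
            for event_id in EVENT_IDS:
                if event_id in text:
                    found.setdefault(event_id, []).append(when)
    return found
```

For example, hex_epoch_to_utc("5F5E1000") returns 2020-09-13 12:26:40 UTC. For the signature in this article, the result of ids_in_window() would contain FQXSPPP4042I and FQXSPPP4029I but neither FQXSPPP4035I nor FQXSPPP4023I.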
Open a hardware support case with Lenovo for the affected server. Provide the same XCC FFDC bundle and ask Lenovo to specifically explain:
Why FQXSPPP4042I (Management Controller reset due to Power-On-Reset) was logged with no corresponding chassis power cycle in the SEL.
From the ESXi side, no software-level action is required for this event. The operating system was not given an opportunity to react and produced no diagnostic artifact attributable to a software fault. Once the Lenovo investigation is in motion, the case may be moved to a monitoring state.