Skyline health check on vCenter shows one or more failed disks.
e.g. Error: vSAN physical disk alarm 'Operation'
However, the hardware monitoring tool (e.g. iDRAC, iLO, or similar) shows the disk as healthy.
vSAN 8.x/9.x
The disk may have failed without the hardware monitoring tool having detected it yet, or the problem may not be hardware-related.
Check the logs on the ESXi host for error messages relating to the disk or disks that are triggering the alerts.
Some examples that indicate a potential hardware failure:
/var/run/log/vmkernel.log:
In(182) vmkernel: cpu56:2098250)ScsiDeviceIO: 4686: Cmd(0x45bcd3e1c400) 0x28, CmdSN 0x41bc from world 0 to dev "naa.XXXXXXXXXXXXXXXX" failed H:0x7 D:0x0 P:0x0
/var/run/log/hostd.log:
In(166) Hostd[2107064]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 108173 : Device naa.XXXXXXXXXXXXXXXX has been removed or is permanently inaccessible. Affected datastores (if any): Unknown.
/var/run/log/vmkwarning.log:
Wa(180) vmkwarning: cpu10:2097936)WARNING: HPP: HppDeviceUpdateState:5242: Device 'naa.XXXXXXXXXXXXXXXX' is changing to 'permanent device loss' from 'on'.
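As a rough aid for checking several logs at once, the three message types shown above can be searched for with a short script. This is only a sketch: the patterns are derived from the example lines in this article, and the function name and labels (scan_lines, scsi_command_failed, and so on) are hypothetical, not part of any VMware tool. Real log lines may differ in wording between builds.

```python
import re

# Patterns derived from the example messages in this article (assumption:
# other ESXi builds may word these messages slightly differently).
PATTERNS = [
    # vmkernel.log: a SCSI command to the device failed with H/D/P status codes
    ("scsi_command_failed",
     re.compile(r'to dev "(?P<dev>[^"]+)" failed '
                r'H:0x[0-9a-fA-F]+ D:0x[0-9a-fA-F]+ P:0x[0-9a-fA-F]+')),
    # hostd.log: device removed or permanently inaccessible
    ("device_removed",
     re.compile(r"Device (?P<dev>\S+) has been removed "
                r"or is permanently inaccessible")),
    # vmkwarning.log: device state changing to permanent device loss
    ("permanent_device_loss",
     re.compile(r"Device '(?P<dev>[^']+)' is changing to "
                r"'permanent device loss'")),
]


def scan_lines(lines):
    """Return a list of (label, device) for every line matching a pattern."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((label, match.group("dev")))
                break  # report at most one label per line
    return hits
```

To use it, pass an open log file, for example scan_lines(open("/var/run/log/vmkernel.log", errors="replace")), and compare the reported devices with the disk named in the Skyline alert. The first pattern matches any failed SCSI command regardless of its status codes, so a hit is a prompt to read the full line rather than proof of a hardware fault.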
If any messages similar to the above are seen, schedule a cold reboot of the ESXi host.
1. Put the ESXi host into Maintenance Mode with the Ensure Accessibility option.
2. When the ESXi host is in Maintenance Mode, power it down completely.
3. Wait for five minutes.
4. Power the ESXi host back on.
The cold reboot forces the physical disks to be reinitialized. If a disk has a hardware failure, it will be detected during reinitialization, and the hardware management tool will flag the disk(s) as failed.
- Replacement of the failed disk(s) will then need to be scheduled.
If all the disks initialize successfully, check the Skyline health view once the ESXi host is back up to confirm that the disk is no longer flagged as failed.