1. An NVMe disk failure is detected on a vSAN node, although the affected disk group/pool remains healthy.
2. You may experience a spike in write latency in the environment (<=50 ms).
3. The server's hardware management console, such as iDRAC, iLO, Lenovo XClarity Controller (XCC), or Cisco Integrated Management Controller (IMC), reports a drive failure.
4. vSAN Disk Management in the vSphere Client does not display any unhealthy disks; all disks are shown as healthy.
5. vSAN Health (Physical Disk Health - Operation Health) may or may not report a disk issue; however, the disk remains mounted, as seen in Disk Management.
vSAN OSA & ESA (All Versions)
NVMe disks
vobd.log:
2026-03-03T20:23:24.602Z In(182) vmkernel: cpu113:4737276)LSOMCommon: LSOMGetSmartData:1478: Getsmart support failed on disk t10.NVMe____Dell_NVMe_ISE_PS1030_MU_U.2_6.4TB_______##############:2
2026-03-03T20:23:25.163Z In(182) vmkernel: cpu68:4737276)LSOMCommon: LSOMGetSmartData:1478: Getsmart support failed on disk t10.NVMe____Dell_NVMe_ISE_PS1030_MU_U.2_6.4TB_______###############:2
2026-02-26T21:00:10.322Z In(14) vobd[2098147]: The event ([esx.problem.vob.vsan.lsom.backupfailednvmediskhealthcriticalwarning] NVMe critical health warning for disk t10.NVMe____Dell_NVMe_ISE_PS1030_MU_U.2_6.4TB_______################ is: The disk's backup device has failed.) was sent immediately to hostd;
2026-02-26T21:10:10.932Z In(14) vobd[2098147]: [vSANCorrelator] 170637431890us: [esx.problem.vob.vsan.lsom.backupfailednvmediskhealthcriticalwarning] NVMe critical health warning for disk t10.NVMe____Dell_NVMe_ISE_PS1030_MU_U.2_6.4TB_______################ is: The disk's backup device has failed.
VMkernel.log:
2026-02-27T00:28:57.541Z In(14) vobd[2098147]: [vSANCorrelator] 115377453105us: [esx.problem.vob.vsan.lsom.backupfailednvmediskhealthcriticalwarning] NVMe critical health warning for disk t10.NVMe____Dell_NVMe_ISE_PS1030_MU_U.2_6.4TB_______############# is: The disk's backup device has failed.
2026-02-27T00:28:57.541Z In(14) vobd[2098147]: The event ([esx.problem.vob.vsan.lsom.backupfailednvmediskhealthcriticalwarning] NVMe critical health warning for disk t10.NVMe____Dell_NVMe_ISE_PS1030_MU_U.2_6.4TB_______############### is: The disk's backup device has failed.) was sent immediately to hostd
2026-02-26T21:32:23.141Z In(182) vmkernel: cpu120:2099860)LSOM: LSOMLogDiskEvent:8418: Disk Event decommission for MD 52d2d87e-99ec-449f-89d2-############ (t10.NVMe____Dell_NVMe_ISE_PS1030_MU_U.2_6.4TB_______#############:2)
vsandevicemonitord.log:
2026-02-26T20:00:06Z In(14) vsandevicemonitord[2100825]: [70509974144]: WARNING - NVMe critical health warning for disk t10.NVMe____Dell_NVMe_ISE_PS1030_MU_U.2_6.4TB_______############# is: 'The disk's backup device has failed'.
2026-02-26T20:10:07Z In(14) vsandevicemonitord[2100825]: [70509974144]: WARNING - NVMe critical health warning for disk t10.NVMe____Dell_NVMe_ISE_PS1030_MU_U.2_6.4TB_______############# is: 'The disk's backup device has failed'.
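The vobd entries above all carry the same event ID, so a log bundle can be searched for it mechanically. The following is a minimal Python sketch, not a supported tool: the function name, the regular expression, and the shortened sample lines are illustrative assumptions, and only the event ID string is taken from the excerpts above. The vsandevicemonitord.log warnings do not include the event ID and are not matched by this sketch.

```python
import re

# Event ID copied from the vobd entries shown above.
EVENT_ID = "esx.problem.vob.vsan.lsom.backupfailednvmediskhealthcriticalwarning"

# Captures the leading timestamp and the device name that follows "for disk".
LINE_RE = re.compile(r"^(?P<ts>\S+)\s.*for disk (?P<disk>\S+) is:")


def find_backup_failure_events(lines):
    """Return (timestamp, disk) tuples for lines that carry the event ID."""
    hits = []
    for line in lines:
        if EVENT_ID not in line:
            continue
        match = LINE_RE.search(line)
        if match:
            hits.append((match.group("ts"), match.group("disk")))
    return hits


# Sample lines modelled on the excerpts above (device serial shortened).
sample = [
    "2026-02-26T21:10:10.932Z In(14) vobd[2098147]: [vSANCorrelator] 170637431890us: "
    "[esx.problem.vob.vsan.lsom.backupfailednvmediskhealthcriticalwarning] NVMe critical "
    "health warning for disk t10.NVMe____Dell_NVMe_ISE_PS1030_MU_U.2_6.4TB_______#### "
    "is: The disk's backup device has failed.",
    "2026-03-03T20:23:24.602Z In(182) vmkernel: cpu113:4737276)LSOMCommon: "
    "LSOMGetSmartData:1478: Getsmart support failed on disk t10.NVMe____example:2",
]
events = find_backup_failure_events(sample)
for ts, disk in events:
    print(ts, disk)
```

In practice the `lines` argument would be an open file handle on a copy of vobd.log taken from a support bundle.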
[root@ESXi:~] esxcli nvme device log smart get -A vmhba1
SMART And Health Info:
   Available Spare Space Below Threshold: false
   Temperature Warning: false
   NVM Subsystem Reliability Degradation: false
   Read Only Mode: false
   Volatile Memory Backup Device Failure: true
   Composite Temperature: 306 K
   Available Spare: 100 %
   Available Spare Threshold: 10 %
   Percentage Used: 0 %
   Data Units Read: 0x60ea0528
   Data Units Written: 0x2f8fbda9
   Host Read Commands: 0x27f7a927fb
   Host Write Commands: 0x12f8084edf
   Controller Busy Time: 0x13522
   Power Cycles: 0x1a
   Power On Hours: 0x3f91
   Unsafe Shutdowns: 0x9
   Media Errors: 0x0
   Number of Error Info Log Entries: 0x2c
   Warning Composite Temperature Time: 0 Mins
   Critical Composite Temperature Time: 0 Mins
   Temperature Sensor 1: 319 K
   Temperature Sensor 2: 309 K
   Temperature Sensor 3: 0 K
   Temperature Sensor 4: 0 K
   Temperature Sensor 5: 0 K
   Temperature Sensor 6: 0 K
   Temperature Sensor 7: 0 K
   Temperature Sensor 8: 0 K
[root@ESXi:~] esxcli storage core device smart get -d t10.NVMe____Dell_NVMe_ISE_PS1030_MU_U.2_6.4TB_______#####################
Parameter                 Value    Threshold  Worst  Raw
------------------------  -------  ---------  -----  ---
Health Status             WARNING  N/A        N/A    N/A
Power-on Hours            16273    N/A        N/A    N/A
Power Cycle Count         26       N/A        N/A    N/A
Reallocated Sector Count  0        90         N/A    N/A
Drive Temperature         33       75         N/A    N/A
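When the SMART output has been saved to a file, the flag of interest can be extracted programmatically. The sketch below is a hedged illustration in Python: it assumes the output arrives as one "Key: Value" pair per line, as esxcli normally prints it, and the helper names (parse_smart_output, as_bool, as_int) are hypothetical. The sample values are copied from the output above.

```python
def parse_smart_output(text):
    """Parse 'Key: Value' lines into a dict; lines without a value are skipped."""
    fields = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def as_bool(value):
    """Interpret the 'true'/'false' strings used for the warning flags."""
    return value.strip().lower() == "true"


def as_int(value):
    """Convert a counter such as '0x3f91' or a reading such as '306 K' to int."""
    return int(value.split()[0], 0)


# A few lines taken from the esxcli output shown above.
sample = """SMART And Health Info:
   NVM Subsystem Reliability Degradation: false
   Volatile Memory Backup Device Failure: true
   Composite Temperature: 306 K
   Power On Hours: 0x3f91
"""

smart = parse_smart_output(sample)
backup_failed = as_bool(smart["Volatile Memory Backup Device Failure"])
power_on_hours = as_int(smart["Power On Hours"])
print(backup_failed, power_on_hours)
```

The decoded Power On Hours value (0x3f91 = 16273) matches the Power-on Hours row reported by `esxcli storage core device smart get`, which is a quick way to confirm that both commands refer to the same device.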
A "Volatile Memory Backup Failed" error (bit 4, value 0x10, of the NVMe SMART Critical Warning field) indicates that the capacitor or battery that allows the drive to flush cached data from volatile memory to NAND during a power loss has failed or degraded. Cached data is therefore at risk if power is lost; immediate actions include backing up data and replacing the SSD.
Failed Capacitor/Battery: The power-loss protection (PLP) capacitor on the SSD is faulty. The drive often requires replacement.
SMART Warning Trigger: The SSD’s internal health check detects this failure, often as part of a critical warning (0x10).
"Volatile Memory Backup Device Failure: true" is not treated as a device failure by vSAN OSA or ESA.
While the 'Volatile Memory Backup' SMART attribute is widely adopted among storage vendors, there is currently no unified industry standard that defines it as an explicit indicator of drive failure. Without a formal consensus, categorizing this attribute as a critical fault may lead to conflicting interpretations across hardware manufacturers. For this reason, the Dying Disk Handling (DDH) feature in vSAN currently does not mark this SMART error for remediation.
vSAN Health monitoring is deliberately conservative here: currently, only the 'Subsystem Reliability Degraded' status is classified as a functional failure and reported to the Health service as a trigger for replacement or evacuation.
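The handling described above can be summarized as a small decision function. This Python sketch is only a simplification of the behavior stated in this article, not vSAN's actual implementation: the function name and the returned labels are hypothetical, and the flag names are the field names printed by `esxcli nvme device log smart get`.

```python
def classify_smart_flags(flags):
    """Map SMART critical-warning flags to the handling this article describes."""
    # Per this article, only this status is reported to vSAN Health as a failure.
    if flags.get("NVM Subsystem Reliability Degradation"):
        return "vsan-reports-failure"
    # Not acted on by DDH; the hardware vendor should be engaged instead.
    if flags.get("Volatile Memory Backup Device Failure"):
        return "engage-hardware-vendor"
    return "no-action"


# Flags as seen in the scenario covered by this article.
result = classify_smart_flags({
    "NVM Subsystem Reliability Degradation": False,
    "Volatile Memory Backup Device Failure": True,
})
print(result)
```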
Engage the hardware vendor for further investigation and possible disk replacement.
To be alerted in vCenter to potential future occurrences, see the KB article "Enabling vSAN alerts for NVMe SMART data in vCenter".
Critical Warning (CWARN): This field indicates critical warnings for the Controller.
The value of this field shall indicate the value of the Critical Warning field in the Controller’s SMART / Health Information log page.
Volatile Memory Backup Failed (VMBF): This bit shall indicate the same value as the Volatile Memory Backup Failed (VMBF) bit (i.e., bit 4) in the Critical Warning field in the Controller’s SMART / Health Information log page.
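For reference, the Critical Warning byte can be decoded bit by bit. The Python sketch below is illustrative only: bit 4 is taken from the definition above, the positions of bits 0 to 3 are stated here on the assumption that they follow the order of the flags in the esxcli SMART output, and the function name is hypothetical.

```python
# Bit positions in the Critical Warning byte; bit 4 is the VMBF bit.
CRITICAL_WARNING_BITS = {
    0: "Available Spare Space Below Threshold",
    1: "Temperature Warning",
    2: "NVM Subsystem Reliability Degradation",
    3: "Read Only Mode",
    4: "Volatile Memory Backup Device Failure",
}


def decode_critical_warning(value):
    """Return the names of all warning bits set in the Critical Warning byte."""
    return [
        name
        for bit, name in CRITICAL_WARNING_BITS.items()
        if value & (1 << bit)
    ]


flags = decode_critical_warning(0x10)
print(flags)
```

A value of 0x10 sets only bit 4, which corresponds to the "Volatile Memory Backup Device Failure: true" line in the SMART output.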