In a vSAN Express Storage Architecture (ESA) environment, vCenter may trigger a physical disk "Operation" alarm. This is typically accompanied by a fluctuating Skyline Health score as the system attempts and fails to repair transient errors on an NVMe device.
Symptoms
Running vdq -Hi on the host returns an I/O timeout error:
VsanUtil::AIO_ReadWriteDeviceWithTimeOut: Device: /vmfs/devices/disks/[ID], read 0 out of 4096 errno 2
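When this message has to be found in saved command output or across several hosts, a small parser can help. The following is a minimal sketch, assuming the message keeps exactly the layout shown above. The function name parse_vdq_timeout and the sample device name are hypothetical and used only for illustration.

```python
import re

# Assumption: the message layout matches the symptom quoted above.
VDQ_TIMEOUT_RE = re.compile(
    r"VsanUtil::AIO_ReadWriteDeviceWithTimeOut: "
    r"Device: (?P<device>[^,\s]+), "
    r"read (?P<read>\d+) out of (?P<expected>\d+) "
    r"errno (?P<errno>\d+)"
)


def parse_vdq_timeout(line):
    """Return the fields of a vdq I/O timeout message, or None if no match."""
    match = VDQ_TIMEOUT_RE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "bytes_read": int(match.group("read")),
        "bytes_expected": int(match.group("expected")),
        "errno": int(match.group("errno")),
    }


if __name__ == "__main__":
    # Hypothetical device name, for illustration only.
    sample = (
        "VsanUtil::AIO_ReadWriteDeviceWithTimeOut: "
        "Device: /vmfs/devices/disks/t10.NVMe____EXAMPLE, "
        "read 0 out of 4096 errno 2"
    )
    print(parse_vdq_timeout(sample))
```

A result with bytes_read of 0 against a non-zero bytes_expected corresponds to the failed read described in the symptom.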
Cause
Underlying hardware degradation of an NVMe device results in unrecoverable metadata read timeouts. Because ESA uses a single-tier storage pool, persistent I/O failures on one device cause the driver to take the controller offline to prevent cluster-wide storage stalls.
localcli storage core device physical get -d [Device_ID]
localcli storage core device smart get -d [Device_ID]
vdq error output
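The items above can be gathered in one pass. The following is a minimal sketch, assuming it runs in the ESXi shell where localcli and vdq are on the PATH. The function names build_diag_commands and collect are hypothetical, and the device ID must be supplied by the operator.

```python
import subprocess


def build_diag_commands(device_id):
    """Return the diagnostic commands from this article as argument lists."""
    return [
        ["localcli", "storage", "core", "device", "physical", "get", "-d", device_id],
        ["localcli", "storage", "core", "device", "smart", "get", "-d", device_id],
        ["vdq", "-Hi"],
    ]


def collect(device_id):
    """Run each command and return its combined stdout/stderr, keyed by command."""
    results = {}
    for cmd in build_diag_commands(device_id):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error text next to normal output
            universal_newlines=True,
        )
        results[" ".join(cmd)] = proc.stdout
    return results


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python collect_diag.py <Device_ID>
    for command, output in collect(sys.argv[1]).items():
        print("### " + command)
        print(output)
```

Merging stderr into stdout is a design choice made here so that the vdq timeout message, if it is written to stderr, stays in the captured output.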