A VMware vSAN cluster experiences virtual machine migration (vMotion) failures. Investigation of the source ESXi host reveals persistent LSOM checksum errors in vmkwarning.log matching the following pattern:
vmkwarning: cpu13:2098860)WARNING: LSOM: LSOMScrubReadComplete:2983: Throttled: Checksum error on comp <UUID>, offset XXX (computed CRC 0x9bf89627 != saved CRC 0x5bf554a1 (NF)) 4096
Notably, this condition presents with the following unusual characteristics:
No SCSI sense codes indicating a physical disk failure (for example, sense key 0x3 / Medium Error) are present in the logs.
No Permanent Device Loss (PDL) events are reported.
There is no evidence that newly issued I/O is being corrupted in flight and committed to the disk group in a damaged state.
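When several hosts or log bundles need to be reviewed, it can help to extract the fields from each checksum warning programmatically. The following Python sketch parses lines of the form quoted earlier into the component UUID, the offset, and the two CRC values. The regular expression is modeled only on the single sample line in this article, so it is an assumption that other ESXi builds format the warning identically. `parse_checksum_error` and `ChecksumError` are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Modeled on the single sample warning in this article (assumption: other
# ESXi builds use the same field order and wording).
CHECKSUM_RE = re.compile(
    r"LSOMScrubReadComplete:\d+: .*?Checksum error on comp (?P<comp>[^,\s]+), "
    r"offset (?P<offset>\S+) \(computed CRC (?P<computed>0x[0-9a-fA-F]+) "
    r"!= saved CRC (?P<saved>0x[0-9a-fA-F]+)"
)


@dataclass
class ChecksumError:
    """Fields extracted from one LSOM scrub checksum warning (hypothetical type)."""
    component: str
    offset: str
    computed_crc: int
    saved_crc: int


def parse_checksum_error(line: str) -> Optional[ChecksumError]:
    """Return the parsed warning, or None if the line does not match."""
    m = CHECKSUM_RE.search(line)
    if m is None:
        return None
    return ChecksumError(
        component=m.group("comp"),
        offset=m.group("offset"),  # kept as text; the sample shows a placeholder
        computed_crc=int(m.group("computed"), 16),
        saved_crc=int(m.group("saved"), 16),
    )
```

Grouping the parsed results by `component` gives a quick view of how many distinct components are affected on a host.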
VMware vSAN 8.x
The checksum errors are caused by physical signal degradation or hardware faults in the storage I/O path, specifically within the storage backplane or SAS cables. Because the individual storage devices are healthy, the storage controller does not log standard SCSI medium errors. However, the data is corrupted by signal interference as it traverses the faulty backplane or cable, so the vSAN LSOM layer detects a CRC mismatch when it validates the data during read operations, such as the background scrub reads reported by LSOMScrubReadComplete.
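The following Python sketch illustrates why a fault in the transport path appears only as a checksum mismatch. It computes a checksum when a block is written. It then flips a single bit, as if the block had been damaged while being transferred back from a healthy device, and recomputes the checksum on read. The sketch assumes a CRC-32C (Castagnoli) checksum; this article does not state which CRC algorithm vSAN uses. The helper names are hypothetical. The 4096-byte block size was chosen to match the trailing 4096 in the sample warning.

```python
CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def crc32c(data: bytes) -> int:
    """Bitwise (unoptimized) CRC-32C over data."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def read_with_bit_flip(block: bytes, byte_index: int, bit: int) -> bytes:
    """Simulate a single-bit corruption introduced in the I/O path (hypothetical)."""
    damaged = bytearray(block)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


block = bytes(range(256)) * 16              # 4096-byte block as stored on the device
saved_crc = crc32c(block)                   # checksum recorded at write time
received = read_with_bit_flip(block, byte_index=1000, bit=3)
computed_crc = crc32c(received)             # checksum recomputed on read
print(f"computed CRC {computed_crc:#010x} != saved CRC {saved_crc:#010x}")
```

Any CRC whose generator polynomial has more than one term detects every single-bit error. A single flipped bit on the backplane is therefore enough to produce a mismatch like the one in the log, even though the drive itself reports no medium error.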
To resolve this issue, the physical hardware fault in the storage path must be remediated.
Engage the hardware vendor to perform diagnostics on the chassis and replace the storage backplane and SAS cables associated with the affected ESXi host.
Place the affected ESXi host into Maintenance Mode. Protect vSAN data by selecting "Ensure Accessibility" or "Full Data Migration", depending on available cluster capacity.
In the vSphere Client, navigate to vSAN Cluster > Configure > Disk Management.
Remove the disk group(s) reporting the checksum errors on the repaired host.
Recreate the disk group(s). This action initializes the drives and clears the residual checksum error counters associated with the previous hardware fault.
Exit Maintenance Mode and validate that vMotion operations succeed without generating new LSOMScrubReadComplete warnings.
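The validation in the last step can be partly automated by checking whether any LSOMScrubReadComplete checksum warnings have been logged since the host exited Maintenance Mode. This Python sketch assumes that each line of a copy of vmkwarning.log begins with an ISO 8601 timestamp (for example, `2024-05-01T12:30:00.000Z`). The sample line in this article omits the timestamp, so verify the format against your own logs. The function names are hypothetical. Lines whose timestamp cannot be parsed are reported rather than dropped, so the check errs toward flagging a possible recurrence.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def parse_timestamp(line: str) -> Optional[datetime]:
    """Parse a leading ISO 8601 timestamp (assumed log format); None if absent."""
    token = line.split(" ", 1)[0]
    try:
        ts = datetime.fromisoformat(token.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Treat timestamps without an offset as UTC (assumption).
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def new_checksum_warnings(lines: Iterable[str], since: datetime) -> List[str]:
    """Return scrub checksum warnings logged at or after `since` (timezone-aware)."""
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    hits = []
    for line in lines:
        if "LSOMScrubReadComplete" not in line or "Checksum error" not in line:
            continue
        ts = parse_timestamp(line)
        # Unparseable timestamps are kept: a false alarm beats a missed error.
        if ts is None or ts >= since:
            hits.append(line.rstrip("\n"))
    return hits
```

For example, pass an open file handle for the copied log and the UTC time at which the host exited Maintenance Mode. An empty list means no new warnings were found in that copy of the log.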
Before removing any disk group, verify that the cluster has sufficient capacity and health to tolerate the temporary reduction in storage resources. Check overall cluster health using the vSAN Skyline Health checks.
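The following Python sketch is a rough illustration of the capacity check above. It tests whether the data currently consumed in the cluster would still fit after one disk group's raw capacity is removed, while leaving a free-space reserve. This models the worst case of "Full Data Migration". The model is deliberately simplified. The reserve fraction is a hypothetical parameter, and all inputs are assumed to be raw figures in the same unit, including replica overhead. The sketch also ignores the fault-domain and storage-policy placement rules that vSAN enforces. The vSAN Skyline Health checks remain the authoritative source.

```python
def fits_after_removal(total_raw_tb: float, used_raw_tb: float,
                       removed_raw_tb: float, reserve_fraction: float) -> bool:
    """Rough check: does consumed capacity fit on what remains, minus a reserve?

    reserve_fraction is a hypothetical free-space margin in [0.0, 1.0).
    Ignores fault domains, storage policies, and per-host placement rules.
    """
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0.0, 1.0)")
    if removed_raw_tb > total_raw_tb:
        raise ValueError("cannot remove more capacity than the cluster has")
    remaining_raw_tb = total_raw_tb - removed_raw_tb
    usable_raw_tb = remaining_raw_tb * (1.0 - reserve_fraction)
    return used_raw_tb <= usable_raw_tb
```

For example, take a cluster with 100 TB raw capacity and 50 TB consumed, and a 20 TB disk group to remove. At a 25% reserve, 60 TB remains usable, so the check passes. With 70 TB consumed, the same check fails.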
Managing and Configuring a vSAN disk group using esxcli commands