Operations that can trigger this issue:
- XvMotion
- SvMotion
- Snapshot consolidation
- Online disk promote
The VM's vmware.log shows entries similar to:
2022-08-25T13:41:30.794Z In(05) vcpu-0 - NVME-VMK: nvme0:32: WRITE Command failed. Status: 0x0/0x82.
2022-08-25T13:41:30.822Z In(05) vmx - SnapshotESXCombineProgressTotal: Snapshot consolidation progress: 51
2022-08-25T13:41:30.883Z In(05) vcpu-0 - NVME-VMK: nvme0:32: WRITE Command failed. Status: 0x0/0x82.
2022-08-25T13:41:30.922Z In(05) vmx - SnapshotESXCombineProgressTotal: Snapshot consolidation progress: 51
2022-08-25T13:41:30.930Z In(05) vcpu-0 - NVME-VMK: nvme0:32: WRITE Command failed. Status: 0x0/0x82.
2022-08-25T13:41:30.971Z In(05) vcpu-0 - NVME-VMK: nvme0:32: WRITE Command failed. Status: 0x0/0x82.
2022-08-25T13:41:31.023Z In(05) vmx - SnapshotESXCombineProgressTotal: Snapshot consolidation progress: 51
2022-08-25T13:41:31.110Z In(05) vcpu-0 - NVME-VMK: nvme0:32: WRITE Command failed. Status: 0x0/0x82.
2022-08-25T13:41:31.123Z In(05) vmx - SnapshotESXCombineProgressTotal: Snapshot consolidation progress: 51
The Linux guest kernel log (dmesg) shows errors similar to:
[ 848.054583] blk_update_request: critical target error, dev nvme0n1, sector 128960 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 855.309500] blk_update_request: critical target error, dev nvme0n1, sector 43552 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[ 855.347411] blk_update_request: critical target error, dev nvme0n1, sector 43488 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
On a VM running a Linux guest OS with NVMe virtual disks, I/Os to a disk region that is actively being copied might fail, because I/Os to that region are blocked for the duration of the copy. The issue is exacerbated by a change in vSphere 7.0 U3e that disabled retries for vNVMe.
In the Linux guest OS, increase the NVMe retry count from the default value of 5 to a higher value, such as 30 or more. This ensures that the guest's NVMe retries are not exhausted while the region is locked by the disk copy code.
Steps to change the NVMe retry count:
SSH to the guest OS and run the following command as root:
# echo 30 > /sys/module/nvme_core/parameters/max_retries
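The step above can also be wrapped in a small script that checks the current value before changing it. This is a minimal sketch, not part of the official procedure: the function name set_nvme_retries and its optional arguments are hypothetical, and only the sysfs path and the value 30 come from the step above.

```shell
#!/bin/sh
# Hypothetical helper (illustrative name): raise the NVMe retry count
# only if the current value is below the requested target.
# Arguments: $1 = target retry count (default 30)
#            $2 = parameter file (default: the sysfs path from the step above)
set_nvme_retries() {
    target="${1:-30}"
    param="${2:-/sys/module/nvme_core/parameters/max_retries}"

    # The file is writable only by root and exists only if nvme_core is loaded.
    if [ ! -w "$param" ]; then
        echo "cannot write $param (run as root; is nvme_core loaded?)" >&2
        return 1
    fi

    current=$(cat "$param")
    if [ "$current" -lt "$target" ]; then
        echo "$target" > "$param"
    fi

    # Report the value before and after the change.
    echo "max_retries: $current -> $(cat "$param")"
}
```

To use it, run set_nvme_retries 30 as root in the guest. A value written through sysfs is not expected to survive a guest reboot. One common way to make a module parameter persistent is to add it to the guest kernel command line, for example nvme_core.max_retries=30. Treat that as an assumption to verify against your distribution's documentation.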