Linux Guest OS write IOs to vNVMe vdisk may fail with error '0x82' (NVME_SC_NAMESPACE_NOT_READY)

Article ID: 313844

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • This issue has been observed only with Linux guest operating systems that use vNVMe virtual disks.
  • The problem was first noticed after vNVMe retries were disabled in vSphere 7.0 U3e.
  • While any of the following operations is running on a VM, write IOs can fail with IO error '0x82' (NVME_SC_NAMESPACE_NOT_READY).

  Operations:
  - XvMotion
  - SvMotion
  - Snapshot consolidation
  - Online disk promote

  • You may see entries like the following in vmware.log in the virtual machine working directory. To determine the working directory, right-click the virtual machine and click Edit Settings, then click Options > Virtual Machine Working Location.

2022-08-25T13:41:30.794Z In(05) vcpu-0 - NVME-VMK: nvme0:32: WRITE Command failed. Status: 0x0/0x82.
2022-08-25T13:41:30.822Z In(05) vmx - SnapshotESXCombineProgressTotal: Snapshot consolidation progress: 51
2022-08-25T13:41:30.883Z In(05) vcpu-0 - NVME-VMK: nvme0:32: WRITE Command failed. Status: 0x0/0x82.
2022-08-25T13:41:30.922Z In(05) vmx - SnapshotESXCombineProgressTotal: Snapshot consolidation progress: 51
2022-08-25T13:41:30.930Z In(05) vcpu-0 - NVME-VMK: nvme0:32: WRITE Command failed. Status: 0x0/0x82.
2022-08-25T13:41:30.971Z In(05) vcpu-0 - NVME-VMK: nvme0:32: WRITE Command failed. Status: 0x0/0x82.
2022-08-25T13:41:31.023Z In(05) vmx - SnapshotESXCombineProgressTotal: Snapshot consolidation progress: 51
2022-08-25T13:41:31.110Z In(05) vcpu-0 - NVME-VMK: nvme0:32: WRITE Command failed. Status: 0x0/0x82.
2022-08-25T13:41:31.123Z In(05) vmx - SnapshotESXCombineProgressTotal: Snapshot consolidation progress: 51

  • The output of 'dmesg -c' shows the write IO errors seen by the Linux Guest OS.

[  848.054583] blk_update_request: critical target error, dev nvme0n1, sector 128960 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[  855.309500] blk_update_request: critical target error, dev nvme0n1, sector 43552 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[  855.347411] blk_update_request: critical target error, dev nvme0n1, sector 43488 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
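
The two symptoms above can be checked with a pair of small shell helpers. This is a minimal sketch, not a VMware-provided tool: the function names are hypothetical, and the patterns are assumed from the sample log lines shown above, so they may need adjusting for other ESXi or kernel versions. Note that 'dmesg -c' clears the kernel ring buffer after printing; plain 'dmesg' leaves it intact.

```shell
# Hypothetical helper: count vNVMe WRITE failures with status 0x82.
# $1: path to the VM's vmware.log
count_nvme_0x82() {
    grep -c 'NVME-VMK: .*WRITE Command failed\. Status: 0x0/0x82' "$1"
}

# Hypothetical helper: keep only NVMe write errors from kernel log text on stdin.
# Run inside the Linux guest, for example: dmesg | nvme_write_errors
nvme_write_errors() {
    grep -E 'critical target error, dev nvme[0-9]+n[0-9]+,.*\(WRITE\)'
}
```

A non-zero count from the first helper, with timestamps that fall inside one of the operations listed above, together with matching lines from the second helper, is consistent with this issue.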


Environment

VMware vSphere ESXi 7.0.3
VMware vSphere ESXi 8.0

Cause

For a VM running a Linux guest OS (GOS) with vNVMe virtual disks, IOs might fail for the region that is actively being copied, because IOs to that region are blocked for the duration of the copy. The problem is exacerbated by the change in vSphere 7.0 U3e that disabled retries for vNVMe.

Resolution

Currently there is no resolution.

Workaround:

On the Linux Guest OS, increase the NVMe retry count from the default value of '5' to a higher value, around '30' or more. This ensures that the GOS NVMe retries are not exhausted while the specific region is locked by the disk copy code.

Steps to change the NVMe retry count:

SSH to the Guest OS and run the following command as root:
# echo 30 > /sys/module/nvme_core/parameters/max_retries
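
A value written to this sysfs file takes effect immediately but does not survive a reboot. The following is a minimal sketch of setting the value and reading it back. The function name is hypothetical, and the optional second argument exists only so the helper can be tried against a scratch file. The persistence options in the comments are assumptions about typical Linux distributions, not steps taken from this article.

```shell
# Hypothetical helper: set the NVMe retry count and print the value now in effect.
# $1: new retry count
# $2: parameter file (optional; defaults to the sysfs path used in this article)
set_nvme_max_retries() {
    param_file=${2:-/sys/module/nvme_core/parameters/max_retries}
    echo "$1" > "$param_file" || return 1
    cat "$param_file"
}

# Usage (as root): set_nvme_max_retries 30
#
# Assumed ways to keep the setting across reboots:
#   - a modprobe.d entry (the file name nvme_core.conf is an arbitrary choice):
#       echo 'options nvme_core max_retries=30' > /etc/modprobe.d/nvme_core.conf
#     If nvme_core is loaded from the initramfs, the initramfs may need to be regenerated.
#   - or the kernel command-line parameter: nvme_core.max_retries=30
```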