Slow ESXi Log Bundle Generation Due to RDM Disk Scan Failures



Article ID: 405356



Products

VMware vSphere ESXi

VMware vSAN

Issue/Introduction

The vm-support --loglevel 0 command takes an excessively long time (approximately two hours) to generate a diagnostic log bundle on an ESXi host. 

 

/var/log/vmsupport.log shows that the partition scan of the affected device timed out:

        ERROR: naa.###############: failed to scan disk partitions

        Traceback (most recent call last):

          File "/usr/lib/vmware/vm-support/bin/systemStorageDebug.py", line 502, in scanPartitions

          File "/lib64/python3.11/site-packages/systemStorage/esxdisk.py", line 304, in scanPartitions

          File "/lib64/python3.11/site-packages/systemStorage/esxdisk.py", line 151, in scan

          File "/lib64/python3.11/site-packages/systemStorage/mbr.py", line 323, in scan

          File "/lib64/python3.11/site-packages/systemStorage/blockdev.py", line 140, in readBlock

        TimeoutError: [Errno 110] Connection timed out
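To see which devices are affected without reading the whole log, the ERROR lines can be summarized per device. The following is a minimal sketch that assumes the log line format shown above; the function name failed_scan_devices is hypothetical and is not part of vm-support.

```shell
# Hypothetical helper: summarize devices whose partition scan failed.
# Reads log text on stdin and prints "<device> <number of failures>".
failed_scan_devices() {
    awk '/failed to scan disk partitions/ {
             # The device ID is the field that starts with "naa."
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^naa\./) { sub(/:$/, "", $i); count[$i]++ }
         }
         END { for (d in count) print d, count[d] }' | sort
}

# Usage on the ESXi host (log path as given in this article):
#   failed_scan_devices < /var/log/vmsupport.log
```

Each device listed is a candidate for the checks that follow; the sketch only reads the log and changes nothing on the host.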

 

The vmkernel log (/var/run/log/vmkernel.log) reports device status D:0x18 (SCSI reservation conflict) for naa.###############:

vmkernel: cpu9:2098137)NMP: nmp_ResetDeviceLogThrottling:3845: Error status H:0x0 D:0x18 P:0x0 Sense Data: 0x0 0x0 0x0 from dev "naa.###############" occurred 1999 times(of 1999 commands)

vmkernel: cpu16:2098137)NMP: nmp_ResetDeviceLogThrottling:3845: Error status H:0x0 D:0x18 P:0x0 Sense Data: 0x0 0x0 0x0 from dev "naa.##################" occurred 1802 times(of 1802 commands)
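The same kind of summary can be built for the reservation conflicts. This is a sketch that assumes the NMP log line format shown above; reservation_conflict_devices is a hypothetical name used for illustration only.

```shell
# Hypothetical helper: list devices that logged SCSI status D:0x18
# (reservation conflict). Reads log text on stdin and prints
# "<device> <number of matching log lines>".
reservation_conflict_devices() {
    awk '/ D:0x18 / && /from dev "/ {
             dev = $0
             sub(/.*from dev "/, "", dev)   # drop everything up to the opening quote
             sub(/".*/, "", dev)            # drop the closing quote and the rest
             seen[dev]++
         }
         END { for (d in seen) print d, seen[d] }' | sort
}

# Usage on the ESXi host (log path assumed to be the standard location):
#   reservation_conflict_devices < /var/run/log/vmkernel.log
```

The count is the number of log lines per device, not the number of failed commands; the per-line command totals remain in the "occurred N times" text of each entry.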

 

Run esxcli storage core device list and verify the LUN configuration. The affected device reports Is Perennially Reserved: false:

naa.############:
   Display Name: IBM Fibre Channel Disk (naa.##############)
   Has Settable Display Name: true
   Size: 512000
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.#############
   Vendor: IBM
   Model: 2107900
   Revision: .126
   Is Perennially Reserved: false
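On hosts with many LUNs, the device list can be filtered down to the devices that are not perennially reserved. The sketch below assumes the output layout shown above (an unindented device ID followed by indented attributes); not_perennially_reserved is a hypothetical helper name.

```shell
# Hypothetical helper: print the ID of every device whose
# "Is Perennially Reserved" attribute is false.
# Reads `esxcli storage core device list` output on stdin.
not_perennially_reserved() {
    awk '/^[^ ]/ { dev = $1; sub(/:$/, "", dev) }   # unindented line = device ID
         /Is Perennially Reserved: false/ { print dev }'
}

# Usage on the ESXi host:
#   esxcli storage core device list | not_perennially_reserved
```

The result includes every device with the flag unset, including local disks and datastore LUNs, so it needs to be cross-checked against the devices that actually show reservation conflicts before anything is changed.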
  

Environment

VMware vSphere ESXi 7.x

VMware vSphere ESXi 8.x

VMware vSAN 7.x

VMware vSAN 8.x

Cause

The delay is caused by the /usr/lib/vmware/vm-support/bin/systemStorageDebug.pyc Python script, which fails to scan the Raw Device Mapping (RDM) disks configured for virtual machines in an environment where the VMDKs are stored on a vSAN datastore. Because the RDM disks are not marked as perennially reserved, the scan runs into SCSI reservation conflicts. Each conflict causes the script to time out while probing the affected Logical Unit Number (LUN), and the accumulated timeouts significantly slow down log bundle generation.

Resolution

To resolve the issue, configure the affected RDM disks as perennially reserved to prevent SCSI reservation conflicts. Use either of the following options:

Option 1: Using the vSphere Client

  1. Log in to the vSphere Client and navigate to the affected ESXi host.
  2. Click the Configure tab.
  3. Under Storage, select Storage Devices.
  4. Locate the RDM disk in the list (identified by its NAA ID, e.g., naa.<unique-id>).
  5. Select the disk and click Mark as Perennially Reserved.
  6. Repeat for each RDM disk involved in the configuration.
  7. Re-run the vm-support --loglevel 0 command to verify that the log bundle generation completes within an acceptable timeframe.

Option 2: Using the ESXi Command Line

  1. Connect to the ESXi host via SSH (ensure SSH is enabled in the vSphere Client).
  2. Mark each affected RDM disk as perennially reserved using the following command, replacing <naa.id> with the disk’s NAA identifier (e.g., naa.##################):

esxcli storage core device setconfig -d <naa.id> --perennially-reserved=true

  3. Verify that the disk is marked as perennially reserved:

esxcli storage core device list -d <naa.id> | grep Perennially

Expected Output: Is Perennially Reserved: true

  4. Repeat for all affected RDM disks.
  5. Re-run the vm-support --loglevel 0 command to confirm resolution.
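When several RDM disks are involved, the set and verify commands from Option 2 can be wrapped in a small loop. This is a minimal sketch rather than a supported tool: the function name mark_perennially_reserved is hypothetical, and the device IDs passed to it must be the NAA identifiers of the affected RDM disks only.

```shell
# Hypothetical helper: mark each given device as perennially reserved,
# then print the resulting flag next to the device ID.
# Uses only the two esxcli commands shown in Option 2.
mark_perennially_reserved() {
    for id in "$@"; do
        # Stop at the first device that cannot be configured.
        esxcli storage core device setconfig -d "$id" --perennially-reserved=true || return 1
        state=$(esxcli storage core device list -d "$id" | grep 'Is Perennially Reserved')
        echo "$id:$state"
    done
}

# Usage on the ESXi host (replace the placeholders with real NAA IDs):
#   mark_perennially_reserved <naa.id-1> <naa.id-2>
```

Every output line should end with Is Perennially Reserved: true; a device that still reports false needs to be checked individually before vm-support is re-run.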