VMFS extent offline, causing VMs and hostd to become unresponsive
Article ID: 323044

Updated On: 03-28-2025

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

Symptoms: 

  • I/O to virtual disk (VMDK) files that reside on the offline device might slow down or fail. This might cause virtual machines that have VMDKs residing on that device to become unresponsive or even fail.

  • The hostd service may become unresponsive.

  • If the VMFS datastore capacity was expanded by adding an extent and that extent goes offline, further VMFS expansion will fail.
  • Storage adapter rescan fails with the error "An error occurred while communicating with remote host".

  • In the /var/run/log/vmkernel.log file, entries similar to the following are seen:

YYYY-MM-DDTHH:MM.SSSZ Wa(180) vmkwarning: cpu126:2100497 opID=55dbbda8)WARNING: LVM: 17711: An attached device went offline. naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1 file system [VOLUME-NAME, VOLUME-UUID]
YYYY-MM-DDTHH:MM.SSSZ Wa(180) vmkwarning: cpu100:2100512 opID=b09639a4)WARNING: LVM: 17711: An attached device went offline. naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1 file system [VOLUME-NAME, VOLUME-UUID]
YYYY-MM-DDTHH:MM.SSSZ Wa(180) vmkwarning: cpu88:2100491 opID=26701d75)WARNING: LVM: 17711: An attached device went offline. naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1 file system [VOLUME-NAME, VOLUME-UUID]
YYYY-MM-DDTHH:MM.SSSZ Wa(180) vmkwarning: cpu88:2100491 opID=26701d75)WARNING: LVM: 17711: An attached device went offline. naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1 file system [VOLUME-NAME, VOLUME-UUID]
YYYY-MM-DDTHH:MM.SSSZ Wa(180) vmkwarning: cpu84:2100509 opID=c2db1724)WARNING: LVM: 17711: An attached device went offline. naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1 file system [VOLUME-NAME, VOLUME-UUID]

  • In the /var/run/log/vobd.log file, entries similar to the following are seen:

YYYY-MM-DDTHH:MM.SSSZ In(14) vobd[2098027]:  [vmfsCorrelator] 13900876943838us: [vob.vmfs.extent.offline] An attached device went offline. naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1 file system [VOLUME-NAME, VOLUME-UUID]
YYYY-MM-DDTHH:MM.SSSZ In(14) vobd[2098027]:  [vmfsCorrelator] 13900800936747us: [esx.problem.vmfs.extent.offline] An attached device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1 may be offline. The file system [VOLUME-NAME, VOLUME-UUID] is now in a degraded state. While the datastore is still available, parts of data that reside on the extent that went offline might be inaccessible.
YYYY-MM-DDTHH:MM.SSSZ In(14) vobd[2098027]:  [vmfsCorrelator] 13900887878275us: [vob.vmfs.extent.offline] An attached device went offline. naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1 file system [VOLUME-NAME, VOLUME-UUID]
YYYY-MM-DDTHH:MM.SSSZ In(14) vobd[2098027]:  [vmfsCorrelator] 13900811870939us: [esx.problem.vmfs.extent.offline] An attached device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx may be offline. The file system [VOLUME-NAME, VOLUME-UUID] is now in a degraded state. While the datastore is still available, parts of data that reside on the extent that went offline might be inaccessible.
YYYY-MM-DDTHH:MM.SSSZ In(14) vobd[2098027]:  [vmfsCorrelator] 13900966570083us: [esx.problem.vmfs.extent.offline] An attached device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1 may be offline. The file system [VOLUME-NAME, VOLUME-UUID] is now in a degraded state. While the datastore is still available, parts of data that reside on the extent that went offline might be inaccessible.
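To gauge how widespread the offline events are before digging further, the vobd log can be filtered for the extent-offline identifiers and tallied per device. A minimal sketch, demonstrated on sample log lines written to a temporary file (the naa ID is a placeholder; on a live host, point grep directly at /var/run/log/vobd.log):

```shell
# Write two sample vobd.log lines to a temp file so the filter can be shown
# end to end; on a live host, use /var/run/log/vobd.log instead.
log=/tmp/vobd-sample.log
cat > "$log" <<'EOF'
2024-01-01T00:00:00.000Z In(14) vobd[2098027]: [vmfsCorrelator] 1us: [vob.vmfs.extent.offline] An attached device went offline. naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1 file system [VOLUME-NAME, VOLUME-UUID]
2024-01-01T00:00:05.000Z In(14) vobd[2098027]: [vmfsCorrelator] 2us: [esx.problem.vmfs.extent.offline] An attached device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:1 may be offline.
EOF

# Count offline events per device ID to see which extent is affected and
# whether it is flapping repeatedly.
grep -o 'naa\.[0-9a-fx]*' "$log" | sort | uniq -c
```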

Validation step 1:

  • Run the following command to determine which devices back the VMFS volume and check if any devices are offline:

    vmkfstools -Ph /vmfs/volumes/<datastore>

Example
[root@host1:~] vmkfstools -Ph /vmfs/volumes/<datastore>
VMFS-6.82 (Raw Major Version: 24) file system spanning 4 partitions.
File system label (if any): datastore
Mode: public
Capacity 399.8 GB, 156.2 GB available, file block size 1 MB, max supported file size 64 TB
Disk Block Size: 512/16384/0
UUID: ########-########-####-############
Partitions spanned (on "lvm"):
        naa.################################:1
        naa.################################:1
        naa.################################:1
        naa.################################:1
Is Native Snapshot Capable: NO
 
Note: In this scenario, the datastore is configured with multiple extents. As a result, multiple naa.id entries appear under "Partitions spanned (on 'lvm')," indicating that multiple devices are backing this datastore.
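The spanned-partition list above can be turned into a plain list of device IDs for the per-device checks in the next step. A small sketch, run here against a captured sample of the `vmkfstools -Ph` output (the naa IDs are placeholders):

```shell
# Strip the partition suffix (":1") from each "Partitions spanned" line so
# the bare device IDs can be fed to `esxcli storage core device list -d`.
# On a live host, replace the printf with:
#   vmkfstools -Ph /vmfs/volumes/<datastore>
printf '%s\n' \
  'Partitions spanned (on "lvm"):' \
  '        naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1:1' \
  '        naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx2:1' |
awk '$1 ~ /^naa\./ {sub(/:[0-9]+$/, "", $1); print $1}'
```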
 

Validation step 2:

  • Verify whether the device is a local disk or a SAN LUN.

    • Run the following command and locate the entry for the device's naa ID:

esxcli storage core device list

naa.xxxxxxxxxxxxxxxx:
   Display Name: Local Make Disk (naa.xxxxxxxxxxxxxxxxxx)
   Has Settable Display Name: true
   Size: 3662830
   Device Type: Direct-Access
   Multipath Plugin: HPP
   Devfs Path: /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx
   Vendor: "Vendor Name"
   Model: XXX5XVUG3T84
   Revision: B70C
   SCSI Level: 6
   Is Pseudo: false
   Status: on
   Is RDM Capable: true
   Is Local: true

  • Check the "Is Local" field in the output above. If the value is true, the device is a local disk; if false, it is a SAN LUN.
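When several extents need classifying, the "Is Local" check can be scripted. A minimal sketch against a captured sample of the device-list output (on a host, pipe `esxcli storage core device list -d <naa.id>` into the same awk filter instead of the sample text):

```shell
# Extract just the "Is Local" value from device-list output. The sample
# text below is a placeholder standing in for real esxcli output.
sample='naa.xxxxxxxxxxxxxxxx:
   Display Name: Local Make Disk (naa.xxxxxxxxxxxxxxxx)
   Multipath Plugin: HPP
   Is Local: true'
is_local=$(printf '%s\n' "$sample" | awk -F': ' '/Is Local:/ {print $2}')
if [ "$is_local" = "true" ]; then
  echo "local disk"
else
  echo "SAN LUN"
fi
```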
 

Environment

VMware vSphere ESXi 8.x
VMware vSphere ESXi 7.x
VMware vSphere ESXi 6.x

Cause

  • The esx.problem.vmfs.extent.offline message is received when an ESXi host loses connection to a storage device that backs an entire VMFS datastore or any of its extents.

  • This loss of connection can happen when a switch or cable that connects the device to the ESXi host is disconnected or when the device is reformatted to be used by another volume.

Resolution

  • Identify the devices that are affected and restore connectivity. If the device has been reformatted and reassigned to another volume, the corresponding portion of the original volume will be permanently lost and cannot be recovered.

  • If the multi-extent datastore is backed by local disks, run the following command to identify the physical location of each disk:

    esxcli storage core device physical get -d <device naa.id>

    [root@host1:~] esxcli storage core device physical get -d naa.xxxxxxxxxxxxxxx
    Physical Location: enclosure 12 slot 4

  • Engage the storage vendor to identify and fix the underlying storage or disk issue.

  • If the issue continues after resolving the underlying storage problem, restart the host.