Multiple virtual machines power off unexpectedly during storage activity involving the removal and re-adding of datastores.

Article ID: 439812

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • Virtual machines power off unexpectedly during storage maintenance activities. This typically occurs when datastores are removed and subsequently re-added to the ESXi host.

Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

The virtual machines powered off because the underlying LUNs were manually removed (unpresented) from the ESXi host on the storage array while the VMs were still running. The host treats this as a Permanent Device Loss (PDL) condition, which causes the VMs whose files reside on the affected datastores to power off.

Cause Validation:

  • The /var/run/log/fdm.log file on the ESXi host confirms that vSphere HA VM Component Protection (VMCP) detected a storage failure for the virtual machine and then terminated it.

    YYYY-MM-DDTHH:MM.SSSZ In(166) Fdm[3712228]: -->      EventEx=com.vmware.vc.HA.VmcpStorageFailureDetectedForVm vm=/vmfs/volumes/########-########-####-############/###########/###########.vmx host=host-### tag=host-###:1985968486:##
    YYYY-MM-DDTHH:MM.SSSZ In(166) Fdm[3712228]: -->      EventEx=com.vmware.vc.HA.VmcpTerminatingVm vm=/vmfs/volumes/########-########-####-############/###########//###########/.vmx host=host-### tag=host-###:-1595930517:##

  • The /var/run/log/hostd.log file on the ESXi host also confirms that the virtual machine powered off because the underlying datastore became inaccessible.

    YYYY-MM-DDTHH:MM.SSSZ Db(167) Hostd[2099030]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/###########/###########.vmx] Handling vmx message 83076343: The storage backing for virtual disk '###########_3.vmdk' has been permanently lost. You may be able to hot remove this virtual device from the virtual machine and continue after clicking Retry. Click Cancel to terminate this session.
    YYYY-MM-DDTHH:MM.SSSZ In(166) Hostd[2099030]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 178417 : Message on ########### on ############# in ha-datacenter: The storage backing for virtual disk '###########_3.vmdk' has been permanently lost. You may be able to hot remove this virtual device from the virtual machine and continue after clicking Retry. Click Cancel to terminate this session.
    YYYY-MM-DDTHH:MM.SSSZ Wa(164) Hostd[2099006]: [Originator@6876 sub=Hostsvc.VmkVprobSource] Datastore array is marked as event object. Check 'extension.xml' for event: 'esx.problem.storage.connectivity.lost'
    YYYY-MM-DDTHH:MM.SSSZ Db(167) Hostd[2099012]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/###########/###########.vmx] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x000000be2baae860]DatastoreSystemMsg::DsChange{Type=UPDATE-NOW-DISCONNECTED, Msg=DatastoreAvailableMsg{DatastoreMsg{Type=1, MoId=########-########-####-############, Path=/vmfs/volumes/########-########-####-############}}}; DatastoreSystemMsg::DsChange{Type=UPDATE_CAPACITY, Msg=DatastoreResizedMsg{DatastoreMsg{Type=1, MoId=########-########-####-############, Path=/vmfs/volumes/########-########-####-############}, NewCap=0, OldCap=5712038068224}};
    YYYY-MM-DDTHH:MM.SSSZ Wa(164) Hostd[2099012]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/###########/###########.vmx] UpdateStorageAccessibilityStatusInt: The datastore ########-########-####-############ is not accessible
    YYYY-MM-DDTHH:MM.SSSZ In(166) Hostd[2099012]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/###########/###########.vmx] UpdateStorageAccessibilityStatusInt: Vm's storage accessibility status changed to false
    YYYY-MM-DDTHH:MM.SSSZ In(166) Hostd[2099012]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/###########/###########.vmx] VM config backing gone - try to mark VM invalid.
    YYYY-MM-DDTHH:MM.SSSZ In(166) Hostd[2099009]: [Originator@6876 sub=Libs] DictionaryLoad: Cannot open file "/vmfs/volumes/########-########-####-############/###########/###########.vmx": Input/output error.
    YYYY-MM-DDTHH:MM.SSSZ Db(167) Hostd[2099008]: [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/########-########-####-############/###########/###########.vmx] VM went offline; security domain #### will be cleaned up

  • The /var/run/log/vobd.log file on the ESXi host confirms that the storage device (LUN) was either removed or became permanently inaccessible.

    YYYY-MM-DDTHH:MM.SSSZ In(14) vobd[2097763]:  [scsiCorrelator] 8368722795683us: [esx.problem.scsi.device.state.permanentloss] Device: naa.################################ has been removed or is permanently inaccessible. Affected datastores (if any): "##########".

  • The /var/run/log/hostd.log file on the ESXi host also confirms that the storage device (LUN) was either removed or became permanently inaccessible.

    YYYY-MM-DDTHH:MM.SSSZ In(166) Hostd[2098999]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 178422 : Device naa.################################ has been removed or is permanently inaccessible. Affected datastores (if any): "##########".

  • The /var/run/log/vmkernel.log file on the ESXi host confirms that the storage device (LUN) was either removed or became permanently inaccessible.

    YYYY-MM-DDTHH:MM.SSSZ Wa(180) vmkwarning: cpu3:5685260)WARNING: ScsiDevice: 1794: Device :naa.################################ has been removed or is permanently inaccessible.
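The checks above can be automated when reviewing logs offline. The following Python sketch assumes you have copies of fdm.log, hostd.log, vobd.log and vmkernel.log in one directory (for example, extracted from a support bundle). It searches for the marker strings shown in the excerpts and collects the VMCP events, the naa device IDs reported as permanently inaccessible, and the affected datastore names. The patterns are derived only from the sample lines in this article. The names `PdlFindings`, `scan_lines`, `scan_log_dir` and `looks_like_pdl_poweroff` are hypothetical, and the heuristic is an illustration, not a VMware diagnostic rule. Log wording can differ between ESXi builds, so treat a negative result as inconclusive.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterable

# Patterns derived from the fdm/hostd/vobd/vmkernel excerpts in this article (assumption:
# other ESXi builds may word these messages differently).
VMCP_EVENT = re.compile(
    r"EventEx=com\.vmware\.vc\.HA\.(VmcpStorageFailureDetectedForVm|VmcpTerminatingVm)\s+vm=(\S+)"
)
# Matches the "Device: naa.X", "Device naa.X" and "Device :naa.X" variants.
PDL_DEVICE = re.compile(
    r"Device\s*:?\s*(naa\.\S+) has been removed or is permanently inaccessible"
)
AFFECTED_DS = re.compile(r'Affected datastores \(if any\): "([^"]*)"')
HOSTD_MARKERS = (
    "has been permanently lost",
    "storage accessibility status changed to false",
    "VM went offline",
)


@dataclass
class PdlFindings:
    vmcp_events: list[tuple[str, str]] = field(default_factory=list)  # (event, .vmx path)
    pdl_devices: set[str] = field(default_factory=set)
    affected_datastores: set[str] = field(default_factory=set)
    hostd_markers: dict[str, int] = field(default_factory=dict)  # marker -> hit count

    def looks_like_pdl_poweroff(self) -> bool:
        # Hypothetical heuristic: at least one PDL device event AND VMCP terminating a VM.
        terminated = any(ev == "VmcpTerminatingVm" for ev, _ in self.vmcp_events)
        return bool(self.pdl_devices) and terminated


def scan_lines(lines: Iterable[str], findings: PdlFindings) -> PdlFindings:
    """Accumulate PDL-related evidence from an iterable of log lines."""
    for line in lines:
        if (m := VMCP_EVENT.search(line)):
            findings.vmcp_events.append((m.group(1), m.group(2)))
        if (m := PDL_DEVICE.search(line)):
            findings.pdl_devices.add(m.group(1))
            ds = AFFECTED_DS.search(line)
            if ds and ds.group(1):
                findings.affected_datastores.add(ds.group(1))
        for marker in HOSTD_MARKERS:
            if marker in line:
                findings.hostd_markers[marker] = findings.hostd_markers.get(marker, 0) + 1
    return findings


def scan_log_dir(log_dir: str) -> PdlFindings:
    """Scan whichever of the four log files exist in log_dir."""
    findings = PdlFindings()
    for name in ("fdm.log", "hostd.log", "vobd.log", "vmkernel.log"):
        path = Path(log_dir) / name
        if path.is_file():
            with path.open(encoding="utf-8", errors="replace") as fh:
                scan_lines(fh, findings)
    return findings
```

For example, the `pdl_devices` set returned by `scan_log_dir("/tmp/bundle/var/run/log")` (an illustrative path) can be compared against the list of LUNs that were removed on the storage array.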

Resolution

  • Present the underlying LUNs back to the ESXi host.
  • Reboot the ESXi host.

Additional Information

The termination of VMs when their underlying storage is removed is expected behavior in a vSphere environment. To avoid this during maintenance activities, use one of the following methods:

  • Scheduled Downtime: Power off all resident VMs gracefully before the storage maintenance begins.

    or

  • Storage vMotion: Migrate VMs to a different datastore that is not part of the maintenance activity.
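Either method starts with knowing which VMs have files on the datastores being removed. The following minimal Python sketch assumes a hypothetical `VmPlacement` record that you populate from your own inventory export; it is not a vSphere API. It lists the powered-on VMs that must be shut down or migrated with Storage vMotion before the LUNs are unpresented. A VM counts as affected if any one of its files is on an affected datastore, since the hostd.log excerpt above shows a single virtual disk ('_3.vmdk') losing its backing.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VmPlacement:
    # Hypothetical inventory record; populate it from your own inventory export.
    name: str
    powered_on: bool
    datastores: frozenset[str]  # every datastore holding any of this VM's files


def plan_maintenance(
    vms: Iterable[VmPlacement], datastores_under_maintenance: Iterable[str]
) -> tuple[list[str], list[str]]:
    """Return (needs_action, already_off) for VMs with any file on an affected datastore.

    needs_action: powered-on VMs to power off or move with Storage vMotion first.
    already_off: powered-off VMs; they are not terminated, but their files will be
    unavailable once the datastore is removed.
    """
    affected = set(datastores_under_maintenance)
    needs_action: list[str] = []
    already_off: list[str] = []
    for vm in vms:
        if vm.datastores & affected:  # one disk on an affected datastore is enough
            (needs_action if vm.powered_on else already_off).append(vm.name)
    return sorted(needs_action), sorted(already_off)
```

Returning the powered-off VMs separately keeps the pre-maintenance checklist focused on the VMs at risk of termination, while still flagging VMs that will become inaccessible.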