Virtual machine crashes and powers off automatically following an ESXi host All Paths Down (APD) event

Article ID: 439080

Products

VMware vCenter Server

Issue/Introduction

  • The virtual machine crashes and a vmx-zdump core file is generated.
  • The VM is powered off and then powered on again on a different ESXi host by DRS.
  • The datastore that holds the VM folder reports that it entered the All Paths Down (APD) state at the same time.

Environment

vSphere 8.x

Cause

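The ESXi host experienced an All Paths Down (APD) condition for the datastore where the VM files reside. As a direct result of the APD event, the host lost access to the virtual machine's active swap file (.vswp) on the affected datastore. Once the APD timeout expired and I/O to the datastore was fast-failed, the host could no longer fault in a swapped-out guest page from that file, so it terminated the VM and the vmx process dumped core. The log excerpts below show this sequence.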

/var/run/log/vobd.log

In(14) vobd[2097956]  [vmfsCorrelator] 687613732375us: [vob.vmfs.heartbeat.timedout] ########-af1####-8980-####### <DatastoreName>
In(14) vobd[2097956]  [vmfsCorrelator] 687627694689us: [esx.problem.vmfs.heartbeat.timedout] ########-af1####-8980-####### <DatastoreName>
In(14) vobd[2097956]  The event ([esx.problem.vmfs.heartbeat.timedout] ########-af1####-8980-####### <DatastoreName>) was sent immediately to hostd;
In(14) vobd[2097956]  [APDCorrelator] 687772833573us: [vob.storage.apd.timeout] Device or filesystem with identifier [naa.id#######] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
In(14) vobd[2097956]  [APDCorrelator] 687786799045us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [naa.id#######] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
In(14) vobd[2097956]  The event ([esx.problem.storage.apd.timeout] Device or filesystem with identifier [naa.id#######] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.) was sent immediately to hostd;
..
..
In(14) vobd[2097956]  [VMCorrelator] 688029338869us: [vob.vm.kill.unexpected.fault.failure] The virtual machine using the configuration file /vmfs/volumes/########-af1####-8980-#######/VMName/VMName.vmx could not fault in a page from the swap file at /vmfs/volumes/########-af1####-8980-#######/VMName/VMName-05319e9a.vswp. The virtual machine has been powered off.
In(14) vobd[2097956]  [UserWorldCorrelator] 688030241008us: [vob.uw.core.dumped] /bin/vmx(2114902) /var/core/vmx-zdump.000
In(14) vobd[2097956]  [UserWorldCorrelator] 688044211356us: [esx.problem.application.core.dumped] An application (/bin/vmx) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /var/core/vmx-zdump.000.
In(14) vobd[2097956]  The event ([esx.problem.application.core.dumped] An application (/bin/vmx) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /var/core/vmx-zdump.000.) was sent immediately to hostd;
In(14) vobd[2097956]  [VMCorrelator] 688058624706us: [esx.problem.vm.kill.unexpected.fault.failure.2] /vmfs/volumes/########-af1####-8980-#######/VMName/VMName.vmx could not fault in a guest physical page from the hypervisor level swap file on ########-af1####-8980-#######. The VM is terminated as further progress is impossible.
In(14) vobd[2097956]  The event ([esx.problem.vm.kill.unexpected.fault.failure.2] /vmfs/volumes/########-af1####-8980-#######/VMName/VMName.vmx could not fault in a guest physical page from the hypervisor level swap file on ########-af1####-8980-#######. The VM is terminated as further progress is impossible.) was sent immediately to hostd;
In(14) vobd[2097956]  No correlator for vob.vm.kill.panic
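
To confirm that the VM termination followed the APD timeout rather than preceding it, the vobd.log entries above can be reduced to a short timeline. The following is a minimal Python sketch, not a supported tool: the function names extract_timeline, seconds_between and apd_devices are hypothetical, and the regular expression assumes the line layout shown in the excerpt above (a correlator in brackets, a microsecond counter ending in "us:", then the event ID in brackets). It is meant to be run against a copy of the log, for example from a support bundle.

```python
import re

# Assumed layout, taken from the excerpt above:
#   [<name>Correlator] <microseconds>us: [<event.id>] <detail>
VOB_RE = re.compile(
    r"\[(?P<correlator>\w+Correlator)\]\s+(?P<us>\d+)us:\s+"
    r"\[(?P<event>[\w.]+)\]\s*(?P<detail>.*)"
)

# Event IDs that appear in the excerpt and mark each stage of the failure.
EVENTS_OF_INTEREST = (
    "vob.vmfs.heartbeat.timedout",
    "vob.storage.apd.timeout",
    "vob.vm.kill.unexpected.fault.failure",
    "vob.uw.core.dumped",
)


def extract_timeline(lines):
    """Return sorted (microseconds, event_id, detail) tuples for the events of interest."""
    timeline = []
    for line in lines:
        match = VOB_RE.search(line)
        if match and match.group("event") in EVENTS_OF_INTEREST:
            timeline.append(
                (int(match.group("us")), match.group("event"), match.group("detail"))
            )
    return sorted(timeline)


def seconds_between(timeline, first_event, second_event):
    """Seconds from the earliest first_event to the earliest second_event, or None."""
    stamps = {}
    for us, event, _detail in timeline:
        stamps.setdefault(event, us)  # keep the earliest occurrence only
    if first_event not in stamps or second_event not in stamps:
        return None
    return (stamps[second_event] - stamps[first_event]) / 1_000_000


def apd_devices(timeline):
    """Collect the device identifiers named in APD timeout events."""
    devices = set()
    for _us, event, detail in timeline:
        if event == "vob.storage.apd.timeout":
            found = re.search(r"identifier \[([^\]]+)\]", detail)
            if found:
                devices.add(found.group(1))
    return devices


if __name__ == "__main__":
    # Path is an assumption: point it at a local copy of the host's vobd.log.
    with open("vobd.log", encoding="utf-8", errors="replace") as handle:
        events = extract_timeline(handle)
    for us, event, _detail in events:
        print(us, event)
    print("APD devices:", apd_devices(events))
```

Applied to the excerpt above, the counters on the vob.storage.apd.timeout and vob.vm.kill.unexpected.fault.failure lines differ by roughly 256.5 seconds, which places the VM termination after the APD timeout. The device identifiers returned by apd_devices are the ones to pass to the storage vendor.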


/var/run/log/vmkernel.log

In(182) vmkernel: cpu87:2097821)BC: 608: write to VMName.scoreboard (f530 28 3 62a22db0 af1#### #### 550310b6 dc04a44 80 0 0 0 0 0) 4096 bytes failed: No connection
Wa(180) vmkwarning: cpu62:2114931)WARNING: World: vm 21####: 9030: vmm0:VMName:vmk: vcpu-0:Unable to read swapped out BPN(0x40007aa220) from swap slot(0x397337) for VM(21####)
In(182) vmkernel: cpu28:2115092)UserDump: 3157: vmx-vcpu-0:VMName: Dumping cartel 2114902 (from world 2115092) to file /var/core/vmx-zdump.000 ...
In(182) vmkernel: cpu28:2115092)UserDump: 3452: vmx-vcpu-0:VMName: Userworld(vmx-vcpu-0:VMName) coredump complete.
In(182) vmkernel: cpu80:2114902)BC: 608: write to VMName.scoreboard (f530 28 3 62a22db0 af1#### #### 550310b6 dc04a44 80 0 0 0 0 0) 4096 bytes failed: No connection
In(182) vmkernel: cpu80:2114902)BC: 608: write to VMName.scoreboard (f530 28 3 62a22db0 af1#### #### 550310b6 dc04a44 80 0 0 0 0 0) 4096 bytes failed: No connection
Al(177) vmkalert: cpu80:2114902)ALERT: BC: 3042: File VMName.scoreboard closed with dirty buffers. Possible data loss.
Wa(180) vmkwarning: cpu2:2097477)WARNING: SwapExtend: vm 2097477: 435: Failed to truncate swapfile /vmfs/volumes/#######-af1####-8980-#######/VMName/VMName-05319e9a.vswp to 0 bytes: No connection
Wa(180) vmkwarning: cpu2:2097477)WARNING: SwapFileOps: 601: Failed to unlink /vmfs/volumes/#######-af1####-8980-#######/VMName/VMName-05319e9a.vswp: No connection
Wa(180) vmkwarning: cpu2:2097477)WARNING: SwapExtend: vm 2097477: 435: Failed to truncate swapfile /vmfs/volumes/#######-af1####-8980-#######/VMName/vmx-VMName-6576399###############eeec29$
Wa(180) vmkwarning: cpu2:2097477)WARNING: SwapFileOps: 601: Failed to unlink /vmfs/volumes/#######-af1####-8980-#######/VMName/vmx-VMName-6576399###########eeec29e73d373-1.vswp: No con$
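
The vmkernel.log entries above fall into a few recognizable groups: failed writes with "No connection", the failed swap read, the vmx core dump, and failed swap file cleanup. A small Python sketch like the one below can count each group in a copy of the log to check whether a host shows the same pattern. The category names and the classify function are hypothetical labels chosen for this illustration, and the patterns are taken only from the message text in the excerpt above, so other ESXi builds may word these messages differently.

```python
import re
from collections import Counter

# Patterns copied from the message text in the excerpt above.
# The dictionary keys are illustrative labels, not VMware terminology.
SIGNATURES = {
    "io_no_connection": re.compile(r"bytes failed: No connection"),
    "swap_read_failure": re.compile(r"Unable to read swapped out BPN"),
    "vmx_core_dump": re.compile(r"UserDump: \d+: .*Dumping cartel"),
    "swapfile_cleanup_failure": re.compile(r"Failed to (truncate swapfile|unlink) "),
}


def classify(lines):
    """Count how many log lines match each signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Path is an assumption: point it at a local copy of the host's vmkernel.log.
    with open("vmkernel.log", encoding="utf-8", errors="replace") as handle:
        for name, count in classify(handle).items():
            print(f"{name}: {count}")
```

A non-zero swap_read_failure count alongside io_no_connection entries for the same VM is consistent with the cause described in this article; the counts alone do not identify why the paths went down.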

Resolution

Engage the storage vendor to investigate the cause of the reported APD event.