Windows BSOD on Backup Agent VM Triggered by Unexpected Disk Removal During Backup Tasks



Article ID: 429103



Products

VMware vSphere ESXi

Issue/Introduction

A Windows-based virtual machine (VM) acting as a Veeam Backup Agent (Proxy) experiences an intermittent Blue Screen of Death (BSOD).
The crash is directly correlated with the completion of a backup task when the proxy attempts to release a HotAdd disk.

Entries similar to the following appear in the VM's vmware.log:
YYYY-MM-DD HH:MM:SS In(05) vmx - HotAdd: Adding scsi-hardDisk with mode 'independent-nonpersistent' to scsi0:2
YYYY-MM-DD HH:MM:SS In(05) vmx - HotAdd: Adding scsi-hardDisk with mode 'independent-nonpersistent' to scsi0:3
YYYY-MM-DD HH:MM:SS In(05) vmx - VigorTransportProcessClientPayload: opID=####### seq=####: Receiving Disk.SetPresent request.
YYYY-MM-DD HH:MM:SS In(05) vmx - HotAdd: Adding scsi-hardDisk with mode 'independent-nonpersistent' to scsi0:4
YYYY-MM-DD HH:MM:SS In(05) vmx - DISK: OPEN scsi0:4 '/vmfs/volumes/###########/<VM-01>/VM-01_1.vmdk' independent-nonpersistent R[]
......

YYYY-MM-DD HH:MM:SS Wa(03) vcpu-6 - WinBSOD: Synthetic MSR[0x40000100] 0x7e
YYYY-MM-DD HH:MM:SS Wa(03)+ vcpu-6 -
YYYY-MM-DD HH:MM:SS Wa(03) vcpu-6 - WinBSOD: Synthetic MSR[0x40000101] 0xffffffffc0000005
YYYY-MM-DD HH:MM:SS Wa(03)+ vcpu-6 -
YYYY-MM-DD HH:MM:SS Wa(03) vcpu-6 - WinBSOD: Synthetic MSR[0x40000102] 0xfffff80a7098163a
YYYY-MM-DD HH:MM:SS Wa(03)+ vcpu-6 -
YYYY-MM-DD HH:MM:SS Wa(03) vcpu-6 - WinBSOD: Synthetic MSR[0x40000103] 0xffff93861bb86b88
YYYY-MM-DD HH:MM:SS Wa(03)+ vcpu-6 -
YYYY-MM-DD HH:MM:SS Wa(03) vcpu-6 - WinBSOD: Synthetic MSR[0x40000104] 0xffff93861bb863d0
YYYY-MM-DD HH:MM:SS Wa(03)+ vcpu-6 -
YYYY-MM-DD HH:MM:SS In(05) svga - SWBScreen: Screen 1 Destroyed: user-##(0, 0, 1024, 768) flags=0x2
YYYY-MM-DD HH:MM:SS In(05) svga - SVGA-ScreenMgr: Screen type changed to RegisterMode
YYYY-MM-DD HH:MM:SS In(05) svga - SWBScreen: Screen 1 Defined: user-##(0, 0, 1024, 768) flags=0x2
YYYY-MM-DD HH:MM:SS In(05) vcpu-6 - LSI: Invalid PageType [21] pageNo 0 Action 0
YYYY-MM-DD HH:MM:SS In(05) vmx - GuestRpcSendTimedOut: message to toolbox timed out.
YYYY-MM-DD HH:MM:SS In(05) vmx - Tools: [AppStatus] Last heartbeat value 452131 (last received 6s ago)
YYYY-MM-DD HH:MM:SS In(05) vmx - TOOLS: appName=toolbox, oldStatus=1, status=2, guestInitiated=0.
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Tools: Tools heartbeat timeout.
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Tools: Running status rpc handler: 1 => 0.
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Tools: Changing running status: 1 => 0.
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Tools: [RunningStatus] Last heartbeat value 452131 (last received 20s ago)
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - TOOLS hostVerifiedSamlToken capability set to FALSE.
YYYY-MM-DD HH:MM:SS In(05) vmx - GuestRpcSendTimedOut: message to toolbox timed out.
YYYY-MM-DD HH:MM:SS In(05) vmx - GuestRpc: app toolbox's second ping timeout; assuming app is down
YYYY-MM-DD HH:MM:SS In(05) vmx - Tools: [AppStatus] Last heartbeat value 452131 (last received 26s ago)
YYYY-MM-DD HH:MM:SS In(05) vmx - TOOLS: appName=toolbox, oldStatus=2, status=0, guestInitiated=0.
YYYY-MM-DD HH:MM:SS In(05) vmx - GuestRpc: Reinitializing Channel 0(toolbox)
YYYY-MM-DD HH:MM:SS In(05) vmx - GuestMsg: Channel 0, Cannot unpost because the previous post is already completed
YYYY-MM-DD HH:MM:SS In(05) vmx - Tools: [AppStatus] Last heartbeat value 452131 (last received 26s ago)
YYYY-MM-DD HH:MM:SS In(05) vmx - TOOLS: appName=toolbox, oldStatus=0, status=0, guestInitiated=0.

YYYY-MM-DD HH:MM:SS In(05) svga - SWBScreen: Screen 1 Destroyed: user-##(0, 0, 1024, 768) flags=0x2
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Destroying virtual dev for scsi0:0 vscsi=14701956522063397
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Destroying virtual dev for scsi0:1 vscsi=14701956526257702
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Destroying virtual dev for scsi0:2 vscsi=14701956530452949
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Destroying virtual dev for scsi0:3 vscsi=14701956534647254
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Destroying virtual dev for scsi0:4 vscsi=14701956538841559
YYYY-MM-DD HH:MM:SS In(05) svga - SWBScreen: Screen 1 Destroyed: user-##(0, 0, 1024, 768) flags=0x2
YYYY-MM-DD HH:MM:SS In(05) svga - SWBScreen: Screen 1 Destroyed: user-##(0, 0, 1024, 768) flags=0x2
YYYY-MM-DD HH:MM:SS In(05) svga - SWBScreen: Screen 0 Destroyed: user-##(0, 0, 640, 480) flags=0x3
YYYY-MM-DD HH:MM:SS In(05) svga - SWBScreen: Screen 1 Destroyed: user-##(0, 0, 1024, 768) flags=0x2
YYYY-MM-DD HH:MM:SS In(05) svga - SWBScreen: Screen 1 Destroyed: user-##(0, 0, 1024, 768) flags=0x2
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Destroying virtual dev for scsi0:3 vscsi=14701956534647259
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Destroying virtual dev for scsi0:4 vscsi=14701956538841564
YYYY-MM-DD HH:MM:SS In(05) vcpu-0 - Destroying virtual dev for scsi0:2 vscsi=14701956530452954
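
The WinBSOD lines in the excerpt above carry the guest's bug check data. The following is a minimal Python sketch, not a supported tool, for pulling those values out of a vmware.log. It assumes that the synthetic MSRs 0x40000100 through 0x40000104 hold the bug check code followed by its four parameters, and it labels the parameters using the bug check 0x7E description on the Microsoft page linked under Additional Information. The function names are hypothetical.

```python
import re

# Assumption: MSR 0x40000100 holds the bug check code and
# 0x40000101..0x40000104 hold bug check parameters 1..4.
CRASH_MSR_BASE = 0x40000100

# Parameter meanings for bug check 0x7E (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED).
BUGCHECK_7E_PARAMS = (
    "exception code",
    "address where the exception occurred",
    "exception record address",
    "context record address",
)

MSR_LINE = re.compile(
    r"WinBSOD: Synthetic MSR\[(0x[0-9a-fA-F]+)\]\s+(0x[0-9a-fA-F]+)"
)


def parse_winbsod(lines):
    """Return {slot: value} for crash MSR slots 0..4 found in the log lines."""
    slots = {}
    for line in lines:
        match = MSR_LINE.search(line)
        if match:
            slot = int(match.group(1), 16) - CRASH_MSR_BASE
            if 0 <= slot <= 4:
                slots[slot] = int(match.group(2), 16)
    return slots


def describe(slots):
    """Render the decoded values as short human-readable strings."""
    out = []
    code = slots.get(0)
    if code is not None:
        out.append(f"bug check code: {code:#x}")
    for i, label in enumerate(BUGCHECK_7E_PARAMS, start=1):
        if i in slots:
            # Only use the 0x7E labels when the code really is 0x7E.
            name = label if code == 0x7E else f"parameter {i}"
            out.append(f"{name}: {slots[i]:#x}")
    return out
```

To use it, pass the lines of a vmware.log (for example, an open file object) to parse_winbsod and print the result of describe. For the excerpt above, the exception code 0xffffffffc0000005 is the sign-extended form of 0xC0000005 (STATUS_ACCESS_VIOLATION).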

Cause

This is a known Microsoft Windows issue related to the Partition Manager (partmgr.sys).

The issue is a race condition and is therefore non-deterministic. It occurs only when a disk removal coincides with a background telemetry or partition sensing task in the guest. Because a Veeam Backup Proxy mounts and unmounts disks frequently as part of its primary role, it is far more likely to hit this race condition than a standard server.
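
The effect of frequent disk removals can be illustrated with a short calculation. The sketch below assumes that each removal independently hits the race with the same small probability. That probability is a hypothetical input, not a measured value, and the function name is illustrative only.

```python
def probability_of_at_least_one_crash(p_per_removal, removals):
    """P(at least one hit) = 1 - (1 - p)^n, assuming independent removals."""
    if not 0.0 <= p_per_removal <= 1.0:
        raise ValueError("p_per_removal must be between 0 and 1")
    if removals < 0:
        raise ValueError("removals must not be negative")
    return 1.0 - (1.0 - p_per_removal) ** removals


# Hypothetical numbers for comparison only:
# a server that rarely removes disks versus a busy backup proxy.
p = 1e-4
standard_server = probability_of_at_least_one_crash(p, 10)
backup_proxy = probability_of_at_least_one_crash(p, 10_000)
print(f"standard server: {standard_server:.4%}, backup proxy: {backup_proxy:.4%}")
```

Under this simplified model, the chance of seeing at least one crash grows with the number of removals even when the chance per removal stays constant, which is consistent with the BSOD appearing intermittently on a proxy.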

Resolution

Engage Microsoft Support and provide the guest memory dump so that they can confirm the specific race condition in partmgr.sys.
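
Before the next occurrence, it may help to confirm that the guest is configured to write a memory dump. The following is a minimal sketch, assuming Python is available inside the Windows guest. It reads the standard CrashControl registry key; the function names are hypothetical, and read_crash_control runs only on Windows.

```python
# CrashDumpEnabled values under
# HKLM\SYSTEM\CurrentControlSet\Control\CrashControl
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
    7: "automatic memory dump",
}


def describe_dump_setting(value):
    """Translate a CrashDumpEnabled value into a readable dump type."""
    return DUMP_TYPES.get(value, f"unknown ({value})")


def read_crash_control():
    """Windows only: return (dump type, dump file path) from the registry."""
    import winreg  # imported here so the module still loads on other platforms

    path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        enabled, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        dump_file, _ = winreg.QueryValueEx(key, "DumpFile")
    return describe_dump_setting(enabled), dump_file
```

If the reported dump type is "none", no dump file will be written when the BSOD occurs. Which dump type Microsoft Support requires is not stated in this article, so confirm it with them.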

Additional Information

https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x7e--system-thread-exception-not-handled