Virtual Machine File Lock or Hang During Snapshot Operations Related to I/O Filters
Article ID: 422754

Updated On:

Products

VMware vCenter Server

Issue/Introduction

In VMware vSphere environments, a virtual machine (VM) may become unresponsive (hang) during or after a snapshot or other checkpoint operation. This often results in a "Caught signal 15" message in vmware.log and leaves the VM in a state where it cannot be powered off or migrated, and its lock files cannot be removed through standard management tasks. In some scenarios, an ESXi host reboot is required to clear the stale file locks.

Symptoms include:

  • Failures when attempting to remove VM lock files.
  • Snapshot operations (create/remove) failing or causing the VM to stop responding.
  • Error logs referencing I/O filter (iofilter) timeouts or caught signals.
  • VM management tasks (vMotion, Power Off) resulting in errors.

Errors include:

Within /var/run/log/veecdp.log:

<date/time> veecdp[]: [] 756 (E) {} [] [TcpConnector] [:33039] Connection attempt failed with error 111
<date/time> Er(155) veecdp[]: [] 756 (E) {} [] [Transceiver] [CMD] Connecting failed, self: 0xb..., error: 111
<date/time> Er(155) veecdp[]: [] 756 (E) {} [] [Transceiver] [CMD] Failed connect to daemon, self: 0xb..., error: 104,'o', reconnect in 5 second(s)
....
<date/time> Er(155) veecdp[]: [] 252 (E) {} [] [CMD] Failed to shutdown command stream gracefully with 11
<date/time> Er(155) veecdp[]: [] 252 (E) Sidecar attribute delivery-map-data found with len 10240, cannot store data with len 19200.
....
<date/time> Er(155) veecdp[]: [] 339 (E) {} [] [CMD] Failed to shutdown command stream gracefully with 11
<date/time> Er(155) veecdp[]: [] 339 (E) [Timer] Failed to remove timer, this: 0xb..., status 'Object not found'
<date/time> Er(155) veecdp[]: [] 339 (E) Sidecar attribute dirty-map-data found with len 15360, cannot store data with len 32000.
<date/time> Er(155) veecdp[]: [] 345 (E) [Timer] Failed to remove timer, this: 0xb..., status 'Object not found'
 
Within the vmware.log:
 
<date/time> In(05) vcpu-0 - DISKLIB-CBT : ChangeTrackerESX_DestroyMirror: Destroyed mirror node <>-cbtmirror. SrcFd: /vmfs/volumes/<guid>/<vm>/<vm>-ctk.vmdk, DestFd: /vmfs/volumes/<guid>/<vm>/<vm>-ctk-mirror.vmdk.

<date/time> Wa(03) vmx - Caught signal 15 -- tid 219... (eip 0xb...)
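To check quickly whether a collected log contains the messages shown above, a small script can count how often each one appears. The following is a minimal sketch, not a supported tool: the patterns are copied from the excerpts in this article, while the function name and the signature labels are hypothetical. It assumes the log lines have been read from a copy of veecdp.log or vmware.log.

```python
import re
from collections import Counter

# Patterns copied from the log excerpts in this article.
# The dictionary keys are illustrative labels, not product terminology.
SIGNATURES = {
    "connect_failed": re.compile(r"Connection attempt failed with error \d+"),
    "daemon_connect_failed": re.compile(r"Failed connect to daemon"),
    "sidecar_too_small": re.compile(
        r"Sidecar attribute \S+ found with len \d+, "
        r"cannot store data with len \d+"
    ),
    "timer_remove_failed": re.compile(r"Failed to remove timer"),
    "caught_signal_15": re.compile(r"Caught signal 15"),
}


def count_signatures(lines):
    """Count how many lines match each known signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Passing an open file object (for example `count_signatures(open("vmware.log", errors="replace"))`) returns a counter keyed by signature label. A non-zero count for `caught_signal_15` together with any of the veecdp entries suggests the VM matches the symptoms described here, but it is not proof of the cause.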

Cause

The issue is often triggered by I/O filters (such as Veeam CDP iofilters). If the filter encounters an error or if there are residual filter references within the .vmdk descriptor files that do not match the cluster configuration, the VM process may hang while attempting the snapshot or other checkpoint operations.

Resolution

To resolve the issue, verify the I/O filter configuration at both the VM and host level:

  1. Check VMDK Descriptor Files:
    1. Access the VM's directory on the datastore.
    2. Examine the .vmdk files to see if there are residual lines referencing I/O filters (e.g., veecdp) that are no longer in use or are misconfigured.
    3. Remove the specific lines referencing the defunct filters from the descriptor files.
  2. Verify Host I/O Filter Status:
    1. Check the ESXi hosts within the cluster to ensure I/O filters are properly installed and at the expected version.
    2. In some cases, if the features are no longer required, removing the I/O filters from the cluster and the individual hosts will resolve the conflict.
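The descriptor check in step 1 can be sketched as a read-only scan that lists every line mentioning a given filter name. This is an illustrative helper only: the function name and its parameters are hypothetical, and it assumes the descriptor content has already been read as plain text from a copy of the .vmdk descriptor file. It does not modify anything. Any removal of lines should still be done deliberately, after reviewing the matches.

```python
def find_filter_references(descriptor_text, filter_names=("veecdp",)):
    """Return (line_number, line) pairs that mention any filter name.

    The match is a case-insensitive substring search, so it makes no
    assumption about which descriptor keys carry the filter reference.
    """
    needles = [name.lower() for name in filter_names]
    hits = []
    for number, line in enumerate(descriptor_text.splitlines(), start=1):
        lowered = line.lower()
        if any(needle in lowered for needle in needles):
            hits.append((number, line.strip()))
    return hits
```

Calling `find_filter_references(text)` on each descriptor in the VM's directory gives a short list of candidate lines to review. An empty list for every descriptor suggests the residual references described above are not present, in which case the host-level check in step 2 is the next place to look.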