Linux VM with heavy I/O load may hang during a quiesced snapshot

Article ID: 421195

Updated On:

Products

VMware vCenter Server

Issue/Introduction

  • A Linux VM hangs under heavy I/O load during the quiesced snapshot process.

  • In the guest OS, /var/log/vmware-vmsvc-root.#.log contains entries similar to the following:

    • [YYYY-MM-DD-HH:SS] [ warning] [vmbackup] [1154] Failed to send vmbackup event: vmbackup.eventSet req.keepAlive 0 , result: Unknown command.
      [YYYY-MM-DD-HH:SS] [ warning] [vmbackup] [1154] Canceling backup operation due to timeout.
      [YYYY-MM-DD-HH:SS] [ warning] [vmbackup] [1154] Failed to send vmbackup event: vmbackup.eventSet req.aborted 4 Quiesce canceled., result: Unknown command.
      [YYYY-MM-DD-HH:SS] [ warning] [vmbackup] [1154] Failed to send vmbackup event: vmbackup.eventSet req.done 0 , result: Unknown command.
      [YYYY-MM-DD-HH:SS] [ warning] [vmsvc] [3546888] SyncDriver: '/db-data' appears locked or frozen by another process.  Cannot complete the quiesced snapshot request.
      [YYYY-MM-DD-HH:SS] [ warning] [vmbackup] [3546888] Error trying to perform OP_FREEZE on filesystems.
      [YYYY-MM-DD-HH:SS] [ warning] [vmsvc] [3546889] SyncDriver: '/db-data' appears locked or frozen by another process.  Cannot complete the quiesced snapshot request.
      [YYYY-MM-DD-HH:SS] [ warning] [vmbackup] [3546889] Error trying to perform OP_FREEZE on filesystems.
      [YYYY-MM-DD-HH:SS] [ warning] [vmsvc] [3546891] SyncDriver: '/db-data' appears locked or frozen by another process.  Cannot complete the quiesced snapshot request.
      [YYYY-MM-DD-HH:SS] [ warning] [guestinfo] [1154] *** WARNING: GuestInfo collection interval longer than expected; actual=1109 sec, expected=30 sec. ***

  • On the ESXi host, the /vmfs/volumes/<datastore>/<vm-name>/vmware.log files contain entries similar to the following:

    • YYYY-MM-DD-HH:SS In(05) vmx 5127ee63-59-91d2 SNAPSHOT: SnapshotPrepareTakeDoneCB: Prepare phase complete (The operation completed successfully).
      YYYY-MM-DD-HH:SS In(05) vcpu-5 - ToolsBackup: changing quiesce state: IDLE -> STARTED
      YYYY-MM-DD-HH:SS In(05) vmx 5127ee63-59-91d2 Msg_Post: Warning
      YYYY-MM-DD-HH:SS In(05) vmx 5127ee63-59-91d2 [msg.snapshot.quiesce.timeout] Timed out while quiescing the virtual machine.
      YYYY-MM-DD-HH:SS In(05) vmx 5127ee63-59-91d2 ----------------------------------------
      YYYY-MM-DD-HH:SS In(05) vmx - ToolsBackup: changing quiesce state: STARTED -> DONE
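
A quick way to check a log file for these signatures is to search for the key message strings. The following is a minimal sketch, not a supported tool: the function name check_quiesce_signatures is hypothetical, and the patterns are copied from the log excerpts above.

```shell
#!/bin/sh
# Hypothetical helper: count lines in a log file that match the
# quiesce-timeout signatures quoted in this article.
check_quiesce_signatures() {
    logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "cannot read: $logfile" >&2
        return 2
    fi
    # -E: extended regular expressions, -c: print the number of matching lines.
    # "|| true" keeps a zero-match result from aborting under "set -e".
    count=$(grep -E -c \
        -e 'Canceling backup operation due to timeout' \
        -e 'Error trying to perform OP_FREEZE on filesystems' \
        -e 'appears locked or frozen by another process' \
        -e 'msg\.snapshot\.quiesce\.timeout' \
        "$logfile" || true)
    echo "$logfile: $count matching line(s)"
    # Succeed only when at least one signature was found.
    [ "$count" -gt 0 ]
}

# Example (guest side; adjust the glob to the actual log file names):
#   for f in /var/log/vmware-vmsvc-root*.log; do check_quiesce_signatures "$f"; done
```

The same function can be pointed at a copy of the VM's vmware.log to look for the msg.snapshot.quiesce.timeout entry.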

Cause

The VMware Tools vmbackup plugin tries to freeze the guest filesystems for a quiesced snapshot. When the VM is under heavy I/O load (database operations, for example), the freeze operation, which runs in a separate thread pool, may fail and time out after the default 15 minutes. In addition, some VMware Tools warning and error log messages cannot be written because the filesystem is already frozen.

Resolution

Broadcom Engineering is actively working on a fix for a future VMware Tools version. Subscribe to receive an email when the article is updated.
 
Workaround: 

Apply both of the following methods together to mitigate this issue.
  1. Use customized freeze/thaw scripts to stop any workloads that generate heavy I/O, for example PostgreSQL or antivirus software.
    Example:
    • Note: Save the script as "/etc/vmware-tools/backupScripts.d/pre-freeze-post-thaw-linux.sh"

    • vi /etc/vmware-tools/backupScripts.d/pre-freeze-post-thaw-linux.sh

    • Copy the following contents into the pre-freeze-post-thaw-linux.sh file:
    • #!/bin/sh
      # Copyright 2022 VMware, Inc. All rights reserved. -- VMware Confidential
      # Description: Pre-Freeze / Post-Thaw script for quiesced backups.
      #
      # Copy to /etc/vmware-tools/backupScripts.d/ on the guest.
      # During quiesced backup all scripts in this directory are called by VM tools.
      # Script is invoked with a single parameter freeze|thaw|freezeFail.
      # This script just logs invocation time into /var/log/freeze.log
      # Log time is UTC, example of timestamp: 2022-10-26T17:55:13Z

      log="/var/log/freeze.log"
      today=$(date -u +%Y-%m-%dT%H:%M:%SZ)
      # POSIX test syntax is used so the script also works where /bin/sh is not bash.
      if [ "$1" = "freeze" ]
      then
       echo "${today}: This is quiescing freeze script" >> "${log}"
      elif [ "$1" = "thaw" ]
      then
       echo "${today}: This is quiescing thaw script" >> "${log}"
      elif [ "$1" = "freezeFail" ]
      then
       echo "${today}: freezeFail - Quiescing failed" >> "${log}"
      else
       echo "No valid argument was provided"
      fi
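
The sample script above only logs each invocation. To actually pause a heavy-I/O workload, the freeze branch can stop a service and the thaw and freezeFail branches can start it again. The following is a minimal sketch under stated assumptions: a systemd-based guest, a unit named postgresql, and the hypothetical function name handle_quiesce_event. None of these come from this article, so adjust them to the guest.

```shell
#!/bin/sh
# Sketch only: stop a heavy-I/O service before the freeze and start it
# again after the thaw (or after a failed freeze).
SERVICE="${SERVICE:-postgresql}"   # assumption: name of the service unit
SVC_CMD="${SVC_CMD:-systemctl}"    # assumption: systemd-based guest
LOG="${LOG:-/var/log/freeze.log}"

handle_quiesce_event() {
    now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    case "$1" in
        freeze)
            echo "${now}: freeze - stopping ${SERVICE}" >> "$LOG"
            "$SVC_CMD" stop "$SERVICE"
            ;;
        thaw|freezeFail)
            # Restart the workload whether or not the freeze succeeded.
            echo "${now}: $1 - starting ${SERVICE}" >> "$LOG"
            "$SVC_CMD" start "$SERVICE"
            ;;
        *)
            echo "usage: $0 freeze|thaw|freezeFail" >&2
            return 1
            ;;
    esac
}

# VMware Tools passes the event name as the first argument.
if [ $# -gt 0 ]; then
    handle_quiesce_event "$1"
fi
```

Handling freezeFail in the same branch as thaw keeps the service from staying stopped when quiescing is canceled.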
  2. Redirect VMware Tools logging to the host, or disable the Tools log (not recommended, because it makes debugging harder).

    • To redirect the Tools log to the host, edit "/etc/vmware-tools/tools.conf".
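
As an illustration of the log redirect, the sketch below appends a [logging] section that routes the vmsvc log domain to the host. It assumes the "vmx" log handler of VMware Tools / open-vm-tools; the settings shown are an assumption, not taken from this article, and should be confirmed against the documentation for the installed Tools version. The function name add_vmx_logging is hypothetical.

```shell
#!/bin/sh
# Sketch: append a [logging] section that sends vmsvc logs to the host.
# Assumption: the "vmx" handler is available in the installed Tools version.
add_vmx_logging() {
    conf="${1:-/etc/vmware-tools/tools.conf}"
    # Leave files with an existing [logging] section alone; merge those by hand.
    if [ -f "$conf" ] && grep -q '^\[logging\]' "$conf"; then
        echo "$conf already has a [logging] section; edit it manually" >&2
        return 1
    fi
    cat >> "$conf" <<'EOF'
[logging]
log = true
vmsvc.level = info
vmsvc.handler = vmx
EOF
}

# Example: add_vmx_logging /etc/vmware-tools/tools.conf
```

Under the same assumption, setting log = false in that section would disable Tools logging instead.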