Monitoring Aria Automation image based backups for proper quiesced state.


Article ID: 379005


Products

VMware Aria Suite

Issue/Introduction

Monitoring Aria Automation image based backups for proper quiesced state

Log Message extraction using various methods

Environment

Aria Automation 8.x

Resolution

Introduction

 

Guidelines for backing up VMware Aria Automation using an image- or snapshot-based backup system:

  • When you back up a complete system, back up all instances of VMware Aria Automation, and VMware Workspace ONE Access appliance as near simultaneously as possible, preferably within seconds.  The vCenter snapshot task is the first step in the sequence. This snapshot task needs to be scheduled for all 3 nodes. It may be initiated via a 3rd party backup tool or directly via vCenter.
  • Minimize the number of active transactions before you begin a backup. Schedule your regular backup to when your system is least active.
  • When you back up the VMware Aria Automation appliance, disable in-memory snapshots.
  • Enable quiescing (for versions 8.9 and onwards).
  • Create a backup of instances of the VMware Aria Automation appliance when you update the certificates.
  • Make sure you start the snapshots of the second and third nodes no more than 40 seconds after the snapshot of the first node starts. After the snapshot task is initiated, VMware Tools runs a "freeze" script to quiesce the file system across the 3 nodes before the actual snapshot is taken on all 3 nodes.
  • Once the 40 seconds have passed, the journal log is updated with status entries. If the quiesced state was not achieved for all 3 nodes, the following 2 messages appear in the logs: "Freeze synchronization failed" and "Sync failed, making inconsistent snapshot".
  • Do not wait for the snapshot of one node to complete before you snapshot the next node.
  • Verify that all snapshots are created.

Note: The main purpose of this document is to validate that the snapshots are synchronized across all the nodes. Currently, snapshots are created successfully even if they were not taken within the time limit of ~40 seconds.
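
As a rough illustration of the timing guideline above, the following sketch checks whether the snapshot start times of all nodes fall within the ~40 second window. The function name, the constant, and the sample timestamps are assumptions made for this example; they are not part of the appliance or of any backup product.

```python
from datetime import datetime, timedelta

# Assumption: the ~40 second limit is taken from the guidelines above.
SYNC_WINDOW = timedelta(seconds=40)


def snapshots_within_window(start_times, window=SYNC_WINDOW):
    """Return True if every snapshot started within `window` of the earliest one."""
    if not start_times:
        raise ValueError("at least one snapshot start time is required")
    return max(start_times) - min(start_times) <= window


# Hypothetical start times for the three nodes (31 seconds apart in total).
starts = [
    datetime(2024, 5, 23, 17, 25, 0),
    datetime(2024, 5, 23, 17, 25, 12),
    datetime(2024, 5, 23, 17, 25, 31),
]
print(snapshots_within_window(starts))  # True
```

A check like this only confirms that the snapshot tasks were started close together. The journal messages described later in this document remain the evidence of whether the freeze itself was synchronized.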

See the documentation: VMware Aria Automation 8.x Preparations for Backing Up

Figure: High-level diagram of the Aria Automation snapshot quiesce backup.

 

The Aria Automation appliance has VMware Tools installed and configured with “vmbackup” enabled, which is required to execute freeze scripts. Additional information can be found in these documentation links: VMware Tools Services, Exclude Specific File Systems from Quiesced Snapshots, and Enabling Quiescing for Linux VMs.



Aria Automation VMware Tools Configuration and Freeze scripts 

 

  1. When a snapshot is invoked with quiescing, VMware Tools runs the backup scripts if the /etc/vmware-tools/tools.conf file contains the following setting:

 

           [vmbackup]

            execScripts=true

 

Note: With this setting enabled, any scripts in the directory /etc/vmware-tools/backupScripts.d are invoked.



  2. On the Aria Automation appliances, the out-of-the-box script is:

          90-freeze-data -> /opt/scripts/freezer-control.py

 

 Note: freezer-control.py calls freezer-server.py, which validates snapshot consistency across all nodes. If all nodes freeze within the allotted time, the script returns 0 for success and the quiesced snapshot executes. If the sync fails, the script still returns an exit code of 0, the freeze ends, and the snapshot is taken anyway. Currently the script only logs the errors and does not report a failed freeze when a “failed sync” occurs; the errors are, however, recorded in the journal logs. This document shows how to extract these logs from the appliances or from a logging system.
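
The tools.conf setting described above can be checked programmatically. The following is a minimal sketch that assumes the file is plain INI text that Python's configparser can read; the function name is made up for this example, and a tools.conf containing lines outside any section would need extra handling.

```python
import configparser


def exec_scripts_enabled(tools_conf_text):
    """Return True if [vmbackup] execScripts is set to a true value."""
    # strict=False tolerates duplicate sections or options in the file.
    parser = configparser.ConfigParser(strict=False)
    parser.read_string(tools_conf_text)
    # Option names are matched case-insensitively by configparser.
    return parser.getboolean("vmbackup", "execScripts", fallback=False)


sample = "[vmbackup]\nexecScripts=true\n"
print(exec_scripts_enabled(sample))  # True
```

On an appliance, the text could be read from /etc/vmware-tools/tools.conf and passed to the function.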

 

    

Figure: Simple flow diagram showing whether the freeze script exits with 0 or 1.

Figure: Timelines of real examples of snapshot execution for Aria Automation nodes.

Validating Aria Automation Freeze Sync

 

There are a few ways to get the logging information from the Aria Automation appliances: either directly from the appliances' journal logs, or from a logging application such as VMware Log Insight to which the logs are forwarded.

 

  1. Below I describe three different scripts that use VMware Tools to invoke a script action on the Aria Automation nodes and extract the journal logs. The scripts execute this command to read the logs on the node:

journalctl --identifier=vmtoolsd --no-pager --output=json --since='2024-05-23 17:25:00'

 

Note: The only parameter passed into this command is the since date, which pulls the data from that start time until the current time. The returned messages are then searched for the following messages:

          freeze synchronization failed

          sync failed, making inconsistent snapshot

 

2.  Query Log Insight using Python scripts

Note: These scripts can be modified and plugged into a backup solution to provide a retry method or to mark the backup image as a failed sync. They can be found in the zip file Query for Aria Automation Journal logs examples.zip.
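
The message search described above could be sketched as follows. This is not the code from the zip file; it is an illustrative example that assumes journalctl --output=json emits one JSON object per line with the log text in the MESSAGE field, and that a case-insensitive match is acceptable. The sample entries are made up for the example and are not real appliance output.

```python
import json

# The two failure messages named in this document, compared in lowercase.
FAILURE_MARKERS = (
    "freeze synchronization failed",
    "sync failed, making inconsistent snapshot",
)


def find_sync_failures(journal_output):
    """Return the MESSAGE strings that contain one of the failure markers."""
    failures = []
    for line in journal_output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON journal entry
        if not isinstance(entry, dict):
            continue
        message = entry.get("MESSAGE")
        if not isinstance(message, str):
            continue  # MESSAGE can be null or a byte array for non-text data
        if any(marker in message.lower() for marker in FAILURE_MARKERS):
            failures.append(message)
    return failures


# Made-up journal lines for illustration only.
sample = "\n".join([
    json.dumps({"MESSAGE": "an unrelated vmtoolsd entry"}),
    json.dumps({"MESSAGE": "Freeze synchronization failed"}),
    json.dumps({"MESSAGE": "Sync failed, making inconsistent snapshot"}),
])
print(find_sync_failures(sample))
```

An empty result for the time window of a snapshot suggests that neither failure message was logged on that node.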

Guest Scripting (PowerShell, Python & vRO)

Note: These are example scripts that extract the log messages directly from the Aria Automation appliances.

 

  • getvRABackupMessages.py

This script is written in Python and uses the PyPI pyvmomi module to connect to vCenter. It then uses StartProgramInGuest to execute the command line above and extracts the required logs for evaluation in the script.

  • getvRABackupMessages.ps1

This script is written in PowerShell and uses the PowerCLI module to connect to vCenter and then execute Invoke-VMScript.

  • “Get Logs from vRA Node” vRO workflow in package “com.vmware.pve.getJournalLogs.package”

This script is a vRO workflow and uses the Guest Script Manager workflow “Run Script in Guest”
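
To make the guest-operations approach more concrete, here is a minimal pyVmomi sketch of starting the journalctl command inside a node. It is not the getvRABackupMessages.py script. The helper names, the use of /bin/bash, and the /tmp output path are assumptions for illustration, and the way the guest parses the argument string should be verified in your environment. The `content` and `vm` objects are assumed to come from an existing pyVmomi session.

```python
import shlex


def build_guest_command(since, out_path="/tmp/vmtoolsd-journal.json"):
    """Return (program, arguments) that write the journal query to a file."""
    # out_path is a hypothetical location chosen for this example.
    journal_cmd = (
        "journalctl --identifier=vmtoolsd --no-pager --output=json "
        f"--since={shlex.quote(since)} > {shlex.quote(out_path)}"
    )
    # Assumption: bash is available in the guest and receives the command via -c.
    return "/bin/bash", "-c " + shlex.quote(journal_cmd)


def start_journal_export(content, vm, guest_user, guest_password, since):
    """Start the export in the guest and return the guest process ID."""
    from pyVmomi import vim  # imported here so the helper above has no dependency

    creds = vim.vm.guest.NamePasswordAuthentication(
        username=guest_user, password=guest_password
    )
    program, arguments = build_guest_command(since)
    spec = vim.vm.guest.ProcessManager.ProgramSpec(
        programPath=program, arguments=arguments
    )
    process_manager = content.guestOperationsManager.processManager
    return process_manager.StartProgramInGuest(vm, creds, spec)


print(build_guest_command("2024-05-23 17:25:00"))
```

StartProgramInGuest only starts the process, so the output file would still need to be fetched from the guest afterwards, for example through the guest file manager.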




  1. Currently, the getvRABackupMessages.py script has a section in the main function where the credentials and parameters are passed in. I have not yet turned this into a command-line script, and was not sure where this script would be plugged in.



    args = {
        "host": "vcenterhost",                 # vCenter FQDN
        "user": "[email protected]",           # vCenter username
        "password": "password",                # vCenter password
        "port": 443,                           # vCenter port
        "disable_ssl_verification": True,      # for self-signed certificates
        "guestUsername": "root",               # guest VM username
        "guestPassword": "password",           # guest VM password
        "vmname": "vRA vm name",               # vRA VM name
        "startDate": "2024-05-29 13:00:00"     # date/time the snapshot was invoked
    }
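
One possible way to turn these hardcoded values into command-line parameters is sketched below with argparse. The option names are invented for this example and are not part of the published script. The returned dictionary keeps the same keys as the args block above.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options into a dict shaped like the args block."""
    parser = argparse.ArgumentParser(
        description="Check a vRA node's journal for freeze sync failures"
    )
    parser.add_argument("--host", required=True, help="vCenter FQDN")
    parser.add_argument("--user", required=True, help="vCenter username")
    parser.add_argument("--password", required=True, help="vCenter password")
    parser.add_argument("--port", type=int, default=443, help="vCenter port")
    parser.add_argument("--disable-ssl-verification", action="store_true",
                        help="skip certificate checks (self-signed certs)")
    parser.add_argument("--guest-username", default="root")
    parser.add_argument("--guest-password", required=True)
    parser.add_argument("--vmname", required=True, help="vRA VM name")
    parser.add_argument("--start-date", required=True,
                        help='snapshot start, e.g. "2024-05-29 13:00:00"')
    ns = parser.parse_args(argv)
    return {
        "host": ns.host,
        "user": ns.user,
        "password": ns.password,
        "port": ns.port,
        "disable_ssl_verification": ns.disable_ssl_verification,
        "guestUsername": ns.guest_username,
        "guestPassword": ns.guest_password,
        "vmname": ns.vmname,
        "startDate": ns.start_date,
    }


# Example invocation with placeholder values.
example = parse_args([
    "--host", "vcenterhost", "--user", "backup-user", "--password", "password",
    "--guest-password", "password", "--vmname", "vRA vm name",
    "--start-date", "2024-05-29 13:00:00",
])
print(example["port"], example["guestUsername"])  # 443 root
```

Passwords given on the command line can be visible to other users on the same machine, so a backup tool may prefer to supply them through environment variables or a secrets store.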

 

 

2.  The input data for the getvRABackupMessages.ps1 script can be set in two ways: modify the script and fill in the parameters as shown below, or uncomment the alternative assignments, such as $vcHost = $args[0], so that the values can be passed on the command line.

 

# vCenter FQDN

$vcHost = ""

# $vcHost = $args[0]

# vCenter Username

$vcUsername = ""

# $vcUsername = $args[1]

# vCenter Password

$vcPassword = ""

# $vcPassword = $args[2]

# Aria VM name

$vraHost = ""

# $vraHost = $args[3]

# Aria VM Guest Username

$guestUser = ""

# $guestUser = $args[4]

# Aria VM Guest Password

$guestPassword = ""

# $guestPassword = $args[5]

# Start Time format structure "2024-05-23 14:20:00" 

# This should be the time snapshot started

$startTime = ""

# $startTime = $args[6]



3. The vRO workflow “Get Logs from vRA Node” includes a configuration element in the package, called ScriptConfig, that must be configured with the guest username and guest password so that a script can be invoked in the guest OS.

The vCenter Server also needs to be added to vRO using the “Add a vCenter Server instance” workflow so that it shows up in the inventory section.

The script that invokes the query requires two inputs: vmname and startdate.

Log Insight

    Aria Automation Integration

To validate the Log Insight integration, SSH into one of the Aria Automation nodes and run the following at the command prompt:

vracli vrli

Output Example:

$ vracli vrli

{

    "agentId": "0",

    "environment": "prod",

    "host": "fqdn.xxx.local",

    "port": 9543,

    "scheme": "https",

    "sslVerify": false

}

 

If the command does not return a configuration, see the documentation to configure it: Configure log forwarding to Log Insight
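
A script that needs to know which Log Insight host a node forwards to could parse the JSON shown in the output example above. The sketch below assumes the output has the scheme, host, and port keys seen in that example; the function name is made up, and what the command prints when nothing is configured is not covered here beyond returning None.

```python
import json


def log_insight_url(vracli_output):
    """Build scheme://host:port from `vracli vrli` JSON output, or return None."""
    try:
        config = json.loads(vracli_output)
    except json.JSONDecodeError:
        return None  # output was not JSON (for example, no configuration)
    if not isinstance(config, dict):
        return None
    try:
        return f"{config['scheme']}://{config['host']}:{config['port']}"
    except KeyError:
        return None  # an expected key is missing


sample = """
{
    "agentId": "0",
    "environment": "prod",
    "host": "fqdn.xxx.local",
    "port": 9543,
    "scheme": "https",
    "sslVerify": false
}
"""
print(log_insight_url(sample))  # https://fqdn.xxx.local:9543
```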

    Direct API Scripting query into Log Insight

Currently, the getvRALogsFromLi.py script has a section in the main function where the credentials and parameters are passed in; examples of the values that need to be replaced are shown below. I have not yet turned this into a command-line script, and was not sure where this script would be executed. The script uses a Python request module to talk to Aria Log Insight and extract logs for a given vRA node and date range. I was not able to find a way to extract the data without authentication, so you will need to assign a user permission to authenticate with Log Insight, most likely using vIDM as the provider.

args = {
    # Log Insight host and port; you will need to identify which Log Insight host the vRA nodes are pointed to
    "host": "fqdn.com",
    "port": 9543,
    # Log Insight service account and password for the backup service
    "username": "admin",
    "password": "VMware1!",
    "provider": "Local",      # The provider can be "Local", "ActiveDirectory" or "vIDM"
    # Aria Automation node
    "vranode": "fqdn.local",
    # Date range to search
    "starttime_str": "2024-06-18 12:25:28",
    "endtime_str": "2024-06-18 12:30:29"
}
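
The date range strings above typically have to be converted to numeric timestamps before they can be used in a query. The sketch below converts them to milliseconds since the Unix epoch, on the assumption that the Log Insight query expects epoch milliseconds and that the strings are in UTC. Both assumptions should be checked against your Log Insight version and the appliance time zone; the function name is illustrative.

```python
from datetime import datetime, timezone


def to_epoch_ms(timestamp_str, tz=timezone.utc):
    """Convert 'YYYY-MM-DD HH:MM:SS' to milliseconds since the Unix epoch."""
    parsed = datetime.strptime(timestamp_str, "%Y-%m-%d %H:%M:%S")
    # Assumption: the string is expressed in `tz` (UTC by default).
    return int(parsed.replace(tzinfo=tz).timestamp() * 1000)


start_ms = to_epoch_ms("2024-06-18 12:25:28")
end_ms = to_epoch_ms("2024-06-18 12:30:29")
print(start_ms, end_ms, end_ms - start_ms)
```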

 

Troubleshooting log messages 

 

SSH into one of the vRA nodes with root access.

  1. At the prompt, execute "vracli vrli" to confirm where the logs are being sent. The output from this command should look something like this:

{

    "agentId": "0",

    "bufferFlushThreadCount": 16,

    "environment": "cavadev",

    "host": "fqdn.com",

    "port": 9543,

    "requestHttpCompress": false,

    "requestImmediateRetries": 3,

    "requestMaxSize": 256000,

    "requestTimeout": 30,

    "scheme": "https",

    "sslVerify": false

}

2.  Then, at the prompt, execute "journalctl --identifier=vmtoolsd --no-pager -f"

This follows (tails) the vmtoolsd journal entries as they are written.

 

3.  Open a browser and log in to the vRLI instance that was displayed in the output of "vracli vrli", for example https://fqdn.com

    1. Go to the Explore Logs tab and set up a filter, changing the hostname to your vRA node.

4.  Manually create a snapshot with quiesce in vCenter

      1. Log in to the vCenter UI
      2. Select the same vRA node that you are connected to over SSH
      3. Create a snapshot with the quiesce option enabled

5.  Once you click Create for the snapshot above, the SSH session where you are monitoring the journalctl logs should show entries similar to the following:

Example of a successful Freeze Sync:

Example of a failed Freeze Sync (the entries include "Freeze synchronization failed" and "Sync failed, making inconsistent snapshot"):

6.  The Log Insight output should match what is shown in the SSH session.
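
Once the journal or Log Insight messages can be checked programmatically, a backup tool could act on the result, as the earlier note about a retry method suggests. The following is a minimal sketch of such a loop. The function, its two callables, and the status strings are hypothetical and do not correspond to any backup product's API.

```python
def backup_with_retry(take_snapshots, sync_failed, max_attempts=3):
    """Retry the snapshot set until no freeze sync failure is logged.

    take_snapshots: callable that starts snapshots on all nodes and returns
                    the start time to use for the log query.
    sync_failed:    callable that takes that start time and returns True if a
                    failure message was found on any node.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        started_at = take_snapshots()
        if not sync_failed(started_at):
            return {"status": "ok", "attempts": attempt}
    # Every attempt logged a failure: mark the backup image as a failed sync.
    return {"status": "failed_sync", "attempts": max_attempts}


# Stand-in callables: the first attempt "fails", the second succeeds.
attempts_seen = []


def fake_take_snapshots():
    attempts_seen.append(len(attempts_seen) + 1)
    return attempts_seen[-1]


print(backup_with_retry(fake_take_snapshots, lambda started: started < 2))
# {'status': 'ok', 'attempts': 2}
```

In a real integration, the snapshots from a failed attempt would normally be removed or flagged before retrying; that step is left out of this sketch.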