Monitoring Aria Automation image-based backups for proper quiesced state
Log Message extraction using various methods
Aria Automation 8.x
Guidelines for backing up VMware Aria Automation using an image/snapshot-based backup system.
Note: The main purpose of this document is to validate that the snapshots are synchronized across all nodes. Currently, snapshots are created successfully even if they were not all taken within the time limit of ~40 seconds.
Link to documentation: VMware Aria Automation 8.x Preparations for Backing Up
The image above shows a high-level diagram of the Aria Automation quiesced snapshot backup.
The Aria Automation appliance has VMware Tools installed and configured with “vmbackup” enabled, which is required to execute the freeze scripts. Additional information can be found in the following documentation links: VMware Tools Services, Exclude Specific File Systems from Quiesced Snapshots, and Enabling Quiescing for Linux VMs
Aria Automation VMware Tools Configuration and Freeze scripts
[vmbackup]
execScripts=true
Note: With this setting enabled, any scripts in the directory /etc/vmware-tools/backupScripts.d are invoked.
90-freeze-data -> /opt/scripts/freezer-control.py
Note: freezer-control.py then calls freezer-server.py, which validates snapshot consistency across all nodes. If all nodes freeze within the allotted time, the script returns 0 for success and the quiesced snapshot executes. If the sync fails, the script still returns an exit code of 0, the freeze ends, and the snapshot is taken anyway. Currently the script only logs errors and does not report a failed freeze when a “failed sync” occurs; the errors are, however, recorded in the journal logs. This document shows how to extract these logs from the appliances or from a logging system.
Simple flow diagram showing if the freeze script exits with 0 or 1.
The images below show timelines of real examples of snapshot execution for Aria Automation nodes.
Validating Aria Automation Freeze Sync
There are a few ways to get the logging information from the Aria Automation appliances: either directly from the appliance journal logs, or from a logging application such as VMware Log Insight to which the logs are forwarded.
journalctl --identifier=vmtoolsd --no-pager --output=json --since='2024-05-23 17:25:00'
Note: The only parameter passed into this command is the --since date, which pulls the data from that start time until the current time. The returned messages are then searched for the following messages:
freeze synchronization failed
sync failed, making inconsistent snapshot
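The search described above can be sketched in a few lines of Python. This is a minimal illustration and not the code from the example scripts: it assumes the journalctl output was captured as text, with one JSON record per line and the log text in the MESSAGE field, which is how journalctl --output=json emits entries. The function name find_sync_failures is hypothetical.

```python
import json

# The two failure messages quoted in this document
FAILURE_MARKERS = (
    "freeze synchronization failed",
    "sync failed, making inconsistent snapshot",
)


def find_sync_failures(journal_output):
    """Return the MESSAGE values that contain one of the failure markers."""
    failures = []
    for line in journal_output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON record
        if not isinstance(entry, dict):
            continue
        message = entry.get("MESSAGE")
        if not isinstance(message, str):
            continue  # non-text payloads are emitted as arrays or null
        lowered = message.lower()
        if any(marker in lowered for marker in FAILURE_MARKERS):
            failures.append(message)
    return failures
```

An empty list means neither failure message appeared in the captured window; any returned entry indicates that the snapshot taken in that window should be treated as unsynchronized.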
2. Query Log Insight using Python scripts
Note: These scripts can be modified and plugged into a backup solution to provide a retry method or to mark the backup image as a failed sync. They can be found in the zip file Query for Aria Automation Journal logs examples.zip
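As a rough idea of what such a retry method could look like, here is a minimal sketch. It is not part of the example scripts: take_snapshot and check_sync are hypothetical callables that the backup solution would have to supply, and the attempt count and wait time are arbitrary defaults.

```python
import time


def snapshot_with_retry(take_snapshot, check_sync, max_attempts=3,
                        wait_seconds=60, sleep=time.sleep):
    """Take a snapshot and retry while the freeze sync check reports failures.

    take_snapshot: hypothetical callable that triggers the quiesced snapshot
                   and returns an identifier for it.
    check_sync:    hypothetical callable that returns the failure messages
                   found in the logs for that snapshot (empty means in sync).
    Returns (snapshot_id, synced) for the last attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    snapshot_id = None
    for attempt in range(1, max_attempts + 1):
        snapshot_id = take_snapshot()
        if not check_sync(snapshot_id):
            return snapshot_id, True
        if attempt < max_attempts:
            sleep(wait_seconds)  # give the nodes time to settle before retrying
    # All attempts failed; the caller can mark this image as "failed sync"
    return snapshot_id, False
```

Cleaning up the snapshots from failed attempts is deliberately left to the caller, since how that is done depends on the backup solution.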
Note: These are example scripts that extract the log messages directly from the Aria Automation appliances
This script is written in Python and uses the PyPI pyvmomi module to connect to vCenter; it then uses StartProgramInGuest to execute the command line above and extracts the required logs for evaluation in the script.
This script is written in PowerShell and uses the PowerCLI module to connect to vCenter and then execute Invoke-VMScript
This script is a vRO workflow and uses the Guest Script Manager workflow “Run Script in Guest”
args = {
"host": "vcenterhost", # vCenter FQDN
"user": "[email protected]", # vCenter Username
"password": "password", # vCenter Password
"port": 443, # vCenter Port
"disable_ssl_verification": True, # for self signed certs
"guestUsername": "root", # guest vm username
"guestPassword": "password", # guest vm password
"vmname": "vRA vm name", # vRA vm name
"startDate": "2024-05-29 13:00:00" # Date/time snapshot was invoked
}
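To illustrate how the startDate value relates to the journalctl command line shown earlier, the following sketch validates the date string and builds the command from it. It is an assumption about how the pieces fit together, not an excerpt from the example script; the helper names build_journalctl_command and as_shell_string are hypothetical.

```python
import shlex
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # same layout as the startDate value above


def build_journalctl_command(start_date):
    """Return the journalctl argument list for a given snapshot start time."""
    # strptime raises ValueError if the string does not match TIME_FORMAT
    parsed = datetime.strptime(start_date, TIME_FORMAT)
    since = parsed.strftime(TIME_FORMAT)
    return [
        "journalctl",
        "--identifier=vmtoolsd",
        "--no-pager",
        "--output=json",
        f"--since={since}",
    ]


def as_shell_string(command):
    """Quote the argument list so it can be sent as one shell command line."""
    return shlex.join(command)
```

Validating the date before connecting to vCenter means a mistyped start time fails immediately instead of producing an empty or misleading log query inside the guest.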
2. The getvRABackupMessages.ps1 script takes its input in one of two ways: either edit the script and fill in the parameters as shown below, or uncomment the alternative lines such as $vcHost = $args[0] so that the values can be passed on the command line.
# vCenter FQDN
$vcHost = ""
# $vcHost = $args[0]
# vCenter Username
$vcUsername = ""
# $vcUsername = $args[1]
# vCenter Password
$vcPassword = ""
# $vcPassword = $args[2]
# Aria VM name
$vraHost = ""
# $vraHost = $args[3]
# Aria VM Guest Username
$guestUser = ""
# $guestUser = $args[4]
# Aria VM Guest Password
$guestPassword = ""
# $guestPassword = $args[5]
# Start Time format structure "2024-05-23 14:20:00"
# This should be the time snapshot started
$startTime = ""
# $startTime = $args[6]
3. The vRO workflow “get Logs from vRA Node” includes a configuration element in its package called ScriptConfig, which must be configured with the guest username and guest password so that a script can be invoked in the OS.
The vCenter Server must also be added to vRO using the “Add a vCenter Server instance” workflow so that it shows up in the inventory section.
The script that invokes the query requires two inputs: vmname and startdate.
To validate the Log Insight integration, SSH into one of the Aria Automation nodes and run the following at the command prompt:
vracli vrli
Output Example:
$ vracli vrli
{
"agentId": "0",
"environment": "prod",
"host": "fqdn.xxx.local",
"port": 9543,
"scheme": "https",
"sslVerify": false
}
If the command does not return a configuration, see the documentation to configure it: Configure log forwarding to Log Insight
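If this check needs to be automated, the output of vracli vrli can be parsed as JSON. The sketch below is a minimal example under two assumptions: the output was captured as text, and anything that is not a JSON object with a host value is treated as "not configured" (what the command prints in that case is not covered here). The function name log_insight_base_url is hypothetical.

```python
import json


def log_insight_base_url(vracli_output):
    """Return scheme://host:port from `vracli vrli` output, or None."""
    try:
        config = json.loads(vracli_output)
    except (json.JSONDecodeError, TypeError):
        return None  # output was not JSON, so assume no configuration
    if not isinstance(config, dict) or not config.get("host"):
        return None
    scheme = config.get("scheme", "https")
    port = config.get("port")
    if port:
        return f"{scheme}://{config['host']}:{port}"
    return f"{scheme}://{config['host']}"
```

The returned value identifies which Log Insight host the node forwards to, which is the host any Log Insight query would need to target.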
Currently the getvRALogsFromLi.py script has a section in the main function where the credentials and parameters are passed in. Examples of the values that need to be replaced are below. I have not yet turned this into a command-line script, as I was not sure where it would be executed. The script uses the Python requests module to talk to Aria Log Insight and extract logs for a given vRA node and date range. I was not able to find a way to extract the data without authentication, so you will need to assign a user permissions to authenticate with Log Insight, most likely using vIDM as the provider.
args = {
# Log Insight host and port; identify which Log Insight host the vRA nodes are pointed to
"host": "fqdn.com",
"port": 9543,
# Log Insight Service account & Password for backup service
"username": "admin",
"password": "VMware1!",
"provider": "Local", # The provider can be "Local","ActiveDirectory" or "vIDM"
# Aria Automation Node
"vranode": "fqdn.local",
# Date Range to search
"starttime_str": "2024-06-18 12:25:28",
"endtime_str": "2024-06-18 12:30:29"
}
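The date range strings usually need converting before they can be used in a query. The sketch below turns starttime_str and endtime_str into milliseconds since the Unix epoch and checks that the range is valid. Two assumptions are made and should be verified for your environment: that the Log Insight query expects epoch milliseconds, and that the strings are in UTC (pass a different tz if they are in the appliance's local time). The helper names are hypothetical and are not taken from getvRALogsFromLi.py.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # layout of starttime_str / endtime_str


def to_epoch_millis(time_str, tz=timezone.utc):
    """Convert a date/time string to milliseconds since the Unix epoch."""
    # Assumption: the string carries no zone, so the caller decides which applies
    parsed = datetime.strptime(time_str, TIME_FORMAT).replace(tzinfo=tz)
    return int(parsed.timestamp()) * 1000


def search_window(args, tz=timezone.utc):
    """Return (start_ms, end_ms) for the args dictionary shown above."""
    start = to_epoch_millis(args["starttime_str"], tz)
    end = to_epoch_millis(args["endtime_str"], tz)
    if end <= start:
        raise ValueError("endtime_str must be later than starttime_str")
    return start, end
```

Keeping the window close to the snapshot time, as in the five-minute example above, limits the results to the vmtoolsd messages from that snapshot.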
1. SSH into one of the vRA nodes with root access and run "vracli vrli". The output should look similar to this:
{
"agentId": "0",
"bufferFlushThreadCount": 16,
"environment": "cavadev",
"host": "fqdn.com",
"port": 9543,
"requestHttpCompress": false,
"requestImmediateRetries": 3,
"requestMaxSize": 256000,
"requestTimeout": 30,
"scheme": "https",
"sslVerify": false
}
2. Then at the prompt execute "journalctl --identifier=vmtoolsd --no-pager -f"
This tails the journal logs for vmtoolsd.
3. Open a browser and log in to the vRLI instance shown in the output of "vracli vrli", for example https://fqdn.com
4. Manually Create a Snapshot with Quiesce in vCenter
5. Once you click Create for the snapshot above, the SSH session where you are monitoring the journal logs should show entries similar to the following:
Example of a successful Freeze Sync:
Example of a failed Freeze Sync:
6. The Log Insight output should match what is shown in the SSH session, similar to the screenshot below