Clearing the Alerts and Alarms Tables in Aria Operations

Products

VMware Aria Suite

Issue/Introduction

Symptoms:

After selecting multiple alerts in VMware Aria Operations (formerly known as vRealize Operations) and clicking Cancel Alert, one or more of the selected alerts remains active.
An error message states that some of the deleted alerts may still be present.
The cluster experiences performance issues when a large number of historical alerts and/or alarms are present.

Environment

Aria Operations 8.x

Resolution

Before proceeding, we recommend taking snapshots of all cluster nodes. See How to take a Snapshot of Aria Operations for more information.

To resolve the issue, delete the affected alerts directly from the database, or clear the tables entirely.

Log in to the primary node as root via SSH or the console. If using the console, press ALT+F1 to reach the login prompt.
Launch the psql utility and connect to the vPostgres database by running this command:

su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -p 5432"

Note: Once connected, the prompt changes to vcopsdb=#.

Choose option 1, 2, or 3, depending on what you want to clear:

Note: The alarms table correlates to Symptoms, and the alerts table correlates to Alerts in the Aria Operations UI.

Option 1: Clear all alerts and alarms

truncate table alert cascade;
truncate table alarm cascade;

Note: The above commands clear all symptoms and alerts on the system. After one or two collection cycles, alerts whose symptoms are still present and active are triggered again. If alerts are sent out by email notifications or to a ticketing system, we recommend temporarily disabling those configurations until the alerts settle.
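Option 1 can also be run without an interactive session. The following is a minimal sketch, assuming the psql path, port, and database name used in this article; the function names option1_sql and run_vcopsdb_sql are hypothetical. The SQL is wrapped in BEGIN/COMMIT and psql runs with ON_ERROR_STOP=1, so if either TRUNCATE fails, psql exits before COMMIT and the transaction is rolled back.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; the psql path, port,
# and database name are the ones used in this article.
PSQL=/opt/vmware/vpostgres/current/bin/psql

# Print the Option 1 statements as a single transaction.
option1_sql() {
  printf '%s\n' \
    'BEGIN;' \
    'TRUNCATE TABLE alert CASCADE;' \
    'TRUNCATE TABLE alarm CASCADE;' \
    'COMMIT;'
}

# Feed SQL from stdin to vcopsdb as the postgres user. Reading the SQL
# from stdin avoids nesting quotes inside the su -c string.
# ON_ERROR_STOP=1 makes psql exit on the first error, so COMMIT is never
# reached and the open transaction is rolled back on disconnect.
run_vcopsdb_sql() {
  su - postgres -c "$PSQL -d vcopsdb -p 5432 -v ON_ERROR_STOP=1"
}
```

Usage, as root on the node: `option1_sql | run_vcopsdb_sql`.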

Option 2: Clear alerts and alarms related to a specific resource

delete from alert where resource_id = '<resource_id>';
delete from alarm where resource_id = '<resource_id>';

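Note: Replace the resource_id value on the right-hand side of each WHERE clause with the object's Internal ID, enclosed in single quotes. Leaving it as the bare column name (resource_id = resource_id) matches every row and deletes all alerts and alarms. To find the Internal ID, go to Administration > Inventory, click the Show Columns icon at the bottom of the page, and select Internal ID. Then filter for the object and copy its Internal ID.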
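Because a mistyped WHERE clause in Option 2 can delete far more than intended, it can help to generate the statements from a validated ID. The following is a sketch under one stated assumption: that the Internal ID is a UUID (8-4-4-4-12 hexadecimal digits). Anything else is rejected, so the value cannot change the meaning of the SQL. The function name option2_sql is hypothetical.

```shell
#!/bin/sh
# Sketch: print the Option 2 DELETE statements for one resource.
# ASSUMPTION: the Internal ID is a 36-character UUID. The length check
# plus the full-line regex also rejects values containing newlines.
option2_sql() {
  rid=$1
  if [ "${#rid}" -ne 36 ] ||
     ! printf '%s\n' "$rid" |
       grep -Eq '^[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$'; then
    echo "not a UUID: $rid" >&2
    return 1
  fi
  printf "DELETE FROM alert WHERE resource_id = '%s';\n" "$rid"
  printf "DELETE FROM alarm WHERE resource_id = '%s';\n" "$rid"
}
```

Review the printed statements, then pipe them to psql, for example: `option2_sql <Internal ID> | su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb"`.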

Option 3: Clear alerts and alarms based on time

To delete alerts and alarms older than a certain time, first convert the cutoff time (in UTC) to UNIX epoch time in seconds. Use a time converter or online resource for the conversion (ignore milliseconds).

Next, decide whether to filter on when alerts and alarms were created (start_time_utc) or on when they were last updated (update_time_utc). Filtering on the update time usually makes the most sense, because alerts and alarms that have not been updated for a long time are no longer present or active. Filtering on the creation time alone may delete alerts and alarms that are still active and triggering.

Once you have chosen a cutoff date, build the DELETE command. For example, to delete any alert that has not been updated for more than three months, use the epoch time 1732500966, which corresponds to Monday, November 25, 2024 at 02:16:06 GMT+0000.
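Instead of an online converter, the conversion can be done on the node itself. This is a sketch assuming GNU date (with support for -d and the @epoch input form); the function names are hypothetical. For example, converting 2024-11-25 02:16:06 UTC yields 1732500966, the value used in this article, and converting back is a quick way to double-check a cutoff before deleting anything.

```shell
#!/bin/sh
# Sketch assuming GNU date. Function names are hypothetical.

# Convert a UTC date/time string to UNIX epoch seconds.
to_epoch() {
  date -u -d "$1" +%s
}

# Convert epoch seconds back to a readable UTC date (C locale so the
# day and month names are in English).
from_epoch() {
  LC_ALL=C date -u -d "@$1" '+%A, %B %d, %Y %H:%M:%S UTC'
}

# Epoch seconds for "N months ago" relative to the current time,
# e.g. months_ago_epoch 3 for a three-month cutoff.
months_ago_epoch() {
  date -u -d "$1 months ago" +%s
}
```

For instance, `to_epoch '2024-11-25 02:16:06'` prints 1732500966, and `from_epoch 1732500966` prints Monday, November 25, 2024 02:16:06 UTC.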

Here are example commands that delete alerts and alarms whose start time is on or before November 25, 2024 at 02:16:06 UTC (epoch time 1732500966):

delete from alert where start_time_utc <= '1732500966';
delete from alarm where start_time_utc <= '1732500966';

Or:

delete from alert where update_time_utc <= '1732500966';
delete from alarm where update_time_utc <= '1732500966';

Or:

delete from alert where cancel_time_utc <= '1732500966';
delete from alarm where cancel_time_utc <= '1732500966';

Alternatively, run the commands directly from a root shell without opening an interactive psql session. This is much quicker when working with multiple systems.

su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -A -t -c 'delete from alert where update_time_utc <= 1732500966;'"
su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -A -t -c 'delete from alarm where update_time_utc <= 1732500966;'"
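Because these one-liners need no interactive session, they can be looped over several systems. The following is a sketch, assuming root SSH access to each node; the node names passed in are placeholders, and DRY_RUN is a hypothetical switch that prints what would run instead of running it. The SQL is piped to psql on stdin, which avoids nesting quotes inside the remote command.

```shell
#!/bin/sh
# Sketch only: run_on_nodes and DRY_RUN are hypothetical; the psql path
# and database name are the ones used in this article.
PSQL=/opt/vmware/vpostgres/current/bin/psql
SQL='DELETE FROM alert WHERE update_time_utc <= 1732500966; DELETE FROM alarm WHERE update_time_utc <= 1732500966;'

run_on_nodes() {
  for node in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      # Only show what would be executed on this node.
      printf 'ssh root@%s: %s\n' "$node" "$SQL"
    else
      # psql reads the statements from stdin; ON_ERROR_STOP=1 stops at
      # the first failing statement.
      printf '%s\n' "$SQL" |
        ssh "root@$node" "su - postgres -c '$PSQL -d vcopsdb -A -t -v ON_ERROR_STOP=1'"
    fi
  done
}
```

Usage: `DRY_RUN=1 run_on_nodes node1.example.com node2.example.com` to preview, then run again without DRY_RUN=1.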

These are the three alert/alarm time columns available for the WHERE clause:

where start_time_utc
where update_time_utc
where cancel_time_utc
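Choosing among the three columns can be made less error-prone by generating the statements from a checked column name and cutoff. A minimal sketch: the function name option3_sql is hypothetical, only the three columns listed above are accepted, and the cutoff must be plain epoch seconds.

```shell
#!/bin/sh
# Sketch: print the Option 3 DELETE statements for both tables.
# option3_sql is a hypothetical helper name.
option3_sql() {
  col=$1
  cutoff=$2
  # Accept only the three documented time columns.
  case $col in
    start_time_utc|update_time_utc|cancel_time_utc) ;;
    *) echo "unsupported column: $col" >&2; return 1 ;;
  esac
  # The cutoff must be non-empty and contain digits only.
  case $cutoff in
    ''|*[!0-9]*) echo "cutoff must be epoch seconds: $cutoff" >&2; return 1 ;;
  esac
  for t in alert alarm; do
    printf 'DELETE FROM %s WHERE %s <= %s;\n' "$t" "$col" "$cutoff"
  done
}
```

Review the output, then pipe it to psql, for example: `option3_sql update_time_utc 1732500966 | su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb"`.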

Type \q to exit the psql utility.
Repeat the steps above, from logging in through exiting psql, on every other analytic node in the cluster (replica and data nodes).
Log in to the Aria Operations Admin UI as the local admin user.
Click Take Offline under Cluster Status.

Note: Wait for Cluster Status to show as Offline.

Click Bring Online under Cluster Status.

Note: Wait for Cluster Status to show as Online.

Additional Information

If the alert and alarm tables are large, we recommend truncating both tables to clear out historical alerts and alarms and avoid cluster performance issues.

The row counts of both tables can be checked with this command:

su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -A -t -c 'select count(*) from alert'"; su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -A -t -c 'select count(*) from alarm'"
Note: A few thousand rows per table is acceptable.
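Both counts can also be fetched in a single psql call and compared against a limit. This is a sketch only: the 5000-row THRESHOLD is a hypothetical stand-in for "a few thousand" and should be adjusted, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: flag large alert/alarm tables.
# ASSUMPTION: 5000 rows approximates "a few thousand"; override with
# THRESHOLD=<n> in the environment.
THRESHOLD=${THRESHOLD:-5000}

# Classify one table's row count against the threshold.
size_verdict() {
  if [ "$2" -gt "$THRESHOLD" ]; then
    echo "$1: $2 rows (large, consider clearing)"
  else
    echo "$1: $2 rows (ok)"
  fi
}

# Query both counts at once; -A -t -F ' ' prints them as "N M".
check_sizes() {
  set -- $(su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -A -t -F ' ' -c 'select (select count(*) from alert), (select count(*) from alarm);'")
  size_verdict alert "$1"
  size_verdict alarm "$2"
}
```

Run `check_sizes` as root on the primary node.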

Alternatively, lower the Symptoms/Alerts value in Global Settings to better control how long deleted alerts and alarms are retained.

Impact/Risks:
It is recommended to take snapshots or backups before doing any database work.
See How to take a Snapshot of Aria Operations for more information.