Custom PowerShell Script fails to trigger alerts when the monitored service stops

Article ID: 428011

Products

VCF Operations

Issue/Introduction

Custom PowerShell monitoring scripts running via the Telegraf agent in VMware Aria Operations (formerly vRealize Operations) collect data successfully (showing "Green" status) but fail to trigger an alert when the monitored service or application stops.

The script objects remain in a healthy state despite the service being down.

Environment

VMware Aria Operations 8.18.x

Cause

This issue is caused by three configuration factors:

  1. Inverted Script Logic: The custom script follows standard shell convention (0 = Success, 1 = Error). However, VMware Aria Operations expects availability metrics to follow the System Attributes|availability standard where 1 = Up and 0 = Down. Consequently, sending 1 when the service is down tells Aria Operations the service is "100% Available."

  2. Missing Alert Definition: Custom script metrics are treated as raw numerical data. The system does not alert on a "0" value on its own; a Symptom Definition must be explicitly created to mark Value EQ 0 as Critical, and that symptom must be added to an Alert Definition.

  3. Incorrect Object Targeting: The Alert Definition may have been applied to the internal VMware Aria Operations Appliance > Custom Script object type instead of the correct Telegraf Custom Script object type used by the agent.

Resolution

To resolve this issue, you must update the script return values and configure a corresponding Alert Definition.

1. Update the PowerShell Script Logic

Modify the script to return 1 for a healthy state and 0 for a stopped state.

Incorrect Logic:

PowerShell
 
if ($State -eq "running") {
 echo 0
} else {
 echo 1
}

Corrected Logic:

PowerShell
 
if ($State -eq "running") {
 echo 1  # 1 indicates UP/Available in Aria Operations
} else {
 echo 0  # 0 indicates DOWN/Unavailable
}
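The corrected logic above assumes that $State is already populated. As a minimal sketch of one way to obtain it, the following uses the standard Get-Service cmdlet and reports a service that cannot be queried as down. The function names and the 'Spooler' service name are hypothetical placeholders, not part of the original script.

```powershell
# Map a service state string to the availability convention described above:
# 1 = UP/Available, 0 = DOWN/Unavailable.
function Convert-ServiceStateToAvailability {
    param([string]$State)
    if ($State -eq 'Running') { return 1 }
    return 0
}

# Look up a Windows service and return its availability value.
# Assumption: a service that cannot be queried at all is reported as down (0).
function Get-ServiceAvailability {
    param([Parameter(Mandatory)][string]$ServiceName)
    try {
        $service = Get-Service -Name $ServiceName -ErrorAction Stop
        return Convert-ServiceStateToAvailability -State ([string]$service.Status)
    } catch {
        return 0
    }
}

# 'Spooler' is a placeholder; replace it with the service you monitor.
Write-Output (Get-ServiceAvailability -ServiceName 'Spooler')
```

Treating a lookup failure as 0 is a design choice in this sketch: a missing or inaccessible service then raises the same alert as a stopped one, rather than leaving the metric without a value.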

2. Identify the Correct Object Type

  1. Navigate to Environment > Inventory (or Object Browser).

  2. Search for one of your script objects.

  3. Note the exact Object Type listed (e.g., Telegraf Custom Script or Cloud Proxy Adapter > Custom Script).

3. Configure the Alert Definition

If you are monitoring multiple services that report unique metric names (e.g., Service_A_Status, Service_B_Status) under a single agent object, create a separate alert for each service to ensure clear notification titles.

  1. Navigate to Configure > Alerts > Symptom Definitions.

  2. Add a Metric Symptom.

  3. Select the Object Type identified in Step 2.

  4. Drag the specific metric for the service (e.g., Service_A_Status) into the condition workspace.

  5. Set the condition: Metric EQ 0 with Critical severity.

  6. Navigate to Configure > Alerts > Alert Definitions.

  7. Create a new Alert named explicitly (e.g., "Critical: Payment Service Down").

  8. Add the Symptom created above.

Repeat Step 3 for each distinct service script.
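Because the symptom matches only an exact value of 0, it can help to confirm locally that each script prints a single 0 or 1 before relying on the alert. The following is a rough sketch of such a check. The function name Test-AvailabilityOutput and the script path in the usage note are hypothetical.

```powershell
# Return $true only when the captured script output is exactly one
# non-empty line containing 0 or 1.
function Test-AvailabilityOutput {
    param([string[]]$Output)
    # Drop null, empty, and whitespace-only lines before counting.
    $lines = @($Output | Where-Object { $_ -and $_.Trim() })
    return ($lines.Count -eq 1) -and ($lines[0].Trim() -in @('0', '1'))
}
```

For example, Test-AvailabilityOutput -Output (& .\Check-ServiceA.ps1) would return False if the script printed a state name such as "Running" or more than one value.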

Additional Information
