False positive alert triggering for disk read/write latency in Aria Operations

Article ID: 379065

Products

VMware Aria Suite

Issue/Introduction

This article documents the behaviour of the default threshold settings in the following alerts:

"VM experiencing disk read latency"
"VM experiencing disk write latency"

The alert is triggered on a VM, but neither the users nor the owner of the VM report slow performance.

Environment

Aria Operations 8.18

Resolution

Separate mission-critical applications (those strategic to the business) from non-mission-critical ones.

For non-mission-critical applications, the following changes tighten the alert conditions so that less critical issues are filtered out:

  • Increase the IOPS threshold from 10 to 100 IOPS.
    100 disk I/O operations per second sustained for 5 minutes amounts to 30,000 disk I/O operations (100 × 300 seconds). If you think that is too high, reduce it to a number that matches your environment.

  • Increase the CPU Usage threshold by 4x, to 1 GHz.
    This eliminates the case where latency is high but the VM is doing only a minimal amount of work.

  • Increase the latency thresholds by 2x:

    • Orange: 20 → 40 ms for the 5-minute average latency, and 80 → 160 ms for the 20-second average.
    • Red: 40 → 80 ms for the 5-minute average latency, and 160 → 320 ms for the 20-second average.

  • Increase the Wait cycle from 5 to 10 minutes.

The attached alert definition file implements the above.
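
The changes listed above can be summarised in a small Python sketch. It is only an illustration of the arithmetic and of how the conditions might combine; it is not the content of the attached file and does not use any Aria Operations API. The names `LatencyAlertThresholds`, `DEFAULT`, `TUNED`, `total_io_operations` and `severity` are hypothetical. The default CPU value of 0.25 GHz is inferred from "4x to 1 GHz", and the way the conditions are combined (IOPS and CPU must both be met, and either latency average may trip the alert) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LatencyAlertThresholds:
    """Illustrative container for the values discussed in this article."""
    min_iops: float        # minimum sustained disk IOPS
    min_cpu_ghz: float     # minimum CPU usage, in GHz
    orange_5min_ms: float  # orange level, 5-minute average latency
    orange_20s_ms: float   # orange level, 20-second average latency
    red_5min_ms: float     # red level, 5-minute average latency
    red_20s_ms: float      # red level, 20-second average latency
    wait_cycle_min: int    # wait cycle, in minutes


# Default values as described in the article (0.25 GHz is inferred from "4x to 1 GHz").
DEFAULT = LatencyAlertThresholds(10, 0.25, 20, 80, 40, 160, 5)
# Tuned values for non-mission-critical workloads.
TUNED = LatencyAlertThresholds(100, 1.0, 40, 160, 80, 320, 10)


def total_io_operations(iops: float, minutes: float) -> int:
    """Total I/O operations when `iops` is sustained for `minutes`."""
    return int(iops * minutes * 60)


def severity(t: LatencyAlertThresholds, iops: float, cpu_ghz: float,
             latency_5min_ms: float, latency_20s_ms: float) -> Optional[str]:
    """Return "red", "orange" or None for one sample.

    Assumption: the VM must be doing enough work (IOPS and CPU) before
    latency is evaluated, and either latency average can trip the alert.
    """
    if iops < t.min_iops or cpu_ghz < t.min_cpu_ghz:
        return None  # too little work for the latency to matter
    if latency_5min_ms >= t.red_5min_ms or latency_20s_ms >= t.red_20s_ms:
        return "red"
    if latency_5min_ms >= t.orange_5min_ms or latency_20s_ms >= t.orange_20s_ms:
        return "orange"
    return None
```

For example, a VM at 50 IOPS, 0.5 GHz and a 25 ms 5-minute average would be flagged orange under `DEFAULT` but not under `TUNED`, which is the kind of low-activity case the tuned thresholds are meant to filter out.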

Follow the steps below to import the attached alert definition:

1 - Go to Configure > Alerts > Alert Definitions, click the ... (ellipsis) menu, and select Import

2 - Browse to the location where you stored the attached file

3 - Select "Overwrite existing Alert Definition"

4 - Click Import

Additional Information

Review your application vendors' and storage vendors' recommendations for application disk latency.

If the disk latency is much higher than best practice, yet there are no complaints from end users, consider one of the following options:

  1. Ignore the industry best practice, and increase the alert threshold.
  2. Proactively plan for a technology refresh. This is the preferred route if the existing hardware has aged and is due for an upgrade.

Attachments

2 VM disk latency alerts.xml