Storage path redundancy alerts do not clear automatically in Aria Operations

Article ID: 435822

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • In VMware Aria Operations, the "Path redundancy to storage device degraded" alert is triggered during a storage path disruption but does not clear automatically once the paths are restored.

  • Alerts remain in an Active state even after the underlying ESXi host or storage connection is successfully restored.

  • Manual cancellation is required to clear the alert from the Alerts Workbench.

  • The following corresponding entries are visible in the ESXi host's /var/run/log/vobd.log file:
    [vob.scsi.scsipath.pathstate.deadver2] scsiPath vmhba1:##:##:## changed state from on (device ID: naa.###################)
    [esx.problem.storage.redundancy.degraded] Path redundancy to storage device naa.################# degraded. Path vmhba1:##:##:## is down. Affected datastores: "Datastore_Name"
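
To confirm that a host logged these events, the two event IDs can be searched for in a copy of vobd.log. The following is a minimal Python sketch, not a VMware tool. The function name find_path_events and the sample lines are hypothetical, and the path and device values in the sample are made-up placeholders.

```python
import re

# Event IDs taken from the vobd.log excerpts in this article.
EVENT_IDS = (
    "vob.scsi.scsipath.pathstate.deadver2",
    "esx.problem.storage.redundancy.degraded",
)

# Matches "[<event id>] <message>" for the IDs above, wherever it appears in a line.
EVENT_RE = re.compile(
    r"\[(" + "|".join(re.escape(e) for e in EVENT_IDS) + r")\]\s*(.*)"
)


def find_path_events(lines):
    """Return (event_id, message) tuples for lines that carry a path event."""
    hits = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            hits.append((match.group(1), match.group(2).strip()))
    return hits


# Illustrative input only: identifiers below are placeholders, not real log data.
sample = [
    "[vob.scsi.scsipath.pathstate.deadver2] scsiPath vmhba1:C0:T0:L0 changed state from on",
    "an unrelated log line",
    "[esx.problem.storage.redundancy.degraded] Path redundancy to storage device degraded.",
]

for event_id, message in find_path_events(sample):
    print(event_id, "->", message)
```

To run it against a real log, pass an open file object for a copy of /var/run/log/vobd.log to find_path_events instead of the sample list.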

Cause

This issue occurs because Aria Operations relies on specific failure events from vCenter. In environments that use Round Robin multipathing, a lost storage connection may be re-established over a completely new path. Because the original failed path is never technically restored, vCenter does not send a "cleared" event for it, and Aria Operations keeps the alert active indefinitely.
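
The behavior described above can be illustrated with a toy model. This is a simplified sketch for explanation only, not how Aria Operations or vCenter are implemented. The function name active_alerts and the path names are hypothetical.

```python
def active_alerts(events):
    """Replay (path, state) events and return paths whose alert is still active.

    In this toy model an alert is keyed to one path and clears only when
    that same path reports "up" again.
    """
    active = set()
    for path, state in events:
        if state == "down":
            active.add(path)
        elif state == "up":
            active.discard(path)
    return active


# Hypothetical sequence: one path fails, connectivity returns on a different path.
events = [
    ("vmhba1:C0:T0:L0", "down"),  # original path fails, alert is raised
    ("vmhba1:C0:T1:L0", "up"),    # storage is reachable again over a new path
]

# The original path never reported "up", so its alert never clears.
print(active_alerts(events))
```

Because the "up" event arrives for a different path, nothing in the event stream matches the original failure, which mirrors why the alert stays active.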

Resolution

This behavior is by design.

As a workaround, follow the steps below.

Modify Alert Definition Logic 

Adjust the Alert Definition so it evaluates the overall health of the storage device rather than individual, transient path down events.

  1. Log in to the Aria Operations user interface.

  2. Navigate to Configure > Alerts > Alert Definitions.

  3. Edit the alert: Path redundancy to storage device degraded.

  4. In the Symptoms section, switch the operator from ANY to ALL for the path-down symptoms:

    • Host has no redundancy to storage device

    • A path to storage device went down

  5. Save the Alert Definition.
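
The effect of switching the operator from ANY to ALL can be sketched in Python. This is an illustrative model of the two operators only, assuming each symptom is simply true or false. The function name alert_fires is hypothetical and is not part of any Aria Operations API.

```python
def alert_fires(symptoms, operator):
    """Evaluate a symptom set under the ANY or ALL operator.

    symptoms maps a symptom name to True (triggered) or False (not triggered).
    """
    states = list(symptoms.values())
    if operator == "ANY":
        return any(states)
    if operator == "ALL":
        return all(states)
    raise ValueError(f"unknown operator: {operator}")


# Assumed state: a single path went down, but the host still has redundancy.
symptoms = {
    "Host has no redundancy to storage device": False,
    "A path to storage device went down": True,
}

print("ANY:", alert_fires(symptoms, "ANY"))  # one symptom is enough
print("ALL:", alert_fires(symptoms, "ALL"))  # every symptom must be triggered
```

With ALL, a single transient path-down event no longer triggers the alert unless the host has also lost redundancy to the device.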

Add Metric-Based Thresholds

For stricter reliability, augment the alert by using metric-based thresholds instead of relying purely on event-based logs.

  1. While still editing the Alert Definition, add a new symptom based on the following metric:

    • Object Type: Host System

    • Metric: Storage | Number of Active Paths

  2. Set a numerical threshold appropriate for your specific storage array configuration.

    • Example: Set the threshold severity to Critical if the value is < 2.

  3. Save changes.
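
The example threshold above amounts to a simple comparison on the active path count. The following sketch assumes the example value of 2. The function name path_severity and the host names are hypothetical, and the metric values are made up for illustration.

```python
def path_severity(active_paths, critical_below=2):
    """Return "Critical" when the active path count is under the threshold, else None."""
    return "Critical" if active_paths < critical_below else None


# Hypothetical per-host values for the "Number of Active Paths" metric.
hosts = {"esx-01": 4, "esx-02": 1}

for host, paths in hosts.items():
    print(host, paths, path_severity(paths))
```

Unlike an event-based symptom, a metric-based symptom is re-evaluated against the current value, so it stops triggering once the path count is back at or above the threshold.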

Perform Manual Cleanup

  1. Go to Alerts > Alerts Workbench.
  2. Filter for the stale alerts and click Cancel Alert. This establishes a new baseline for the updated logic.

Additional Information

To request a change to this behavior, submit a VMware by Broadcom feature request.