Alarms Not Clearing in DX SaaS After Metric Recovery

Article ID: 437780

Products

DX SaaS

Issue/Introduction

Alarms in the DX SaaS platform remain in an "OPEN" status even after the underlying metric value has recovered and no longer meets the major or critical threshold.

  • The Alarm Lifecycle Events show that metric_state transitioned from "ACTIVE" to "UNKNOWN" but never transitioned back to "ACTIVE" or on to "CLOSED" once data returned.
  • The issue may appear sporadic or tied to specific time windows.
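
To confirm the symptom across many alarms, the lifecycle check described above can be sketched as a small script. This is a minimal illustration only: it assumes the lifecycle events have already been exported into simple records, and the `LifecycleEvent` type, its field names, and the `find_stuck_alarms` helper are hypothetical, not part of any DX SaaS API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class LifecycleEvent:
    """Hypothetical record for one exported alarm lifecycle event."""
    alarm_id: str
    timestamp: datetime
    metric_state: str  # "ACTIVE", "UNKNOWN", or "CLOSED"


def find_stuck_alarms(events: Iterable[LifecycleEvent]) -> List[str]:
    """Return IDs of alarms that were ACTIVE at some point and whose
    most recent metric_state is UNKNOWN."""
    latest_state = {}
    was_active = set()
    # Process events in chronological order so the last write wins.
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.metric_state == "ACTIVE":
            was_active.add(event.alarm_id)
        latest_state[event.alarm_id] = event.metric_state
    return sorted(
        alarm_id
        for alarm_id, state in latest_state.items()
        if state == "UNKNOWN" and alarm_id in was_active
    )


# Made-up sample data: alarm-1 never leaves UNKNOWN, alarm-2 closes normally.
sample_events = [
    LifecycleEvent("alarm-1", datetime(2026, 4, 1, 10, 0), "ACTIVE"),
    LifecycleEvent("alarm-1", datetime(2026, 4, 1, 10, 5), "UNKNOWN"),
    LifecycleEvent("alarm-2", datetime(2026, 4, 1, 10, 0), "ACTIVE"),
    LifecycleEvent("alarm-2", datetime(2026, 4, 1, 10, 5), "UNKNOWN"),
    LifecycleEvent("alarm-2", datetime(2026, 4, 1, 10, 10), "CLOSED"),
]

stuck = find_stuck_alarms(sample_events)
print(stuck)
```

An alarm listed by this check matches the pattern described in this article only if its underlying metric has in fact recovered; an alarm whose metric is still missing data would also end in "UNKNOWN".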

Environment

  • DX SaaS (DXO2) 26.2.1

Cause

The root cause is a synchronization problem within the metricalert service pods. When multiple metricalert pods are restarted (typically to remediate performance lag), a race condition or state mismatch can occur. If an alarm cycle is being processed while the pods are restarting, the alarm can become stuck in the "UNKNOWN" state and fail to trigger the subsequent "CLEAR" event once the metric recovers.

Resolution

This issue is addressed via a service update in the DX SaaS production environment.

  • Fix Version: DXO2 SaaS 26.4.1
  • Deployment Schedule: Tentatively scheduled for the weekend of May 2nd, 2026.