Aria Operations cluster remains in a "Failure" state

Article ID: 440078

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • The cluster remains in a "Failure" state.
  • The nodes in the cluster are in the "Waiting for Analytics" state.
  • One of the nodes logs the following messages in vcopsConfigureRoles.log:
    YYYY-MM-DDTHH:MM:SS WARNING [3801825] - root - vcopsPlatformServices - setServiceMemory - Failed to set memory allocation for service: vpostgres-repl
    YYYY-MM-DDTHH:MM:SS ERROR [3801825] - root - vcopsConfigureRoles - setServiceMemoryAllocation - setServiceMemory failed with exit code 1: 1
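To check a copy of vcopsConfigureRoles.log for the two messages above, a small scan like the following may help. This is a minimal sketch. It assumes the messages appear in exactly the format quoted in this article, with a varying timestamp and process ID. The function name and the command-line usage are hypothetical, and this article does not state where the log file is located on the node. A match only shows that these symptoms are present. It does not by itself confirm the metric ID overflow described under Cause.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Patterns built from the messages quoted in this article (assumed stable format).
# The timestamp and the bracketed process ID vary, so they are matched loosely.
WARNING_RE = re.compile(
    r"WARNING \[\d+\] - root - vcopsPlatformServices - setServiceMemory - "
    r"Failed to set memory allocation for service: (?P<service>\S+)"
)
ERROR_RE = re.compile(
    r"ERROR \[\d+\] - root - vcopsConfigureRoles - setServiceMemoryAllocation - "
    r"setServiceMemory failed with exit code (?P<code>\d+)"
)


def find_memory_allocation_failures(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return ("warning", service) or ("error", exit_code) for each matching line."""
    hits: List[Tuple[str, str]] = []
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            hits.append(("warning", match.group("service")))
            continue
        match = ERROR_RE.search(line)
        if match:
            hits.append(("error", match.group("code")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_roles_log.py /path/to/copy/of/vcopsConfigureRoles.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for kind, detail in find_memory_allocation_failures(log_file):
            print(kind, detail)
```

For the two sample lines in this article, the scan reports one warning for service vpostgres-repl and one error with exit code 1.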

Environment

Aria Operations 8.18.x

Cause

This issue is caused by a metric ID overflow. The metrics involved are created by the Kubernetes Management Pack.

Resolution

A manual cleanup of metric keys is required. Contact Broadcom Support to resolve this issue (Creating Broadcom Support Cases).

To prevent this issue, upgrade the Kubernetes Management Pack to version 2.2 or later.
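When checking an installed management pack version against the 2.2 minimum, compare the parts numerically. A plain string comparison puts "2.10" before "2.2". The sketch below illustrates that comparison only. It assumes the version is a purely numeric dotted string such as "2.2" or "2.2.0.12345". A suffix such as "-beta" raises ValueError. The helper names are hypothetical, and this article does not describe where to read the installed version.

```python
from typing import Tuple

# Minimum Kubernetes Management Pack version stated in this article.
MINIMUM_VERSION: Tuple[int, ...] = (2, 2)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a numeric dotted version such as '2.10.1' into (2, 10, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def _pad(parts: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    """Right-pad with zeros so that '2' compares like '2.0'."""
    return parts + (0,) * (width - len(parts))


def meets_minimum(version: str, minimum: Tuple[int, ...] = MINIMUM_VERSION) -> bool:
    """True if the version is at or above the minimum, compared part by part."""
    current = parse_version(version)
    width = max(len(current), len(minimum))
    return _pad(current, width) >= _pad(minimum, width)
```

For example, `meets_minimum("2.10")` is True and `meets_minimum("2.1.9")` is False. The string comparison `"2.10" >= "2.2"` would wrongly give False.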