DX OI - No New alarms or Alarms are not getting updated

Article ID: 189463

Products

DX Operational Intelligence

Issue/Introduction

Symptoms:

- New alarms don't appear in the OI console, or existing alarms are not being updated as expected
- No errors in the connector log
- No changes have been made to the existing configuration

Cause

"Normalized Alarm Service" not working as expected

Environment

DX Operational Intelligence 1.3.x, 20.x
DX Application Performance Management 11.x, 20.x

 

Resolution

CHECKLIST

STEP #1 : Jarvis (kafka, zookeeper, elasticSearch)
STEP #2 : Check Kafka consumer groups : find out if there is any problem or lag processing the messages
STEP #3 : Verify Normalized_Alarm service.


STEP #1 : Jarvis (kafka, zookeeper, elasticSearch)

DX AIOps - Jarvis (kafka, zookeeper, elasticSearch) Troubleshooting
https://knowledge.broadcom.com/external/article/189119


STEP #2 : Check the following Kafka consumer groups to find out whether there is any problem or lag in processing the messages

NAS_APP_ID_NormalizedAlarmService_INGEST
jarvis_indexer
NAS_APP_ID_NormalizedAlarmService_UPDATE

1) List all the consumer groups:

If OI 20.2:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --list

If OI 1.3.x:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092,kafka-2:9092,kafka-3:9092 --list

Result:

incidentmanagement_70566566
jaf_java_policy_manager_1257448589
DOIReadserver_98121ca1-2aa5-4230-9324-e4253c04683d
internal_2cab33f6-1f04-4538-9cff-9b2261422ad8
jaf_java_notify_filter_1986092059
internal_f44d9901-ef7f-43b0-9a2a-b371a26848b0
NAS_APP_ID_NormalizedAlarmService_INGEST
Api_jarvis_tenant
Api_jarvis_docType
internal_ef353112-c548-449e-9da7-41e374aecfb9
jarvis_indexer
internal_de509bdf-4323-4f2f-87fa-3a9a3e2f2c32
SACICorrelation
Api_jarvis_product
Incident_face8aa0-8f29-4c1c-a26f-3074d273dd28
Api_jarvis_docView
Api_jarvis_productTenant
axa.transformer
NAS_APP_ID_NormalizedAlarmService_UPDATE
axa_log_gateway-muntest000478_hub
SAServiceAlarm
Indexer_static
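
The check against the list above can be scripted. The following is a minimal sketch, assuming the --list output is piped in on standard input; missing_groups is a hypothetical helper name, not part of Kafka or DX OI. It prints each of the three consumer groups named in this step that does not appear in the list.

```shell
# Report which of the three consumer groups named in this step are absent
# from a `kafka-consumer-groups.sh --list` output read on standard input.
# missing_groups is a hypothetical helper name, not part of Kafka or DX OI.
missing_groups() {
  list=$(cat)
  for group in \
    NAS_APP_ID_NormalizedAlarmService_INGEST \
    jarvis_indexer \
    NAS_APP_ID_NormalizedAlarmService_UPDATE
  do
    # -x matches the whole line, -F treats the name as a literal string
    if ! printf '%s\n' "$list" | grep -qxF "$group"; then
      echo "$group"
    fi
  done
}
```

Example for OI 20.2 (no output means all three groups are present):
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --list | missing_groups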


NOTE:
If running the above commands returns an error, you may need to unset the JMX_PORT environment variable. Run the following command, then try again:

unset JMX_PORT

2) Check consumer group: jarvis_indexer

If OI 20.2:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group jarvis_indexer

If OI 1.3.x:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092,kafka-2:9092,kafka-3:9092 --describe --group jarvis_indexer 

3) Check consumer group: NAS_APP_ID_NormalizedAlarmService_UPDATE

If OI 20.2:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group NAS_APP_ID_NormalizedAlarmService_UPDATE

If OI 1.3.x:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092,kafka-2:9092,kafka-3:9092 --describe --group NAS_APP_ID_NormalizedAlarmService_UPDATE

 

4) Check consumer group: NAS_APP_ID_NormalizedAlarmService_INGEST

If OI 20.2:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group NAS_APP_ID_NormalizedAlarmService_INGEST

If OI 1.3.x:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092,kafka-2:9092,kafka-3:9092 --describe --group NAS_APP_ID_NormalizedAlarmService_INGEST
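
To read the lag from the --describe output more quickly, the LAG column can be summed. This is a minimal sketch, assuming the output contains a header row with a column named LAG; total_lag is a hypothetical helper name. This article gives no lag threshold, so the number is only useful for comparison: a total that keeps growing between runs suggests the group is not keeping up.

```shell
# Sum the LAG column of a `kafka-consumer-groups.sh --describe` output read
# on standard input. The column position is taken from the header row, since
# the column layout can differ between Kafka versions.
# total_lag is a hypothetical helper name.
total_lag() {
  awk '
    lag_col == 0 {
      # Still looking for the header row: remember which field is LAG
      for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i
      next
    }
    # Only add numeric values; non-numeric entries such as "-" are skipped
    $lag_col ~ /^[0-9]+$/ { total += $lag_col }
    END { print total + 0 }
  '
}
```

Example for OI 1.3.x:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092,kafka-2:9092,kafka-3:9092 --describe --group jarvis_indexer | total_lag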


NOTE: The following warning message indicates an issue with the Normalized Alarm Service:

Warning: Consumer group "NAS_APP_ID_NormalizedAlarmService_INGEST" is rebalancing
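
The rebalancing warning can also be detected in a script. This is a minimal sketch; is_rebalancing is a hypothetical helper name, and it only looks for the text "is rebalancing" in whatever is piped to it.

```shell
# Succeed (exit status 0) when the text read on standard input contains
# the rebalancing warning. is_rebalancing is a hypothetical helper name.
is_rebalancing() {
  grep -q "is rebalancing"
}
```

Example for OI 20.2. The 2>&1 redirection is an assumption made so that the warning is caught whether the tool writes it to standard output or standard error:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group NAS_APP_ID_NormalizedAlarmService_INGEST 2>&1 | is_rebalancing && echo "Consumer group is rebalancing"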



 
STEP #3 : Verify Normalized_Alarm service

1) Open the normalized_alarm.log:

Go to OpenShift console | Application | pods | <normalized-alarm pod> | Terminal
Otherwise, open a shell in the pod using:

kubectl exec -ti <normalized-alarm pod> -- sh

If OI 20.2 :
cd /opt/caemm/normalized-alarm/logs/<doi-normalized-alarm-pod>

If OI 1.3.x :
cd /opt/caemm/normalized-alarm/logs

2) Search for ERROR or exception
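
One way to run this search from the pod shell is with grep. This is a minimal sketch; find_errors is a hypothetical helper name, and matching case-insensitively is an assumption made so that variants such as "Exception" are also caught.

```shell
# Print every line of the given log file that mentions an error or an
# exception, prefixed with its line number.
# find_errors is a hypothetical helper name.
find_errors() {
  # -n prints line numbers, -i ignores case, -E enables the | alternation
  grep -n -i -E 'error|exception' "$1"
}
```

Example, run from the log directory:
find_errors normalized_alarm.log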

3) Check if Alarm normalization is working as expected

The following use case illustrates NEW alarms not being processed: the "itoa_alarms_uim" count has been the same for the past 2 days.

To see this, filter the log:

more normalized_alarm.log | grep "itoa_alarms_uim"
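
When the log is long, comparing only the oldest and the newest matching entries is often enough to see whether the count has changed. This is a minimal sketch; first_and_last is a hypothetical helper name, and it makes no assumption about the log line format: it simply prints the first and the last line that contain the search text.

```shell
# Print the first and the last line of a file that contain the given text,
# so that the two can be compared by eye.
# first_and_last is a hypothetical helper name.
first_and_last() {
  pattern="$1"
  file="$2"
  grep -F "$pattern" "$file" | awk '
    NR == 1 { first = $0 }
    { last = $0 }
    END {
      if (NR >= 1) print first
      if (NR > 1) print last   # avoid printing a single match twice
    }
  '
}
```

Example, run from the log directory:
first_and_last itoa_alarms_uim normalized_alarm.log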

Recommendation:

Restart normalized-alarm pod

Go to OpenShift console | Application | pods | <normalized-alarm pod>

Click "Actions" > Delete
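
If the console is not available, the same restart can likely be done from the command line. This is a minimal sketch, assuming kubectl access to the cluster (as used earlier for kubectl exec) and assuming the pod is recreated automatically after deletion, as with the console Delete action. restart_pod and the DRY_RUN variable are hypothetical names; by default the function only prints the command so it can be reviewed first.

```shell
# Delete the given pod so that it is recreated.
# restart_pod and DRY_RUN are hypothetical names used for this sketch.
# With DRY_RUN=yes (the default) the command is only printed, not run.
restart_pod() {
  pod="$1"
  if [ "${DRY_RUN:-yes}" = "yes" ]; then
    echo "kubectl delete pod $pod"
  else
    kubectl delete pod "$pod"
  fi
}
```

Example:
restart_pod <normalized-alarm pod>
DRY_RUN=no restart_pod <normalized-alarm pod>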

Additional Information

DX AIOPs - Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/190815
