Symptoms:
- New alarms do not appear in the OI console, or existing alarms are not updated as expected
- No errors in the connector log
- No changes have been made to the existing configuration
DX Operational Intelligence 1.3.x, 20.x
DX Application Performance Management 11.x, 20.x
"Normalized Alarm Service" not working as expected
CHECKLIST
STEP #1 : Jarvis (kafka, zookeeper, elasticSearch)
STEP #2 : Check Kafka consumer groups : find out if there is any problem or lag processing the messages
STEP #3 : Verify Normalized_Alarm service.
STEP #1 : Jarvis (kafka, zookeeper, elasticSearch)
DX AIOps - Jarvis (kafka, zookeeper, elasticSearch) Troubleshooting
https://knowledge.broadcom.com/external/article/189119
STEP #2 : Check the following Kafka consumer groups to find out whether there is any problem or lag in processing the messages:
NAS_APP_ID_NormalizedAlarmService_INGEST
jarvis_indexer
NAS_APP_ID_NormalizedAlarmService_UPDATE
1) List all the consumer groups:
If OI 20.2:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --list
If OI 1.3.x:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092,kafka-2:9092,kafka-3:9092 --list
Result:
incidentmanagement_70566566
jaf_java_policy_manager_1257448589
DOIReadserver_98121ca1-2aa5-4230-9324-e4253c04683d
internal_2cab33f6-1f04-4538-9cff-9b2261422ad8
jaf_java_notify_filter_1986092059
internal_f44d9901-ef7f-43b0-9a2a-b371a26848b0
NAS_APP_ID_NormalizedAlarmService_INGEST
Api_jarvis_tenant
Api_jarvis_docType
internal_ef353112-c548-449e-9da7-41e374aecfb9
jarvis_indexer
internal_de509bdf-4323-4f2f-87fa-3a9a3e2f2c32
SACICorrelation
Api_jarvis_product
Incident_face8aa0-8f29-4c1c-a26f-3074d273dd28
Api_jarvis_docView
Api_jarvis_productTenant
axa.transformer
NAS_APP_ID_NormalizedAlarmService_UPDATE
axa_log_gateway-muntest000478_hub
SAServiceAlarm
Indexer_static
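The three groups named at the start of this step should all appear in the listing. The sketch below shows one way to check that, assuming the `--list` output is piped in on standard input. The function name `missing_groups` is hypothetical and is not part of Kafka or DX OI.

```shell
# Hypothetical helper: reads `kafka-consumer-groups.sh --list` output on stdin
# and prints every required group that is absent from it.
missing_groups() {
    list=$(cat)
    for group in \
        NAS_APP_ID_NormalizedAlarmService_INGEST \
        jarvis_indexer \
        NAS_APP_ID_NormalizedAlarmService_UPDATE
    do
        # -x matches the whole line, -F treats the group name as a literal string
        printf '%s\n' "$list" | grep -qxF "$group" || printf '%s\n' "$group"
    done
}

# Usage (OI 20.2 bootstrap servers, as in the command above):
# /opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --list | missing_groups
```

No output means all three groups are registered. Any name printed is a group that Kafka does not currently know about.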
NOTE: If the above commands fail with an error related to the JMX port, unset the JMX_PORT environment variable and run the command again:
unset JMX_PORT
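If JMX_PORT should stay set in the current shell, the variable can be removed for a single command instead. This is a minimal sketch; the wrapper name `run_without_jmx` is hypothetical.

```shell
# Hypothetical wrapper: runs the given command in a subshell with JMX_PORT
# removed, so the variable stays set in the calling shell.
run_without_jmx() {
    (
        unset JMX_PORT
        "$@"
    )
}

# Usage (OI 1.3.x bootstrap servers, as in the command above):
# run_without_jmx /opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092,kafka-2:9092,kafka-3:9092 --list
```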
2) Check consumer group: jarvis_indexer
If OI 20.2:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group jarvis_indexer
If OI 1.3.x:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092,kafka-2:9092,kafka-3:9092 --describe --group jarvis_indexer
3) Check consumer group: NAS_APP_ID_NormalizedAlarmService_UPDATE
If OI 20.2:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group NAS_APP_ID_NormalizedAlarmService_UPDATE
If OI 1.3.x:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092,kafka-2:9092,kafka-3:9092 --describe --group NAS_APP_ID_NormalizedAlarmService_UPDATE
4) Check consumer group: NAS_APP_ID_NormalizedAlarmService_INGEST
If OI 20.2:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group NAS_APP_ID_NormalizedAlarmService_INGEST
If OI 1.3.x:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka:9092,kafka-2:9092,kafka-3:9092 --describe --group NAS_APP_ID_NormalizedAlarmService_INGEST
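To turn the `--describe` output into a single lag figure per group, the LAG column can be summed. The sketch below assumes the output has a header row containing a column named LAG. It finds that column by name because the column layout can differ between Kafka versions. The function name `total_lag` is hypothetical.

```shell
# Hypothetical helper: reads `kafka-consumer-groups.sh --describe` output on
# stdin and prints the sum of the LAG column.
total_lag() {
    awk '
        # Until the header row is found, look for the column named LAG.
        lag_col == 0 {
            for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i
            next
        }
        # Data rows: add numeric lag values; non-numeric entries such as "-" are skipped.
        $lag_col ~ /^[0-9]+$/ { total += $lag_col }
        END { print total + 0 }
    '
}

# Usage (OI 20.2 bootstrap servers, as in the commands above):
# /opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group jarvis_indexer | total_lag
```

A total that keeps growing between runs suggests the consumers of that group are not keeping up with, or have stopped processing, the incoming messages.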
NOTE: The following warning message indicates an issue with the Normalized Alarm Service:
Warning: Consumer group "NAS_APP_ID_NormalizedAlarmService_INGEST" is rebalancing
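The rebalancing warning can also be detected in a script. The sketch below assumes the describe output is piped in with standard error merged into standard output, since the warning may be written to either stream. The function name `is_rebalancing` is hypothetical.

```shell
# Hypothetical helper: exits 0 if the describe output on stdin contains the
# rebalancing warning, non-zero otherwise.
is_rebalancing() {
    grep -q 'is rebalancing'
}

# Usage (OI 20.2 bootstrap servers, as in the commands above):
# /opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group NAS_APP_ID_NormalizedAlarmService_INGEST 2>&1 | is_rebalancing && echo "INGEST group is rebalancing"
```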
STEP #3 : Verify Normalized_Alarm service.
1) Open the normalized_alarm.log:
Go to OpenShift console | Application | Pods | <normalized-alarm pod> | Terminal
Otherwise, open a shell in the pod using:
kubectl exec -ti <normalized-alarm pod> -- sh
If OI 20.2 :
cd /opt/caemm/normalized-alarm/logs/<doi-normalized-alarm-pod>
If OI 1.3.x :
cd /opt/caemm/normalized-alarm/logs
2) Search for ERROR or exception
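The search can be done with grep from the log directory. This is a minimal sketch; the function name `find_errors` is hypothetical, and the pattern simply matches the two keywords named above, ignoring case.

```shell
# Hypothetical helper: prints every line of the given log file that contains
# "error" or "exception", prefixed with its line number.
# -i ignores case, -n adds line numbers, -E enables the | alternation.
find_errors() {
    grep -inE 'error|exception' "$1"
}

# Usage, from the log directory:
# find_errors normalized_alarm.log | tail -n 20
```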
3) Check if Alarm normalization is working as expected
The following use case illustrates NEW alarms not being processed: the "itoa_alarms_uim count" has been the same for the past 2 days.
To confirm this, filter the log:
grep "itoa_alarms_uim" normalized_alarm.log
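On a large log it can help to look only at the most recent matching entries and compare their counts by eye. This is a minimal sketch; the function name `recent_matches` is hypothetical, and it makes no assumption about the layout of the log lines.

```shell
# Hypothetical helper: prints the last N lines (default 5) of FILE that
# contain PATTERN as a literal string.
# Arguments: PATTERN FILE [N]
recent_matches() {
    grep -F -- "$1" "$2" | tail -n "${3:-5}"
}

# Usage, from the log directory:
# recent_matches itoa_alarms_uim normalized_alarm.log 10
```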
Recommendation:
Restart the normalized-alarm pod:
Go to OpenShift console | Application | Pods | <normalized-alarm pod>
Click "Actions" > Delete
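The same restart can be done from the command line. The sketch below assumes that the pod is managed by a controller that recreates it after deletion, which is what the console procedure above relies on. The function name `restart_pod` and the `DRY_RUN` switch are hypothetical; the pod name and namespace must be supplied for your environment.

```shell
# Hypothetical helper: deletes the named pod so that its controller recreates it.
# Arguments: POD NAMESPACE
# With DRY_RUN=1 it only prints the command it would run.
restart_pod() {
    pod=$1
    namespace=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "kubectl delete pod $pod -n $namespace"
    else
        kubectl delete pod "$pod" -n "$namespace"
    fi
}

# Usage:
# restart_pod <normalized-alarm pod> <namespace>
```

After the new pod is running, repeat STEP #2 to confirm that the consumer groups are processing messages again.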
DX AIOps - Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/190815