DX OI doi-cpa-ng OutOfMemoryError: Java heap space | While adding capacity analytics groups it is giving 500 error


Article ID: 240146


Products

DX Operational Intelligence

Issue/Introduction

The "OutOfMemoryError: Java heap space" and 500 error on the doi-cpa-ng pod while adding capacity analytics groups. It can happen for example when trying to define Capacity analytics for all IUM groups when selected the root group for UIM. Also the following error can be found in the cpa_ng logs 

 

ERROR [2022-03-25 07:27:19,186] io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: eae49914a6a25541
! java.lang.OutOfMemoryError: Java heap space

Environment

DX Operational Intelligence 21.3.1

Capacity analytics groups for UIM

Cause

The NASS metadata clamp size was set very high, so while fetching the data and processing it locally, the pod ran out of memory.

Resolution

Add the NASS_METADATA_CLAMP_SIZE environment variable to the CPA-ng pod deployment, set its value to 50000, and try to load the config page to see whether you still hit the out-of-memory error.

1/ Use the following commands to identify the CPA-ng pod and open your CPA-ng deployment for editing:

kubectl get pods -n<your-namespace> | grep cpa-ng
kubectl describe pod cpa-ng-<pod_id> -n<your-namespace>
kubectl exec cpa-ng-<pod_id> -it -n<your-namespace> -- bash
kubectl edit deployment -n<your-namespace> cpa-ng
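
Alternatively, kubectl can set the variable on the deployment in a single step, avoiding the interactive edit. This is a sketch rather than part of the original procedure, and it assumes the deployment is named cpa-ng as in the commands above:

kubectl set env deployment/cpa-ng NASS_METADATA_CLAMP_SIZE=50000 -n<your-namespace>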

2/ Scroll down and change the NASS_METADATA_CLAMP_SIZE value to 50000. If this property is not present in your deployment YAML file yet, please add it.
 

NASS_METADATA_CLAMP_SIZE: 50000

Important Note: NASS_METADATA_CLAMP_SIZE is an environment variable in the CPA-ng pod, so it is not present in your YAML file if it has never been changed before.
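
For reference, environment variables in a Kubernetes deployment live under the container's env list. The snippet below is a minimal sketch of how the entry usually looks once added; the container name and surrounding structure are illustrative and will differ in your actual doi-cpa-ng deployment:

spec:
  template:
    spec:
      containers:
      - name: cpa-ng          # illustrative container name
        env:
        - name: NASS_METADATA_CLAMP_SIZE
          value: "50000"      # Kubernetes env values must be quoted strings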

3/ Save the file. The pods from this deployment will be recreated automatically. Your CPA-ng pod should now run with the increased NASS metadata clamp size.
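
To confirm the recreated pod picked up the new value, you can inspect its environment. This verification step is a suggestion, not part of the original procedure:

kubectl get pods -n<your-namespace> | grep cpa-ng
kubectl exec cpa-ng-<pod_id> -n<your-namespace> -- env | grep NASS_METADATA_CLAMP_SIZE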

4/ If this does not help, please try reducing the value further.

Important Note: Some metric names might be missing on the config page after this change. In that case, make sure that the metric is actively flowing into the system and that it belongs to a device that is part of the group/service selected on the config page.

If this still does not work, please open a ticket with support and attach the logs from all three of your CPA pods. The cpa_ng logs are the most important in this case, but the logs from all three CPA pods can be useful for finding the root cause. If you are running OpenShift, please also provide the output of the oc describe deployment command for your doi-cpa-ng deployment. The commands below can be useful in this case:

oc get pods -ndxi | grep cpa
cpa-projection-<pod-id>
doi-cpa-ng-<pod-id>
doi-cpa-service-aggregation-<pod-id>

oc describe deployment doi-cpa-ng -ndxi
oc logs doi-cpa-ng-<pod-id> | grep dropwizard
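
To capture the logs from all three CPA pods in one pass before attaching them to the ticket, a small shell loop can help. This is a sketch assuming the dxi namespace used above; the output file names are illustrative:

for p in $(oc get pods -ndxi -o name | grep cpa); do
  oc logs "$p" > "$(basename "$p").log"
done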

Additional Information

AIOps - Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/190815/aiops-troubleshooting-common-issues-and.html
