The apmservices-nass pod keeps crashing with error code 137



Article ID: 374048



Products

DX Operational Intelligence, DX Application Performance Management, CA App Experience Analytics

Issue/Introduction

We have noticed that the apmservices-nass pod has started to crash intermittently with error code 137. 

 

oc describe po <apmservices-nass-001-#######>

...

Containers:
  apmservices-nass-001:
...
    Ports:          8001/TCP, 6001/TCP, 9009/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Sun, 28 Jul 2024 17:05:34 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137

Environment

DX OI 24.x on-premise

Cause

1) Out-of-memory issue: the container ran out of available memory and was killed.

2) Some OS kernel changes required for the dx-platform databases were not applied.
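As a rough illustration of why exit code 137 points to a forced kill: by shell and container-runtime convention, an exit code above 128 means the process was terminated by signal (code - 128), so 137 corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends. The helper name `decode_exit_code` below is hypothetical and only sketches this arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: decode a container exit code.
# Convention: a value above 128 means "killed by signal (code - 128)".
decode_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        # kill -l <number> prints the signal name without the SIG prefix
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited on its own with status $code"
    fi
}

decode_exit_code 137
```

For the value shown in the pod description, this prints `killed by signal 9 (SIGKILL)`. SIGKILL can also come from other sources, so this is a hint rather than proof of an out-of-memory condition.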

Resolution

1) Fix the out-of-memory issue

There are 2 options:

a) Add an additional instance of NASS.

b) Increase the memoryLimit.

In the ClusterManager UI, go to Services > Nass and click the "Add Instance" button to add an instance. To increase the memory limit, click "..." -> "Configure deployment" and change the size to "large" if needed. This change is saved to the 01_clustermanager values file on NFS and is reused in upgrades.
 
 
2) To prevent applications from allocating more memory than necessary, enable huge pages only inside MADV_HUGEPAGE madvise regions instead of enabling them system-wide.
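The sketch below assumes that this step refers to the Linux transparent huge pages (THP) setting, which is exposed at `/sys/kernel/mm/transparent_hugepage/enabled` on mainline kernels. The article does not name the exact file or command, so treat the path and the helper `thp_active_mode` as illustrative. The script only reads the current mode. The command that changes it is shown as a comment, because it requires root on each cluster node.

```shell
#!/bin/sh
# Assumed location of the THP mode file (mainline Linux kernels).
THP_FILE=/sys/kernel/mm/transparent_hugepage/enabled

# Hypothetical helper: print the active mode from a line such as
# "always [madvise] never". The kernel marks the active mode with brackets.
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Report the current mode if the file is readable on this host.
if [ -r "$THP_FILE" ]; then
    echo "current THP mode: $(thp_active_mode "$(cat "$THP_FILE")")"
fi

# To restrict huge pages to MADV_HUGEPAGE regions until the next reboot,
# run as root on each node:
#   echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
```

A value written this way does not survive a reboot. One common way to make it persistent is the `transparent_hugepage=madvise` kernel boot parameter, but check the product's own installation prerequisites for the supported method.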
 
 
 

Additional Information