apmservices-nass pods restarting OOM Killed

Article ID: 370085

Products

DX Application Performance Management

Issue/Introduction

The environment was upgraded to 23.3 one week ago.

The apmservices-nass pod description (kubectl describe pod) shows that the container was terminated with reason OOMKilled.

The apmservices-nass-01 pod restarted after 6 hours; its events do not show any Liveness/Readiness probe failures.

The apmservices-nass-02 pod restarted after 13 hours, and its events do show Liveness/Readiness probe failures.
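To tell an OOM kill apart from a probe-driven restart on each nass pod, the pod status and its events can be compared side by side. The following is a minimal sketch, not a Broadcom-supplied tool: it assumes the JSON was saved with kubectl (the file names, the script name nass_check.py, and the namespace placeholder are hypothetical), and it relies on the standard Kubernetes fields containerStatuses[].lastState.terminated and on kubelet reporting probe failures as events with reason "Unhealthy".

```python
import json
import sys
from collections import Counter


def container_restarts(pod: dict) -> list:
    """Per-container restart count and last termination reason from a Pod object."""
    rows = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # lastState.terminated is only present if the container terminated before.
        last = cs.get("lastState", {}).get("terminated", {})
        rows.append({
            "container": cs.get("name"),
            "restarts": cs.get("restartCount", 0),
            "reason": last.get("reason"),       # e.g. "OOMKilled"
            "exit_code": last.get("exitCode"),  # 137 (128 + SIGKILL) for an OOM kill
        })
    return rows


def probe_failures(events: dict) -> Counter:
    """Count Liveness/Readiness probe failures in a core/v1 EventList."""
    counts = Counter()
    for ev in events.get("items", []):
        if ev.get("reason") != "Unhealthy":
            continue
        msg = ev.get("message", "")
        for probe in ("Liveness", "Readiness"):
            if msg.startswith(probe + " probe"):
                # Repeated events are aggregated; "count" may be missing or null.
                counts[probe] += ev.get("count") or 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical file names): python nass_check.py pod.json events.json
    with open(sys.argv[1]) as f:
        for row in container_restarts(json.load(f)):
            print(row)
    with open(sys.argv[2]) as f:
        print(dict(probe_failures(json.load(f))))
```

The input files could be produced with, for example, kubectl get pod apmservices-nass-01 -n <namespace> -o json > pod.json and kubectl get events -n <namespace> --field-selector involvedObject.name=apmservices-nass-01 -o json > events.json. A reason of OOMKilled with no probe failures matches the apmservices-nass-01 pattern above; note that events expire after a retention period, so older probe failures may no longer be visible.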

Various other pods report errors:

ERROR c.c.a.r.nass.NassReactiveClientBase - com.ca.apm.common.api.ServicesException: 500,2102,-: No instances for partition (nass, 2), -
com.ca.apm.common.api.ServicesException: 500,2102,-: No instances for partition (nass, 2), -

Resolution

Increasing the pod memory limit and changing the node OS kernel parameter for Transparent Huge Pages (THP) to “madvise” helped resolve the issue.

With the memory limit left at the default, OOMKilled restarts occurred more frequently.
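To confirm the THP part of the change on each node, the active mode can be read from sysfs, where the kernel marks the current setting in brackets (for example "always [madvise] never"). This is a minimal sketch under that assumption; the script itself is not part of the product. Writing to the sysfs file only lasts until reboot; a persistent setting is commonly made with the transparent_hugepage=madvise kernel boot parameter, but check the node OS documentation for the supported method.

```python
import re
from pathlib import Path
from typing import Optional

# Standard Linux sysfs location of the THP "enabled" setting.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> Optional[str]:
    """Return the bracketed (active) mode from a THP sysfs file, e.g. 'madvise'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    mode = active_thp_mode(THP_ENABLED.read_text())
    print(f"THP enabled mode: {mode}")
    if mode != "madvise":
        # Runtime-only change; it is lost on reboot unless also set persistently.
        print("Not set to madvise; as root, a runtime change would be:")
        print(f"  echo madvise > {THP_ENABLED}")
```

Run it on every node that can schedule apmservices-nass pods, since the kernel setting is per node rather than per pod.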