apmservices-nass pods restarting OOM Killed




Article ID: 370085


Updated On:

Products

DX Application Performance Management

Issue/Introduction

The environment was upgraded to version 23.3 one week earlier.

The apmservices-nass pod description shows that the container was terminated with reason OOMKilled.
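The same information can be pulled from the pod's JSON instead of reading the `kubectl describe` output by eye. A minimal sketch, assuming the input is the parsed output of `kubectl get pod <pod-name> -n <namespace> -o json`; it checks both the current and the last container state, because after a restart the OOMKilled reason is normally found under `lastState`. The function name is illustrative, not part of any DX APM tooling.

```python
from __future__ import annotations


def oom_killed_containers(pod: dict) -> list[tuple[str, int]]:
    """List (container name, restart count) for every container in a pod whose
    current or last termination reason is OOMKilled.

    `pod` is the parsed JSON of `kubectl get pod <pod-name> -o json`.
    """
    hits = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # After a restart the OOM kill is usually recorded in lastState;
        # a container that is still down may show it in state instead.
        for key in ("state", "lastState"):
            terminated = (cs.get(key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((cs["name"], cs.get("restartCount", 0)))
                break  # report each container only once
    return hits
```

For example, save the output with `kubectl get pod <pod-name> -n <namespace> -o json > pod.json` and call `oom_killed_containers(json.load(open("pod.json")))`.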

The apmservices-nass-01 pod restarted after 6 hours; its events do not indicate liveness or readiness probe failures.

The apmservices-nass-02 pod restarted after 13 hours, and its events do indicate liveness and readiness probe failures.
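To tell whether a restart was preceded by probe failures, as with apmservices-nass-02, the pod's events can be counted per probe type. A minimal sketch, assuming the kubelet's usual event format (reason `Unhealthy`, message starting with `Liveness probe failed` or `Readiness probe failed`) and input from `kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> -o json`. The function name is illustrative. Events expire (after one hour by default on the API server), so an empty result for an older restart does not prove that no probe failed.

```python
from __future__ import annotations


def probe_failures(events: dict, pod_name: str) -> dict[str, int]:
    """Count liveness and readiness probe failures for one pod.

    `events` is the parsed JSON of `kubectl get events ... -o json`.
    """
    counts = {"Liveness": 0, "Readiness": 0}
    for ev in events.get("items", []):
        if (ev.get("involvedObject") or {}).get("name") != pod_name:
            continue
        # The kubelet reports failed probes with reason "Unhealthy".
        if ev.get("reason") != "Unhealthy":
            continue
        message = ev.get("message", "")
        for probe in counts:
            if message.startswith(f"{probe} probe failed"):
                # Repeated events are aggregated; count may be missing or null.
                counts[probe] += ev.get("count") or 1
    return counts
```

A pod that was OOMKilled without any preceding probe failures, like apmservices-nass-01, would return zero for both probe types.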

Various other pods report errors such as:

ERROR c.c.a.r.nass.NassReactiveClientBase - com.ca.apm.common.api.ServicesException: 500,2102,-: No instances for partition (nass, 2), -
com.ca.apm.common.api.ServicesException: 500,2102,-: No instances for partition (nass, 2), -

Resolution

Increasing the pod memory limit and changing the node OS kernel parameter for Transparent Huge Pages (THP) to "madvise" has helped.
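The article does not show how the THP setting was checked or changed. A minimal sketch, assuming it refers to the `enabled` setting in `/sys/kernel/mm/transparent_hugepage/enabled` on each worker node: the kernel lists all modes in that file and brackets the active one (for example `always [madvise] never`), so the helper extracts the bracketed value. The function names are illustrative.

```python
from pathlib import Path

# Standard sysfs location of the THP mode on Linux nodes.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the active (bracketed) mode, e.g. 'madvise' from 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode found in {text!r}")


def node_thp_is_madvise(path: Path = THP_ENABLED) -> bool:
    """True if the node this runs on has THP set to madvise."""
    return active_thp_mode(path.read_text()) == "madvise"
```

Writing `madvise` to that file as root changes the mode immediately but does not survive a reboot; the kernel boot parameter `transparent_hugepage=madvise` is the usual way to make it persistent.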

With the memory limit left at the default, the OOMKilled situations occurred more frequently.
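The article does not state the new memory limit or how it was applied. As a hypothetical sketch, assuming the nass pods belong to a Deployment or StatefulSet that may be patched directly, the helper below builds a strategic merge patch that sets `resources.limits.memory` (and optionally `resources.requests.memory`) for one container, matched by name. The workload name, container name, and memory values are placeholders, not values from the article. If the pods are managed by an operator or Helm release, change the limit there instead, since a direct patch may be reverted.

```python
from __future__ import annotations

import json


def memory_limit_patch(container: str, limit: str, request: str | None = None) -> str:
    """Build a strategic merge patch (JSON) raising one container's memory limit.

    Strategic merge matches list entries in `containers` by `name`, so only
    the named container's resources are changed.
    """
    resources: dict = {"limits": {"memory": limit}}
    if request is not None:
        resources["requests"] = {"memory": request}
    patch = {"spec": {"template": {"spec": {"containers": [
        {"name": container, "resources": resources},
    ]}}}}
    return json.dumps(patch)
```

The returned string can be passed as the `-p` argument of `kubectl patch <deployment|statefulset> <workload> -n <namespace> --type strategic -p '<patch>'`; changing the pod template triggers a rolling restart of the pods.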
