OPMS stations keep filling up tmp drive space and throwing errors from cron jobs



Article ID: 262671



Products

CA App Synthetic Monitor

Issue/Introduction

Multiple OPMS stations keep filling their /tmp directory with _MEI* files until the drive is full. We have to stop Docker and all monit processes to delete the files manually, and we continue to receive warnings from failed cron jobs.

 

 

Environment

Release : 10.1

Cause

The root cause is that corrupted log files cause the docker daemon to become unresponsive and consume 100% CPU. docker-compose processes started while the daemon is unresponsive get stuck, and each stuck process leaves its own _MEI* folder in /tmp, which eventually fills the /tmp filesystem.

Resolution

Disable the logrotate cron job for the Docker container logs.

Edit the file /etc/logrotate.d/docker-container-logs and add a '#' character at the start of every line.

 

vi /etc/logrotate.d/docker-container-logs

 

#/var/lib/docker/containers/*/*.log {

#  compress

#  copytruncate

#  daily

#  rotate 5

#}
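The manual edit above can also be scripted. The following is a minimal sketch, assuming GNU sed is available on the station. The function name comment_out_config and the backup location are hypothetical choices for illustration, not part of OPMS or logrotate. The backup is written outside /etc/logrotate.d so that logrotate does not read it as a second configuration file.

```shell
#!/usr/bin/env bash
# Sketch: comment out every active line of a config file, keeping a backup.
# comment_out_config is a hypothetical helper name (assumption, not an OPMS tool).
comment_out_config() {
  local file="$1"
  local backup_dir="${2:-$HOME}"   # assumed backup location, outside /etc/logrotate.d
  local backup
  backup="$backup_dir/$(basename "$file").orig"

  # Keep the first backup only, so re-running does not overwrite the original.
  [ -e "$backup" ] || cp -p "$file" "$backup"

  # Prefix '#' to each line that is neither blank nor already commented.
  sed -i -E '/^[[:space:]]*(#|$)/!s/^/#/' "$file"
}
```

Example call, run as root: comment_out_config /etc/logrotate.d/docker-container-logs /root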

 

 

Stop Docker, the Docker socket, and all monit-managed services:

systemctl stop docker; systemctl stop docker.socket; monit stop all

 

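Delete the leftover files. Run the command below from the /tmp directory, replacing _MEI* with the partial file name of the files to delete. Each run removes the 20 oldest matches, so repeat it until none remain: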

 

ls -dtr _MEI* | head -20 |xargs rm -rf
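As an alternative to the ls pipeline above, the same batch deletion can be sketched with find, which takes the directory explicitly and does not depend on the current working directory. This is an illustration under assumptions: it relies on GNU find and GNU xargs options (-printf, -r, -d), and remove_oldest_mei is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: delete the N oldest _MEI* entries directly under a directory.
# remove_oldest_mei is a hypothetical helper name (assumption).
remove_oldest_mei() {
  local dir="${1:-/tmp}"     # directory to clean, /tmp as in the article
  local count="${2:-20}"     # batch size, 20 as in the original command

  # Print "mtime<TAB>path", sort oldest first, keep the first N paths, delete them.
  find "$dir" -maxdepth 1 -name '_MEI*' -printf '%T@\t%p\n' |
    LC_ALL=C sort -n |
    head -n "$count" |
    cut -f2- |
    xargs -r -d '\n' rm -rf --
}
```

Example call, run while Docker and monit are stopped: remove_oldest_mei /tmp 20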

 

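Delete every container log file that jq cannot parse as JSON, then start Docker and the monit-managed services again:

find /var/lib/docker/containers/ -name '*-json.log' -exec bash -c 'jq . "$1" > /dev/null 2>&1 || rm "$1"' _ {} \;

systemctl start docker; monit start all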
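The find command above deletes corrupted log files immediately. To review which files would be removed first, a dry-run variant can be sketched as follows. It assumes jq is installed, as the original command does, and list_corrupt_logs is a hypothetical helper name. It only prints paths and deletes nothing.

```shell
#!/usr/bin/env bash
# Sketch: list container log files that jq cannot parse, without deleting them.
# list_corrupt_logs is a hypothetical helper name (assumption).
list_corrupt_logs() {
  local root="${1:-/var/lib/docker/containers}"

  # Walk all *-json.log files; NUL-delimited so unusual file names are safe.
  find "$root" -type f -name '*-json.log' -print0 |
    while IFS= read -r -d '' logfile; do
      # jq exits non-zero when the file is not valid JSON.
      jq . "$logfile" > /dev/null 2>&1 || printf '%s\n' "$logfile"
    done
}
```

Example call: list_corrupt_logs /var/lib/docker/containers/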

 

The long-term fix is to upgrade OPMS to 2023.3.

Additional Information

_MEI* files are created by docker-compose.

If the docker daemon stops responding on the control interface for whatever reason, monit thinks that the ASM agents are down and attempts to restart them. Each restart attempt creates a docker-compose process, which remains stuck because the docker daemon is not responding. Every further attempt by monit to restart the service spawns another stuck docker-compose process, and each process has its own _MEI* folder in /tmp.
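To check whether a station is in the state described above, the number of _MEI* folders and of running docker-compose processes can be compared. The following is a rough sketch, assuming pgrep from procps is available. Both function names are hypothetical. A growing count of both values would be consistent with the stuck-process behavior described here, but it is not proof of it.

```shell
#!/usr/bin/env bash
# Sketch: count _MEI* entries and docker-compose processes.
# count_mei_dirs and count_compose_procs are hypothetical helper names (assumption).
count_mei_dirs() {
  local dir="${1:-/tmp}"
  # One output line per matching entry directly under the directory.
  find "$dir" -maxdepth 1 -name '_MEI*' | wc -l
}

count_compose_procs() {
  # -f matches against the full command line, -c prints the match count.
  # pgrep exits non-zero when nothing matches, so the status is ignored.
  pgrep -c -f docker-compose || true
}
```

Example call: echo "_MEI folders: $(count_mei_dirs /tmp), docker-compose processes: $(count_compose_procs)"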