Our OPMS stations keep filling their /tmp directory with _MEI* files until the drive is full. Each time, we have to stop docker and all monit processes to delete the files manually, and we continue to receive warnings from failed cron jobs.
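To confirm the symptom on an affected station, check how full /tmp is and how many _MEI* directories have accumulated (example checks only; adjust paths if your tmp directory differs):
df -h /tmp
ls -d /tmp/_MEI* 2>/dev/null | wc -l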
Release: 10.1
The root cause is corrupted container log files, which make the docker daemon unresponsive and drive it to 100% CPU. The unresponsive daemon then causes docker-compose to hang, which in turn fills up the /tmp filesystem.
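One way to verify the daemon is in this state (illustrative checks, not required by the workaround below) is to look for dockerd pegging a CPU and for docker CLI calls hanging:
top -b -n 1 | grep dockerd
timeout 10 docker info > /dev/null 2>&1 || echo 'docker daemon not responding'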
Disable the logrotate cron job.
Edit the file /etc/logrotate.d/docker-container-logs and add a '#' character at the start of every line (a sed one-liner that does the same is shown after the example):
vi /etc/logrotate.d/docker-container-logs
#/var/lib/docker/containers/*/*.log {
# compress
# copytruncate
# daily
# rotate 5
#}
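If you prefer not to edit the file by hand, the same result can be achieved with sed (a sketch; review the file afterwards to confirm every line starts with '#'):
sed -i 's/^/#/' /etc/logrotate.d/docker-container-logs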
systemctl stop docker; systemctl stop docker.socket; monit stop all
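Stuck docker-compose processes may survive the service stop. Before cleaning /tmp it can be worth listing them (an optional check, not part of the original procedure):
pgrep -af docker-compose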
File delete command (if the directory names differ, replace _MEI* with a partial file name):
cd /tmp; ls -dtr _MEI* | head -20 | xargs rm -rf
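This removes the 20 oldest _MEI* entries per run, so repeat it until /tmp is clear. An alternative sketch, assuming the directories sit directly under /tmp, that removes everything matching the pattern in one pass:
find /tmp -maxdepth 1 -name '_MEI*' -exec rm -rf {} +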
Remove any corrupted container log files by running:
find /var/lib/docker/containers/ -name '*-json.log' -exec bash -c 'jq . "$1" > /dev/null 2>&1 || rm "$1"' _ {} \;
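To preview which log files would be removed before deleting anything, a non-destructive variant of the same command (echo instead of rm) can be run first:
find /var/lib/docker/containers/ -name '*-json.log' -exec bash -c 'jq . "$1" > /dev/null 2>&1 || echo "corrupt: $1"' _ {} \;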
systemctl start docker; monit start all
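After the restart, it is worth confirming that docker and the monitored services are healthy and that /tmp is no longer filling up (example checks):
systemctl status docker --no-pager
monit summary
df -h /tmp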
The long-term fix is to upgrade OPMS to 2023.3.
_MEI* directories are created by docker-compose: it ships as a PyInstaller-packaged binary, and _MEI* is the temporary directory PyInstaller extracts itself into at startup.
If the docker daemon stops responding on its control interface for any reason, monit concludes that the ASM agents are down and attempts to restart them. Each attempt spawns a docker-compose process that remains stuck because the daemon is not responding, and every further restart attempt by monit adds another stuck docker-compose process. Each of these processes has its own _MEI* folder in /tmp.
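On an affected station the number of stuck docker-compose processes should therefore roughly track the number of _MEI* directories in /tmp; a quick way to count them (illustrative only):
pgrep -cf docker-compose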