We are having problems with our OPMS servers.
1) In the ASM monitor execution logs we see the following errors:
(-95) Checkpoint did not return status, full response
502 Bad Gateway
(-98) no more monitoring stations available to perform the check
2) On the OPMS servers, "monit summary" reports redis-server as "Execution failed".
Restarting all services with "monit restart all" did not help.
3) There are no disk space problems (df -h) on the OPMS servers.
4) The recommendations in the KB below help, but the problem recurs after a few days:
ASM OPMS - /var full, made disk space available but services do not start
The problem is the forced redis restart. Redis is a prerequisite for the API, which in turn is a prerequisite for all agents.
It can be caused by:
1) incorrect transparent huge pages setting
2) the aof file rewrite.
It is recommended to set transparent huge pages to madvise.
You can check the current value by reading /sys/kernel/mm/transparent_hugepage/enabled (the active value is shown in square brackets).
If it is 'always', change it to 'madvise' (as root):
echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
Please keep in mind that this change lasts only until the next reboot!
If it fixes the issue, it must be made permanent.
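The check described above can be sketched as a small helper. This is only an illustration: the function name thp_active_mode is hypothetical, and the sketch relies on the kernel's convention of marking the active mode with square brackets (for example "always [madvise] never").

```shell
#!/usr/bin/env bash
# Print the active transparent huge pages mode.
# The sysfs file lists every mode and wraps the active one in brackets,
# e.g. "always [madvise] never" means madvise is active.
thp_active_mode() {
    # Optional argument: file to read (defaults to the sysfs path used above).
    local file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    # Keep only the word between the square brackets.
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
}
```

Calling thp_active_mode without arguments prints the current mode; if it prints always, apply the echo command above. One common way to make the setting survive a reboot is the transparent_hugepage=madvise kernel boot parameter; how it is added depends on the distribution and boot loader.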
Disable disk synchronization. The aof file is used for data recovery after a restart or reboot.
1) edit the /etc/redis.conf file and change
Also check that all save lines are commented out, e.g.
# save 900 1
# save 300 10
# save 60 10000
# save 3600 10
2) restart redis
systemctl restart redis
3) restart api
monit restart api
4) check that the timestamp of
doesn't change any more (5 minutes is enough)
5) remove the aof file
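Two of the checks in the steps above (the save lines in step 1 and the timestamp in step 4) can be sketched as shell helpers. The function names active_save_lines and mtime_is_stable are hypothetical, and the file to watch is passed as an argument because its path is not given here. The sketch assumes GNU stat.

```shell
#!/usr/bin/env bash
# List "save" lines that are still active (not commented out) in a redis config.
# Prints nothing when every save line is commented out.
# Note: a line such as save "" would also be listed.
active_save_lines() {
    local conf="${1:-/etc/redis.conf}"
    grep -E '^[[:space:]]*save[[:space:]]' "$conf"
}

# Return 0 if FILE's modification time stays unchanged for DURATION seconds,
# polling every INTERVAL seconds; return 1 as soon as it changes.
# Uses GNU stat: "stat -c %Y" prints the mtime in epoch seconds.
mtime_is_stable() {
    local file="$1" duration="${2:-300}" interval="${3:-10}"
    local start elapsed=0
    start=$(stat -c %Y "$file") || return 2
    while [ "$elapsed" -lt "$duration" ]; do
        sleep "$interval"
        elapsed=$((elapsed + interval))
        [ "$(stat -c %Y "$file")" = "$start" ] || return 1
    done
    return 0
}
```

With the default arguments, mtime_is_stable watches the given file for 300 seconds, which matches the 5 minutes suggested in step 4.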
Occasionally (weekly), check the redis log file
There should be no more restarts. If redis is still restarted regularly, something else must be causing it (e.g. logrotate). However, this should no longer prevent redis from starting.
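The periodic log check can be sketched as a count of startup messages. The function name count_redis_starts is hypothetical, and the log path must be supplied as an argument. The default pattern is an assumption about how redis words its "ready" message; the wording differs between versions, so adjust it to match what your log actually contains.

```shell
#!/usr/bin/env bash
# Count lines in a redis log that indicate the server finished starting.
# Arguments: LOG file, optional PATTERN (case-insensitive).
# The default pattern is an assumption; check it against your own log.
count_redis_starts() {
    local log="$1"
    local pattern="${2:-ready to accept connections}"
    # grep -c prints the number of matching lines (0 when there are none).
    grep -ci -- "$pattern" "$log"
}
```

If the count keeps growing from week to week, redis is still being restarted and the cause needs further investigation.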