ASM - Problems with OPMSs : "502 Bad Gateway" in logs and "redis-server" service in "Execution failed"
Article ID: 238143

Products

CA App Synthetic Monitor

Issue/Introduction

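We are having problems with the On-Premise Monitoring Stations (OPMS): monitor checks fail and monit reports the redis-server service as "Execution failed".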

Symptoms

1) The ASM monitor execution logs show the following errors:

(-95) Checkpoint did not return status, full response
502 Bad Gateway
(-98) no more monitoring stations available to perform the check

 

2) On the OPMS servers, "monit summary" reports redis-server as "Execution failed".

Restarting all services with "monit restart all" did not help.

 

3) There are no disk space problems on the OPMS servers (checked with df -h).

 

4) The recommendations in the KB article below help, but the problem recurs after a few days:

ASM OPMS - /var full, made disk space available but services do not start
https://knowledge.broadcom.com/external/article/229207

 

Environment

DX ASM 

Cause

The problem is a forced redis restart. Redis is a prerequisite for the api service, which in turn is a prerequisite for all agents.

The forced restart can be caused by:

1) an incorrect transparent huge pages (THP) setting

2) the rewrite of the aof (append-only) file.

 

Resolution

Recommendation 1: 

It is recommended to set transparent huge pages to madvise.

You can check the current value with:

cat /sys/kernel/mm/transparent_hugepage/enabled

If the value is 'always', change it to 'madvise' (as root):

echo madvise >/sys/kernel/mm/transparent_hugepage/enabled

Note that this change only lasts until the next reboot.

If it fixes the issue, it must be made permanent.
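
The sysfs file lists every available mode on one line and marks the active one in square brackets, for example "always [madvise] never". The following is a minimal sketch of a check that extracts the active mode. The function name thp_active_mode is hypothetical and not part of ASM or any OPMS tooling.

```shell
#!/bin/sh
# Sketch only: thp_active_mode is a hypothetical helper, not an ASM tool.
THP_FILE=/sys/kernel/mm/transparent_hugepage/enabled

# Print the active THP mode, i.e. the word shown in [brackets].
# $1: optional path to read instead of the real sysfs file.
thp_active_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "${1:-$THP_FILE}"
}

# Only report when the sysfs file is present (Linux hosts with THP support).
if [ -r "$THP_FILE" ]; then
    mode=$(thp_active_mode)
    if [ "$mode" = "always" ]; then
        echo "THP is 'always'; consider: echo madvise > $THP_FILE (as root)"
    else
        echo "THP mode: $mode"
    fi
fi
```

One common way to make the setting permanent is the kernel boot parameter transparent_hugepage=madvise. How boot parameters are configured depends on the distribution, so treat this as a pointer rather than a tested procedure for OPMS hosts.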

 

Recommendation 2:

Disable redis disk synchronization (persistence). The aof file is used for data recovery after a restart or reboot.

1) edit the /etc/redis.conf file and change

appendonly yes

to

appendonly no

Also check that all save directives are commented out, e.g.

# save 900 1
# save 300 10
# save 60 10000
# save 3600 10

2) restart redis

systemctl restart redis

3) restart api

monit restart api

4) check that the timestamp of

/var/lib/redis/appendonly.aof
no longer changes (watching it for 5 minutes is enough)

5) remove the aof file

rm /var/lib/redis/appendonly.aof
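
Before removing the aof file, the checks from steps 1 and 4 can be scripted. The sketch below is one possible way to do that, not an official procedure. The function names are hypothetical, and it assumes GNU stat is available on the OPMS host.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, paths are the ones from this article.

# Succeeds if the config has "appendonly no" and no uncommented
# "save <seconds> <changes>" line.
redis_persistence_disabled() {
    grep -Eq '^[[:space:]]*appendonly[[:space:]]+no[[:space:]]*$' "$1" || return 1
    if grep -Eq '^[[:space:]]*save[[:space:]]+[0-9]' "$1"; then
        return 1
    fi
    return 0
}

# Succeeds if file $1 was last modified at least $2 seconds ago.
# Assumes GNU stat ("-c %Y" prints the modification time as epoch seconds).
unchanged_for() {
    mtime=$(stat -c %Y "$1") || return 2
    now=$(date +%s)
    [ $((now - mtime)) -ge "$2" ]
}

CONF=/etc/redis.conf
AOF=/var/lib/redis/appendonly.aof
# Only run the checks when both files are present on this host.
if [ -r "$CONF" ] && [ -e "$AOF" ]; then
    if redis_persistence_disabled "$CONF" && unchanged_for "$AOF" 300; then
        echo "OK: persistence is off and $AOF has been idle for 5 minutes"
    else
        echo "Not yet: re-check $CONF or wait before removing $AOF"
    fi
fi
```

The script only reports. It does not delete the file, so the rm in step 5 stays a deliberate manual action.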

Occasionally (for example weekly), check the redis log file

/var/log/redis/redis-server.log
There should be no more restarts. If redis still restarts regularly, something else is causing the restarts (e.g. logrotate). However, this should no longer prevent redis from starting.
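
The weekly log check can be reduced to counting start-up entries. The sketch below assumes that each redis start writes a log line containing "ready to accept connections". The exact wording differs between redis versions, and some versions may log it more than once per start, so verify this against your own log first. The function name count_redis_starts is hypothetical.

```shell
#!/bin/sh
# Sketch only: count_redis_starts is a hypothetical helper.
# Assumption: every redis start logs a line containing
# "ready to accept connections" (matched case-insensitively).
count_redis_starts() {
    # grep -c prints the count; "|| true" keeps a zero count from being an error.
    grep -ci 'ready to accept connections' "$1" || true
}

LOG=/var/log/redis/redis-server.log
# Only report when the log file from this article is readable.
if [ -r "$LOG" ]; then
    echo "Redis start-ups logged in $LOG: $(count_redis_starts "$LOG")"
fi
```

A count that keeps growing from week to week indicates that redis is still being restarted. Note that the count drops back when the log file is rotated.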

 

Additional Information

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/app-synthetic-monitor/SaaS/on-premise-monitoring-stations-opms/troubleshooting.html