During a production go-live, all Access Gateways stopped responding at the same time while under load.
Concurrent connections had reached about 500 per second. After SPS starts, it runs for a short while, and then mod_jk.log reports the following errors:
(ajp13) Tomcat is down or refused connection. No response has been sent to the client (yet)
(ajp13) connecting to tomcat failed.
After that, only restarting the Access Gateway restores service.
A similar issue is described in this community post:
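To confirm that a gateway is hitting this condition, the two mod_jk.log messages quoted above can be counted. The following is a minimal sketch; the function name count_ajp_failures is hypothetical, and the log file location is passed in as an argument because it depends on the installation.

```shell
#!/bin/sh
# Count the lines in a mod_jk log that carry either of the two AJP errors.
# Usage: count_ajp_failures /path/to/mod_jk.log   (path depends on your install)
count_ajp_failures() {
    logfile=$1
    # grep -c prints the number of matching lines; -E enables alternation.
    grep -cE 'Tomcat is down or refused connection|connecting to tomcat failed' "$logfile"
}
```

A count that keeps rising after SPS startup suggests the gateway has entered the failure state described here.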
https://community.broadcom.com/enterprisesoftware/browse/blogs/blogviewer?BlogKey=c6e5b222-9075-4907-af05-88de48bf670e
Release : 12.8.03
Component : SITEMINDER SECURE PROXY SERVER
The most likely cause is that the Access Gateway system is low on entropy, or that the random number generator is not working well on that particular machine.
Running "cat /proc/sys/kernel/random/entropy_avail" only returns the value at that moment, so it does not fully reflect the entropy level on the system.
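Because a single read is only a snapshot, one option is to sample the counter several times and look at the spread. This is a rough sketch, not a supported tool; the function name sample_entropy and its defaults (10 samples, 1 second apart) are assumptions.

```shell
#!/bin/sh
# Sample the kernel entropy counter several times and summarise the readings.
# Usage: sample_entropy [samples] [interval_seconds] [counter_file]
sample_entropy() {
    samples=${1:-10}
    interval=${2:-1}
    file=${3:-/proc/sys/kernel/random/entropy_avail}
    [ "$samples" -gt 0 ] || return 1
    min=''; max=''; total=0; i=0
    while [ "$i" -lt "$samples" ]; do
        value=$(cat "$file") || return 1
        total=$((total + value))
        if [ -z "$min" ] || [ "$value" -lt "$min" ]; then min=$value; fi
        if [ -z "$max" ] || [ "$value" -gt "$max" ]; then max=$value; fi
        i=$((i + 1))
        # Pause between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
    done
    echo "min=$min max=$max avg=$((total / samples))"
}
```

Running it while the gateway is under load gives a better picture than a single reading, although it is still only an indicator.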
Federation and AuthAz service calls need entropy for encryption and decryption.
Running "dd if=/dev/random of=/dev/null bs=1 count=$((1024*1024)) status=progress" reports the read speed; a healthy system should read at around 300KB/s.
At around 20KB/s, the system slows down and starts to exhibit the performance problems described above.
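The same check can be scripted so the result is compared against the two figures quoted above. This is a sketch under stated assumptions: the function names are hypothetical, the "below-optimal" label for rates between 20KB/s and 300KB/s is not from this article, and timing uses whole seconds, so the rate is approximate.

```shell
#!/bin/sh
# Classify a read rate in KB/s against the figures quoted above.
# "below-optimal" is an assumed label for the range the article leaves open.
classify_rate() {
    kbps=$1
    if [ "$kbps" -le 20 ]; then
        echo "low"
    elif [ "$kbps" -ge 300 ]; then
        echo "ok"
    else
        echo "below-optimal"
    fi
}

# Time a fixed-size read and print an approximate rate in KB/s.
measure_random_kbps() {
    bytes=${1:-1048576}
    source_dev=${2:-/dev/random}
    start=$(date +%s)
    dd if="$source_dev" of=/dev/null bs=1 count="$bytes" 2>/dev/null || return 1
    end=$(date +%s)
    elapsed=$((end - start))
    # date +%s has one-second resolution; avoid dividing by zero on fast reads.
    [ "$elapsed" -gt 0 ] || elapsed=1
    echo $((bytes / 1024 / elapsed))
}

# Example (this can block for a long time on a starved system):
#   classify_rate "$(measure_random_kbps)"
```

On a system that is short of entropy the read may take a long time to finish, which is itself a sign of the problem.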
Running a proper entropy-gathering daemon on the Linux OS is the customer's responsibility.
Install rng-tools, then enable and start rngd.service so that it is running at all times.
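As an illustration, the installation and a status check might look like the following. This assumes a RHEL-family host with yum and systemd, and must be run as root; package and service names can differ on other distributions. The function names enable_rngd and rngd_state are hypothetical.

```shell
#!/bin/sh
# Install rng-tools and start rngd now and on every boot.
# Assumes a RHEL-family host with yum and systemd; run as root.
enable_rngd() {
    yum install -y rng-tools &&
    systemctl enable --now rngd.service
}

# Print the current state of rngd.service, or "unknown" if it cannot be read.
rngd_state() {
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "unknown"
        return 0
    fi
    state=$(systemctl is-active rngd.service 2>/dev/null) || true
    echo "${state:-unknown}"
}

# Example:
#   enable_rngd && rngd_state
```

A result of "active" from rngd_state indicates the daemon is running; any other value is worth investigating before putting the gateway back under load.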
In addition, some performance tuning can be done in the Access Gateway configuration:
ajp13.accept_count=200 (default 10)
ajp13.max_threads=610 (default 410)
Increase JVM_MEM_OPTS from 1G to 2G in ~secure-proxy/proxy-engine/proxyserver.sh
Note: these configuration values are examples only. Bench test in your own environment to find the optimum settings.
DE541317