We are seeing communication failure messages on one PAM appliance; they appear when we log in to that PAM node. Our best guess is that the UI does not load fast enough and times out, which triggers the message. When we try to open the session logs to find out what is happening, that request also times out with the same communication failure message, so we cannot open the session logs at all. CPU and RAM usage look normal, so it is unclear why the appliance is so slow.
Release : 3.4
Component : PRIVILEGED ACCESS MANAGEMENT
PAM runs a service named "logwatch" that caps session log messages at 250k, even if no session log purge is configured (see KB 46273). In this case the service hung and did not remove any session logs for several months. Eventually the session log table grew so large (>3M entries) that queries against it took very long, and the PAM client timed out before the data was available. As of Jan 2022 this was an isolated incident. By the time the problem was observed, logs had been overwritten, and there was not enough information left to determine the root cause of the process hang.
A reboot of the node should resolve this problem. Once running again, the "logwatch" service deletes 4000 messages every five minutes for as long as the number of session log entries for the local PAM node exceeds 250000. If the "Require Email..." option is checked on the Configuration > Logs > Automatic Log Purge page, you should see emails coming in every 5 minutes until the number of log messages drops below 250000. It may take a while before the number of messages is low enough that the UI stops timing out.
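To get a rough idea of how long "a while" can be, the purge rate above can be turned into a simple estimate. The sketch below is illustrative only and is not part of PAM. The function name `estimate_purge_minutes` is made up for this example, and it assumes that no new session log entries are written while the purge runs and that every cycle removes the full 4000 messages. Since new sessions normally keep adding entries, the real duration may be longer.

```python
import math

# Values taken from the article: cap of 250000 entries,
# 4000 entries deleted per cycle, one cycle every 5 minutes.
CAP = 250_000
BATCH = 4_000
INTERVAL_MIN = 5


def estimate_purge_minutes(current_entries, cap=CAP, batch=BATCH,
                           interval_min=INTERVAL_MIN):
    """Estimate the minutes until the entry count no longer exceeds the cap.

    Hypothetical helper: assumes no new entries arrive during the purge.
    """
    excess = current_entries - cap
    if excess <= 0:
        return 0  # already at or below the cap, nothing to purge
    cycles = math.ceil(excess / batch)  # a partial batch still needs a full cycle
    return cycles * interval_min


if __name__ == "__main__":
    # 3,000,000 is used as a stand-in for the ">3M entries" mentioned above.
    minutes = estimate_purge_minutes(3_000_000)
    print(f"{minutes} minutes, about {minutes / 60:.1f} hours")
    # prints "3440 minutes, about 57.3 hours"
```

With roughly 3 million entries, the estimate is more than two days before the table is back at the cap, although the UI may become usable again earlier, once queries against the table are fast enough to finish before the client times out.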
If a reboot would interrupt PAM users, an alternative is to open a case with PAM Support and have a Support engineer access the PAM appliance via SSH to restart the service.