PAM_DISABLE_MULTI_THREAD patch - when to apply and how

Products

CA Privileged Access Manager (PAM)

Issue/Introduction

A multi-threaded feature was introduced in PAM 3.3.1 to improve the performance of scheduled jobs that rotate passwords at regular intervals. Prior to the 3.3.1 release, one PAM node in the primary site would run a scheduled job and update target accounts one by one. With the enhancement, all primary site nodes that are NOT in maintenance mode execute update tasks within a scheduled job, and each node has many threads executing the tasks in parallel. The number of threads depends on the amount of RAM available to the PAM server: about 100 for 16 GB of RAM, and over 300 for 64 GB.

The enhancement can dramatically reduce the duration of a scheduled job that updates a large number of target accounts. It works best if there is one account per target server to update in a given job. An example would be a job that updates the root account on thousands of different Linux or UNIX servers every X days. However, it can cause problems in the following scenarios:

(1) An account (service account) that is configured to update the passwords of other accounts (managed accounts) is updated in the same scheduled job as those managed accounts. PAM administrators should avoid this use case by scheduling updates of service accounts and managed accounts in separate jobs that run at different times. This was good practice even before multi-threaded job execution was introduced.

(2) The job updates many target accounts in the same credential source, and the credential source cannot handle the large number of parallel updates. This includes the following cases:

(a) The credential source allows only a limited number of incoming connections. For example, an Active Directory domain controller might have such a limit imposed. This can cause a random subset of the account updates in the scheduled job to fail.

(b) A network appliance starts dropping connections when it detects what looks like a Denial of Service (DoS) attack on the credential source. Keep in mind that each password update involves multiple connections to the credential source. Updating an LDAP/AD account involves about half a dozen separate connections to a domain controller.

(c) The credential source defined in PAM is a load balancer, and a connection from PAM to the credential source may land on any of multiple servers behind the load balancer, such as a list of domain controllers for Active Directory implementations. For a single update, the multiple connections mentioned in (b) are likely to go to the same domain controller. But if there are many updates in parallel, load balancing is very likely to kick in. A service account may then update the password of a managed account on domain controller A, while the subsequent verification of the new password is done on controller B, which does not have the new password yet, so the update fails. This is a worst-case scenario: PAM retains the old password, the new password in Active Directory is lost, and the account is broken.

(d) The credential source, or the external component communicating with the credential source, cannot process the updates as fast as PAM submits them. An example was PAM's own Windows Proxy target connector. Through the 4.0.1 release, the Windows Proxy had a lock that serialized connections to remote Windows hosts to avoid conflicts in the Windows APIs used by the Proxy. When many jobs for local accounts on remote Windows hosts were submitted to the Proxy in parallel, they had to wait for the lock, and some of them could take too long to complete within the maximum time of 300 seconds that PAM allows for password update tasks. This could also lead to the problem where the password is in fact updated on the target device, but PAM retains the old password and the new password is unknown. The Account Passwords Update Attempts report for the scheduled job would show account updates failing after a duration of 300 seconds. This problem is resolved in PAM 4.0.2+ and 4.1+.
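The timeout effect described in (d) can be illustrated with a rough back-of-the-envelope model. This is only a sketch under simplifying assumptions that are not part of PAM: every update holds the lock for the same fixed time, all tasks are submitted at once, and they are served strictly one after another. The function name and the 5-second figure are hypothetical; only the 300-second limit comes from the text above.

```python
# Rough model of a serializing lock, NOT PAM's actual implementation.
# Assumption: each update holds the lock for a fixed number of seconds and
# tasks submitted together are served strictly one after another.
TASK_TIMEOUT_SECONDS = 300  # maximum time PAM allows for a password update task


def count_timed_out(num_tasks: int, seconds_per_update: float,
                    timeout: float = TASK_TIMEOUT_SECONDS) -> int:
    """Return how many of num_tasks submitted at once would exceed the timeout."""
    timed_out = 0
    for position in range(1, num_tasks + 1):
        # A task in this queue position completes after all earlier tasks
        # have released the lock, plus its own update time.
        completion_time = position * seconds_per_update
        if completion_time > timeout:
            timed_out += 1
    return timed_out


# Illustrative numbers only: 100 parallel tasks, 5 seconds per update.
print(count_timed_out(100, 5))  # 40
print(count_timed_out(50, 5))   # 0
```

In this model, with 100 tasks at 5 seconds each, the tasks in queue positions 61 to 100 would complete after the 300-second limit, which matches the pattern of failures at exactly 300 seconds in the report.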

Resolution

If you find that many target accounts go out of sync when updated by a scheduled job, but can be updated without a problem one at a time by a PAM administrator, the problem is likely caused by the parallel processing of the updates. If the speed of job execution is a major concern for you, and you cannot accept going back to sequential updates, you will have to work with PAM Support to find the best solution for your environment. If you do not see a problem with having accounts updated one by one in your scheduled jobs, we recommend that you apply the PAM_DISABLE_MULTI_THREAD patch, available from the PAM Solutions and Patches page, to your PAM servers.

This patch can be applied to nodes in an active cluster, but it requires a reboot and therefore has to be applied to one node at a time. The recommended sequence is to patch nodes that are NOT site leaders (first node in a site) first, and the site leaders last. Always update only one node at a time in an active cluster, and wait for it to come back and be in sync again before moving on to the next. Since only primary site nodes run password update tasks, start with the primary site nodes. If you are able to stop the cluster during a maintenance window, you can do so and patch all nodes in parallel before starting the cluster again.
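The ordering rules above can be sketched as a simple sort. This is one possible reading, not an official procedure: it assumes that "site leaders last" takes precedence over "primary site first", so all non-leaders are patched before any leader, with primary site nodes first within each group. The Node class, its fields, and the node names are hypothetical and do not correspond to any PAM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of a cluster node, for illustration only."""
    name: str
    primary_site: bool
    site_leader: bool


def patch_order(nodes):
    """Return nodes in a one-at-a-time patch order.

    Assumed interpretation: non-leaders before leaders, and within each
    group primary site nodes before the others. False sorts before True.
    """
    return sorted(nodes, key=lambda n: (n.site_leader, not n.primary_site))


cluster = [
    Node("p1", primary_site=True, site_leader=True),
    Node("p2", primary_site=True, site_leader=False),
    Node("s1", primary_site=False, site_leader=True),
    Node("s2", primary_site=False, site_leader=False),
]
print([n.name for n in patch_order(cluster)])  # ['p2', 's2', 'p1', 's1']
```

Swapping the two elements of the sort key would give the other plausible reading, in which every primary site node (leader included) is patched before any other site. Either way, each node should be back and in sync before the next one is patched.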

Note that once the patch is applied, a single thread will process password update tasks within scheduled jobs on each primary site node, similar to how it worked in releases prior to 3.3.1. However, there is still a difference in that all primary site nodes that are not in maintenance mode execute tasks. If you have multiple primary site nodes out of maintenance mode, there will still be some degree of parallelism in job execution.
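To get a feel for the remaining parallelism, the arithmetic can be sketched as follows. The figures are illustrative: three active primary site nodes is a hypothetical cluster size, 100 threads per node is the approximate value given above for 16 GB of RAM, and "about half a dozen" connections per LDAP/AD update is taken as 6. The function names are made up for this example.

```python
# Illustrative arithmetic only; node count and rounded figures are assumptions.
CONNECTIONS_PER_LDAP_UPDATE = 6  # "about half a dozen", taken as 6 here


def parallel_updates(active_primary_nodes: int, threads_per_node: int) -> int:
    """Upper bound on password updates in flight across the primary site."""
    return active_primary_nodes * threads_per_node


def connections_per_wave(updates_in_flight: int,
                         per_update: int = CONNECTIONS_PER_LDAP_UPDATE) -> int:
    """Connections made while each in-flight update runs once."""
    return updates_in_flight * per_update


# Three primary site nodes out of maintenance mode, about 100 threads each.
before = parallel_updates(active_primary_nodes=3, threads_per_node=100)
# Same nodes after the patch: one thread per node.
after = parallel_updates(active_primary_nodes=3, threads_per_node=1)

print(before, connections_per_wave(before))  # 300 1800
print(after, connections_per_wave(after))    # 3 18
```

Under these assumptions the patch reduces the number of simultaneous updates from roughly 300 to 3. To get strictly sequential execution, only one primary site node would have to be out of maintenance mode while the job runs.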