After launching a large AE DB Reorg with no_archive_check=1, users are eventually unable to connect via AWI and receive a Timeout error.
Restarting the Tomcat and JWP processes does not fix the issue.
The AE logs for the DWP processes show messages such as the following:
U00003375 Server usage of the last minute '100%', the last 10 minutes '100%' and the last hour '99%'.
This indicates that the problem is most likely related to the MQ?DWP table(s) filling up.
Fortunately, it is still possible to log in to the AE system using Java UI 12.0.8 and open Administration - Processes and utilization and Database, where this can be confirmed.
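When scanning large DWP logs, the usage message can be picked out programmatically. The following is a minimal sketch that relies only on the U00003375 message text quoted above; the function names and the 95% threshold are illustrative assumptions, not part of the Automation Engine.

```python
import re

# Pattern built from the U00003375 message text quoted in this article.
USAGE_RE = re.compile(
    r"U00003375 Server usage of the last minute '(\d+)%', "
    r"the last 10 minutes '(\d+)%' and the last hour '(\d+)%'"
)


def parse_server_usage(line):
    """Return (last_minute, last_10_minutes, last_hour) as ints, or None."""
    match = USAGE_RE.search(line)
    if match is None:
        return None
    return tuple(int(value) for value in match.groups())


def is_saturated(line, threshold=95):
    """True if all three usage values reach the threshold (assumed value)."""
    usage = parse_server_usage(line)
    return usage is not None and all(value >= threshold for value in usage)


sample = (
    "U00003375 Server usage of the last minute '100%', "
    "the last 10 minutes '100%' and the last hour '99%'."
)
print(parse_server_usage(sample))  # (100, 100, 99)
print(is_saturated(sample))        # True
```

Because the pattern is matched with search(), any timestamp or process prefix in front of the message number does not need to be known in advance.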
Release : 12.3
Component : AUTOMATION ENGINE
There were not enough DWP processes to cope with the number of records in the MQDWP table.
In this case, the MQ1DWP table held more than 1000 records for only 2 DWP processes, and because users kept trying to log in to and use AWI, more and more records were generated in this table.
Additionally, the AE DB Reorg was running while the issue was occurring, which put extra load on the AE database.
The issue was fixed as follows:
1. The AE DB Reorg was running while WP traces were enabled, which is not advisable, so the DB Reorg processes had to be killed.
2. Start additional WPs, which immediately become DWPs until the MQ1DWP table is empty (in this case, 4 more were started).
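To illustrate why the additional processes help, here is a rough calculation of the backlog per DWP before and after step 2. It is only a sketch: it uses 1000 records as a lower bound (the article says "more than 1000"), and it assumes that the queue is shared evenly and that all 4 new WPs act as DWPs. The function name is hypothetical.

```python
def records_per_dwp(queued_records, dwp_count):
    """Average number of queued MQ1DWP records per DWP (even split assumed)."""
    if dwp_count <= 0:
        raise ValueError("dwp_count must be positive")
    return queued_records / dwp_count


queued = 1000                               # lower bound taken from this case
before = records_per_dwp(queued, 2)         # 2 DWPs at the time of the issue
after = records_per_dwp(queued, 2 + 4)      # after starting 4 more WPs

print(f"before: {before:.0f} per DWP, after: {after:.0f} per DWP")
# before: 500 per DWP, after: 167 per DWP
```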
Preventive measures:
1. In the DWP logs we saw some suspicious messages about audit records:
U00004512 Access trace: User: 'USERNAME/DEPARTMENT' Object: 'OBJECT_NAME' Access: 'X'.
This was caused by the SECURITY_AUDIT_SUCCESS setting in UC_CLIENT_SETTINGS being enabled on most of the AE clients:
https://docs.automic.com/documentation/webhelp/english/AA/12.3/DOCU/12.3/Automic%20Automation%20Guides/help.htm#AWA/Variables/UC_CLIENT_SETTINGS/UC_CLIENT_SECURITY_Parameters.htm
If it is not needed, disable it on all clients to prevent this from occurring again.
2. Launch the AE DB Reorg during periods of low workload and end-user activity, followed by a rebuild/shrink of the database tables.
3. Increase the number of WPs so that they can cope with the load (for example, add 2 or 4 more).
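For preventive measure 1, the volume of U00004512 audit messages in a DWP log can be gauged with a small script before deciding whether SECURITY_AUDIT_SUCCESS is needed. The sketch below relies only on the message text quoted in this article; the function name and the sample lines (including the timestamp prefix) are illustrative assumptions.

```python
import re
from collections import Counter

# Pattern built from the U00004512 message text quoted in this article.
ACCESS_RE = re.compile(
    r"U00004512 Access trace: User: '([^']*)' Object: '([^']*)' Access: '([^']*)'"
)


def count_access_traces(lines):
    """Count U00004512 access-trace messages per user."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log lines; real logs may carry a different prefix.
sample_lines = [
    "20200101/120000.000 - U00004512 Access trace: User: 'USERNAME/DEPARTMENT' "
    "Object: 'OBJECT_NAME' Access: 'X'.",
    "20200101/120001.000 - U00004512 Access trace: User: 'USERNAME/DEPARTMENT' "
    "Object: 'OTHER_OBJECT' Access: 'X'.",
    "20200101/120002.000 - U00003375 Server usage of the last minute '100%', "
    "the last 10 minutes '100%' and the last hour '99%'.",
]

for user, total in count_access_traces(sample_lines).most_common():
    print(user, total)
```

A high count relative to the other messages in the log suggests that audit records contribute noticeably to the DWP load.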