AWI was slow. The Service Manager Dialog showed that the WP services were down on both nodes, and the AE logs contained the following error:
20230107/063952.672 - U00003491 There is a time difference of '0/00:01:22' or '82' seconds to the Primary Server.
The WP services were started again, but they went down after some time.
Release : 21.0.4
There was a time difference between the two Automation Engine hosts.
Since there are multiple hosts, a job can be processed by any WP. The start time is taken from the clock of the host whose WP processes the job start.
When the job end is processed by another WP running on a host whose clock is behind, the job end time becomes earlier than the start time, which causes the issue.
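The effect described above can be illustrated with a minimal sketch. It is not Automation Engine code; the helper names, the 30-second job runtime, and the example date are assumptions made for illustration, and only the 82-second offset comes from the log message above.

```python
from datetime import datetime, timedelta

# Offset taken from the U00003491 message quoted in this article.
CLOCK_SKEW = timedelta(seconds=82)


def host_clock(true_time, offset):
    """Return the timestamp a host records, given its clock offset (hypothetical helper)."""
    return true_time + offset


def recorded_times(true_start, runtime, start_offset, end_offset):
    """Start is stamped by one host, end by another; each uses its own clock."""
    start = host_clock(true_start, start_offset)
    end = host_clock(true_start + runtime, end_offset)
    return start, end


# Assumed scenario: a 30-second job starts on a host with an accurate clock
# and ends on a host whose clock is 82 seconds behind.
true_start = datetime(2023, 1, 7, 6, 38, 0)
start, end = recorded_times(
    true_start, timedelta(seconds=30), timedelta(0), -CLOCK_SKEW
)

print("recorded start:", start)
print("recorded end:  ", end)
print("end before start:", end < start)
```

Any job whose runtime is shorter than the clock difference ends up with an end time earlier than its start time in this model.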
The WPs crash, and the error "buffer overflow detected ucsrvwp terminated" appears in the nohup file.
Traces were enabled and left running until the WPs crashed.
The traces showed RunID 28744020 as the last one accessed.
This RunID belonged to a C_PERIOD object.
The WP crash could also be reproduced after restarting the WPs by searching for RunID 28744020 and clicking Executions.
Contact support to get the SQL query that fixes the issue.
To prevent the issue from happening again, both Automation Engine hosts should be synchronized to the same time.
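One way to notice a recurring time difference early is to scan the AE logs for message U00003491. The following is a rough sketch, not an official tool: the function name and threshold parameter are hypothetical, and the pattern is derived only from the single log line quoted in this article, so other versions or languages may format the message differently.

```python
import re

# Pattern based on the one U00003491 line shown in this article (assumption:
# the timestamp and wording stay the same in other log files).
SKEW_RE = re.compile(
    r"^(?P<stamp>\d{8}/\d{6}\.\d{3}) - U00003491 .*?"
    r"or '(?P<seconds>\d+)' seconds to the Primary Server\."
)


def find_time_differences(lines, threshold_seconds=0):
    """Return (timestamp, seconds) for each U00003491 line above the threshold."""
    hits = []
    for line in lines:
        match = SKEW_RE.search(line)
        if match and int(match.group("seconds")) > threshold_seconds:
            hits.append((match.group("stamp"), int(match.group("seconds"))))
    return hits


sample = [
    "20230107/063952.672 - U00003491 There is a time difference of "
    "'0/00:01:22' or '82' seconds to the Primary Server.",
]
print(find_time_differences(sample))
```

To use it on a real log, pass an open file object instead of the sample list; the path and name of the log file depend on the installation.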