The inconsistent ETL metric times, combined with the absence of logged errors that align with the problem Events, point to a performance or resource related problem.
The product uses an ETL pool that is allowed 5% of total allocated memory to complete the job. Natural growth of the Performance Management environment may have made that allowance too small. If so, the job consumes the full 5% and still needs more to do its work, and without that memory its queries cannot run any faster.
While it's possible to increase the memory for the pool, we do not recommend that as a fix. It only takes resources away from something else, which results in other problems.
For the 80% of execution time Events, the good news is that while the job is approaching its limit, it is still completing within the 1 minute limit without failure. The product raises the Event as a warning that the run took over 80% of that time. It does indicate that a run could go over 100%, which then triggers a failure.
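As a rough illustration of those thresholds, the sketch below classifies one run's duration against the interval. It assumes the time limit equals the configured interval (1 minute by default) and that the warning applies above 80% of it. The function name and constant are hypothetical and are not taken from the product.

```python
# Illustrative only: the names and the exact threshold handling are assumptions,
# not product code.
WARN_FRACTION = 0.80  # warning Event raised above 80% of the allowed time


def classify_run(duration_seconds: float, interval_minutes: int = 1) -> str:
    """Return 'ok', 'warning', or 'failure' for a single ETL run."""
    limit_seconds = interval_minutes * 60  # assumed: limit equals the interval
    if duration_seconds > limit_seconds:
        return "failure"  # over 100% of the allowed time
    if duration_seconds > WARN_FRACTION * limit_seconds:
        return "warning"  # over 80% but still completed in time
    return "ok"
```

Under these assumptions, a 50 second run at a 1 minute interval falls in the warning band, while the same 50 second run measured against a 3 minute interval would be well under the threshold.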
For the failure related Events, the job should complete successfully the next time it runs. As long as the failure Events aren't showing up frequently, nothing is being lost.
In some situations running this group ETL every 1 minute is too frequent. In those situations we suggest backing off to an interval at which the Events no longer appear, or appear far less frequently.
What is the possible impact of increasing the interval beyond 1 minute? Any Group based Scorecard report View may not show new Group changes synchronized for use in under 5 minutes.
We recommend starting with a change from 1 minute to 3 minutes as a first step. After setting it, review incoming Events. Does it alleviate most of the 80% of execution time Events? Maybe it resolves both those and the failure Events?
If not, and the Events are less frequent but still too noisy, raise the interval from 3 to 5 minutes.
How do we change the interval? Follow these instructions, which use a support lab as an example.
1. In a REST client run a GET against the URL:
http://DA:8581/rest/batch/groups/config
2. Output should look like:
<GroupsBatchConfigurationList>
<GroupsBatchConfiguration version="1.0.0">
<ID>468</ID>
<Enabled>true</Enabled>
<Interval>1</Interval>
</GroupsBatchConfiguration>
</GroupsBatchConfigurationList>
3. Set the REST client to use a PUT.
4. Set the URL to include the ID from the GET request. Sample from above info would be:
http://DA:8581/rest/batch/groups/config/468
5. In the BODY of the PUT request enter the following to change the Interval value from 1 to 3:
<GroupsBatchConfiguration version="1.0.0">
<Enabled>true</Enabled>
<Interval>3</Interval>
</GroupsBatchConfiguration>
6. Hit Send. If a 200 success message is received, run the GET request again to confirm the change.
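For those who prefer a script to a REST client, the steps above can be sketched in Python using only the standard library. This is a minimal sketch and not a supported tool: the function names are hypothetical, the Content-Type header is an assumption, authentication is not handled, and the host DA:8581 is the support lab example from the steps.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Lab URL from the steps above; replace DA with your own host.
BASE_URL = "http://DA:8581/rest/batch/groups/config"


def parse_config(xml_text):
    """Extract (ID, Interval) from the GET response shown in step 2."""
    root = ET.fromstring(xml_text)
    cfg = root.find("GroupsBatchConfiguration")
    return cfg.findtext("ID"), int(cfg.findtext("Interval"))


def build_put_body(interval_minutes, enabled=True):
    """Build the PUT body from step 5 with the requested Interval."""
    cfg = ET.Element("GroupsBatchConfiguration", version="1.0.0")
    ET.SubElement(cfg, "Enabled").text = "true" if enabled else "false"
    ET.SubElement(cfg, "Interval").text = str(interval_minutes)
    return ET.tostring(cfg, encoding="unicode")


def set_interval(interval_minutes):
    """GET the current config, PUT the new Interval, return the HTTP status."""
    with urllib.request.urlopen(BASE_URL) as resp:
        config_id, _ = parse_config(resp.read().decode("utf-8"))
    request = urllib.request.Request(
        f"{BASE_URL}/{config_id}",
        data=build_put_body(interval_minutes).encode("utf-8"),
        method="PUT",
        # Assumption: the endpoint accepts XML with this content type.
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(request) as resp:
        return resp.status
```

Calling set_interval(3) would mirror steps 1 through 6. As in the manual procedure, run the GET again afterwards to confirm that the Interval changed.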