CA Deliver - Receiving Message "SARH001E CA Deliver work request backlog is currently at ###. This exceeds the WARNING threshold of WARN=100."

Article ID: 201369

Products

Deliver View

Issue/Introduction

The client noticed that their health check (CHECK(CA_DLVR,DLVR_PRFM_PQE@E215DEL2)) fails a few times every week, during periods of high batch activity.

The capacity/performance team confirms that the Deliver RMOSTC task receives its share of CPU when the problem occurs.

They feel that giving RMOSTC a higher service class will likely not help. 

There does not appear to be any real problem, even though they receive the health-check error 2-3 times per week, so the error is currently ignored.

Rather than disabling the health check, the client would like to address the situation logically and keep the health check active.

Environment

Release : 14.0

Component : CA Deliver

Resolution

A started task or job running as a reusable address space can loop while opening a sysout data set.
This can occur when a Deliver RMOSTC task is configured to pre-process a started task's or job's execution and sysout class prior to the distribution of any report.

In the RMOH001E/SARH001E message, WARN=100 is a hexadecimal value (x'100'), so the warning threshold is actually 256 (decimal) queued Deliver events.
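
The hexadecimal-to-decimal conversion described above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes the backlog count being compared has already been obtained as a decimal integer.

```python
def warn_threshold_decimal(warn_field: str) -> int:
    """Interpret the WARN= value from the message text as hexadecimal."""
    return int(warn_field, 16)


def exceeds_warning(backlog: int, warn_field: str = "100") -> bool:
    """Return True when a decimal backlog count is above the WARN threshold.

    Assumption: 'backlog' is a decimal count of queued Deliver events.
    """
    return backlog > warn_threshold_decimal(warn_field)


if __name__ == "__main__":
    # WARN=100 is x'100', which is 256 in decimal.
    print(warn_threshold_decimal("100"))
```

Read this way, a backlog of 200 queued events is still below the threshold, even though it looks larger than "100".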

The Deliver events are being queued because CPU usage is at 100%.
Deliver is ready to do its processing, but it cannot while CPU usage is at 100%.

In this instance, Deliver is a victim of what is happening in the system and is not the cause of the problem.

The Deliver RMOSTC tasks, as well as the View SARSTC and SARFSS tasks, should be run in the same service class as JES.

The Deliver checkpoint files (dlvr_hlq.RMODBASE.C0000001) should not be on the same disk volume as other files that could have any kind of RESERVE placed on them at any time.

An RMOQPR01, RMOCPP05, or RMOCPP06 message indicates a delay in response from a checkpoint file.
If these messages are not being observed, then the checkpoint file response is not a problem.
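
To confirm whether any of these messages are being issued, a saved copy of the RMOSTC job log or syslog can be scanned for the three message IDs. The sketch below is a minimal example under stated assumptions: the log is assumed to be available as plain text lines, the function name is hypothetical, and matching is a simple substring test, so any suffix character following the message ID does not matter.

```python
from typing import Iterable, List, Tuple

# Message IDs named in this article as indicating checkpoint response delays.
CHECKPOINT_DELAY_IDS = ("RMOQPR01", "RMOCPP05", "RMOCPP06")


def find_checkpoint_delays(log_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (message_id, line) pairs for lines containing one of the IDs."""
    hits = []
    for line in log_lines:
        for msg_id in CHECKPOINT_DELAY_IDS:
            if msg_id in line:
                hits.append((msg_id, line.rstrip()))
                break  # record each line once, under the first ID found
    return hits
```

An empty result for the period in which the health check failed suggests that checkpoint file response is not a contributing factor.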

It would be a good idea to determine what is causing the high CPU usage, because it is causing Deliver's events to queue and may also be affecting other products.