A Db2 member of a three-member data sharing group encountered an abend of a Detector subtask in a PTXMAN task, and a dump was produced.
However, the PTXMAN task did not terminate, and data collection for that Db2 member stopped. The stop went unnoticed and unresolved until the PTXMAN task was recycled as part of an IPL of the LPAR. Is there a message or notification externalized to the system log that indicates Detector data collection has stopped?
Such a message would allow automation processes to recognize the stop and issue the commands to restart the collection. This applies to both PDT and PSA.
Release : 20.0
Component : CA Detector for DB2 for z/OS
The Xmanager log from the CPU6 LPAR shows that on 9/4/21 at 11 PM, the Xmanager detected that Db2 subsystem XXXX had terminated:
23.00.26 S0991612 PXM0201 XMANAGER DETECTED TERM COMPLETE DB2=XXXX
This triggered PDT collection termination:
23.00.43 S0991612 PDT0102 DETECTOR TERM IN PROGRESS FOR DB2=XXXX USER=CONSOLE
The following message was then issued because the PDT AUTO(A) option was in effect:
23.00.49 S0991612 PDT0108 DETECTOR SUSPENDED FOR DB2 INIT OF DB2=XXXX
Then the PDTDINCC task terminated abnormally with an S0C4-11 abend. PDTDINCC is the PDT collection driver task, which is responsible for restarting Detector collection when Db2 restarts. The abend was caused by PDT problem 14850, which is resolved by PTF LU02944:
23.00.59 S0991612 IEA995I SYMPTOM DUMP OUTPUT 349
349 SYSTEM COMPLETION CODE=0C4 REASON CODE=00000011
349 TIME=23.00.49 SEQ=09242 CPU=0000 ASID=0331
349 PSW AT TIME OF ERROR 070D2000 B66D7714 ILC 6 INTC 11
349 ACTIVE MODULE ADDRESS=00000000_366D5000 OFFSET=00002714
349 NAME=PDTDINCC
When the Xmanager determined that Db2 subsystem XXXX had restarted, a PDT collection restart per the AUTO(A) parameter did not occur because the PDT collection driver task was no longer running.
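Because the driver abend itself is what prevented the AUTO(A) restart, an automation process could watch for a symptom dump that names the driver module. The following Python sketch is illustrative only: the function name `driver_abended` is hypothetical, and the parsing assumes the log layout shown in the excerpt above (an IEA995I header line ending in a multi-line message ID, followed by continuation lines that begin with that same ID).

```python
import re

def driver_abended(log_lines, module="PDTDINCC"):
    """Return True if an IEA995I symptom dump in log_lines names `module`.

    Assumption: continuation lines of the dump start with the numeric ID
    that ends the IEA995I header line, as in the excerpt above.
    """
    dump_id = None
    for line in log_lines:
        start = re.search(r"IEA995I SYMPTOM DUMP OUTPUT\s+(\d+)", line)
        if start:
            dump_id = start.group(1)  # remember the ID of this dump
            continue
        fields = line.split()
        if dump_id and fields and fields[0] == dump_id:
            # Still inside the dump: look for the failing module name.
            name = re.search(r"\bNAME=(\S+)", line)
            if name and name.group(1) == module:
                return True
        else:
            dump_id = None  # any other line ends the dump
    return False

sample = [
    "23.00.59 S0991612 IEA995I SYMPTOM DUMP OUTPUT 349",
    "349 SYSTEM COMPLETION CODE=0C4 REASON CODE=00000011",
    "349 ACTIVE MODULE ADDRESS=00000000_366D5000 OFFSET=00002714",
    "349 NAME=PDTDINCC",
]
print(driver_abended(sample))
```

A check like this only flags the condition; which restart commands to issue afterward depends on the site's procedures and is not covered here.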
The Xmanager log from the CPU3 LPAR shows that on 10/1/21, the PSA collection for XXXX abended four times. In three instances, the PSA collection driver restarted the PSA collection:
06.30.27 S0724176 PSA0132 SSANALZE COLLECTION ABEND RESTART IN PROGRESS DB2=XXXX
08.00.40 S0724176 PSA0132 SSANALZE COLLECTION ABEND RESTART IN PROGRESS DB2=XXXX
09.22.55 S0724176 PSA0132 SSANALZE COLLECTION ABEND RESTART IN PROGRESS DB2=XXXX
In one instance, the PSA collection driver did not restart the PSA collection because the collection had abended less than an hour after the driver's previous abend restart:
06.51.42 S0724176 PSA0134 SSANALZE COLLECTION ABEND RESTART NOT ATTEMPTED - INSUFFICIENT TIME - CONSIDER RESTARTING MANUALLY
The PDT and PSA collection driver abend restart functionality is designed this way because, if the collection is abending frequently, repeatedly restarting it wastes system resources.
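The restart decision described above can be sketched in a few lines of Python. This is not the product's actual logic; it is a minimal model under stated assumptions: the threshold is exactly one hour, it is measured from the previous abend restart, and the first abend of the day has no prior restart. The names `decide_restarts` and `MIN_INTERVAL` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed threshold, inferred from the "less than an hour" behavior in the log.
MIN_INTERVAL = timedelta(hours=1)

def decide_restarts(abend_times, min_interval=MIN_INTERVAL):
    """Return (time, restarted) pairs for abends in chronological order.

    A restart is attempted only if at least `min_interval` has passed
    since the previous abend restart.
    """
    decisions = []
    last_restart = None
    for t in abend_times:
        restart = last_restart is None or t - last_restart >= min_interval
        if restart:
            last_restart = t  # only successful restart attempts reset the clock
        decisions.append((t, restart))
    return decisions

def parse(hms):
    """Parse a log timestamp such as 06.30.27 (date ignored)."""
    return datetime.strptime(hms, "%H.%M.%S")

# The four abend times from the CPU3 log excerpt above.
abends = [parse(s) for s in ("06.30.27", "06.51.42", "08.00.40", "09.22.55")]
for when, restarted in decide_restarts(abends):
    print(when.strftime("%H.%M.%S"), "restart" if restarted else "not attempted")
```

Under these assumptions, the model reproduces the pattern in the log: the 06.51.42 abend is the only one that falls within an hour of a previous restart, so it is the only one not restarted.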
Here is a link to the documentation for the PDT0134 message, which parallels the PSA0134 message:
The abend restart functionality within PDT and PSA was designed to eliminate the need for an automation tool to recognize PDT/PSA collection termination and issue commands to restart the collection. It does this successfully unless one of two conditions applies: the problem is so pervasive that the collection is abending frequently, or the collection driver, which performs both the AUTO(A) restart processing and the abend restart processing, has itself abended.
In summary, there were two distinct collection restart problems. First, PDT collection did not restart per the AUTO(A) parameter because the collection driver had abended. Second, PSA collection did not restart per the abend restart functionality because a PSA collection abend occurred less than an hour after a previous abend restart. LU02944 should prevent both the PDT collection driver abend and the PSA collection abends from recurring.