RC/Migrator: Detail on restarting a failed RCM analysis execution

Article ID: 191902

Products

CA RC/Migrator for DB2 for z/OS

Issue/Introduction

An execution of an analysis failed and the sysout of the execution has been lost.

How can the location of the failure be found and the job restarted?

Environment

DB2 for z/OS

Resolution

This can be achieved using option 3 - Execution Display on the RC/Migrator main menu.

The Execution display shows the recorded CA Batch Processor restart records that are stored on a given DB2 subsystem.

The utility has search criteria to assist in identifying the job that failed.

BPID      ==> *
DB2 SSID  ==> *    Strat Creator ==> *        Timestamp ==> *
Status    ==> NI   Strat Name    ==> *        Type      ==> *

One or more of the above criteria can be used to identify the job to restart. At a minimum, search for records with a Status of "NI", which indicates a failed execution:
    NI   Error during execution, job terminated.

Add other known criteria such as Strategy creator and/or name.
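How the criteria narrow the list can be pictured with a short Python sketch. It is an illustration only, not the product's implementation. It assumes that the criteria are combined so that a record must satisfy all of them, and that "*" matches any value. The record layout, field names, and sample rows other than AUTHID1/TBSROLE are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory copies of restart-log rows; field names are illustrative.
records = [
    {"creator": "AUTHID1", "name": "TBSROLE", "status": "NI", "syncpoint": 20},
    {"creator": "AUTHID1", "name": "TBSOTHER", "status": "NC", "syncpoint": 35},
    {"creator": "AUTHID2", "name": "TBSROLE", "status": "NI", "syncpoint": 5},
]


def matches(record, **criteria):
    """Return True when every supplied criterion matches; '*' matches anything."""
    return all(
        fnmatchcase(str(record[field]), pattern)
        for field, pattern in criteria.items()
    )


# Status NI plus a known strategy creator; the strategy name is left as a wildcard.
failed = [r for r in records if matches(r, status="NI", creator="AUTHID1", name="*")]
print(failed)
```

Each extra criterion removes more rows, which is why adding the strategy creator or name makes the failed job easier to spot.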

The MESSAGE column will confirm the status.

As in this sample:

PTEDL1        ------------ Execution Display Panel ----------- yyyy/mm/dd hh:mm
COMMAND ===>                                                  SCROLL ===> CSR

BPID      ==> *
DB2 SSID  ==> *    Strat Creator ==> AUTHID1  Timestamp ==> *
Status    ==> NI   Strat Name    ==> TBSROLE  Type      ==> *
---------------------------------------------------------------------- BASLU02
                                                           --- Strategy ----
    BPID                                                   Creator  Name       St  T  Syncpoint-#    Timestamp         Message
__  AUTHID1-TBSROLE-AUTHID1-2018080703235518               AUTHID1  TBSROLE    NI  S           20    2020052901474077  CREATE INDEX AUTHID1.IIXROL2



This shows that the analysis execution failed to complete and is currently stopped at sync point 20. The failure was recorded at timestamp 2020052901474077 (format CCYYMMDDHHMMSS99), that is, on 29 May 2020 at 01:47:40.
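To read such a timestamp outside the panel, a small Python sketch like the one below can split it into date and time. It assumes that the trailing two digits (the "99" in the format) are hundredths of a second. The function name is hypothetical and is not part of RC/Migrator.

```python
from datetime import datetime, timedelta


def parse_restart_timestamp(value: str) -> datetime:
    """Convert a 16-digit CCYYMMDDHHMMSS99 string to a datetime.

    Assumption: the final two digits are hundredths of a second.
    """
    if len(value) != 16 or not value.isdigit():
        raise ValueError(f"expected 16 digits, got {value!r}")
    base = datetime.strptime(value[:14], "%Y%m%d%H%M%S")
    hundredths = int(value[14:])
    return base + timedelta(milliseconds=hundredths * 10)


print(parse_restart_timestamp("2020052901474077"))
```

The same function can be applied to the timestamp at the end of a managed-output BPID, since it uses the same 16-digit layout in the sample shown.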

The timestamp appended to the BPID name is the date and time at which the analysis was produced.

The BPID listed above was created with MANAGED OUTPUT, which is why it ends with a timestamp. If the analysis output had been sent to a dataset or dataset(member), no timestamp would be appended.
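The distinction can be checked mechanically. The Python sketch below is a heuristic based only on the description above: it assumes that a managed-output BPID ends with a hyphen followed by a 16-digit timestamp. The function name and the dataset(member) name in the usage lines are hypothetical.

```python
import re

# Assumption: a managed-output BPID ends with "-" followed by a 16-digit timestamp.
_MANAGED_SUFFIX = re.compile(r"-(\d{16})$")


def managed_output_timestamp(bpid: str):
    """Return the trailing 16-digit timestamp, or None if the BPID has none."""
    match = _MANAGED_SUFFIX.search(bpid.strip())
    return match.group(1) if match else None


# BPID taken from the sample panel: a timestamp is found.
print(managed_output_timestamp("AUTHID1-TBSROLE-AUTHID1-2018080703235518"))
# Hypothetical dataset(member) BPID: no timestamp, so None is returned.
print(managed_output_timestamp("AUTHID1.ANALYSIS.OUTPUT(TBSROLE)"))
```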

Because the job stopped at sync point 20, it helps to look at the statement that stopped the previous execution before restarting. This is the statement that follows sync point 20 in the analysis output.

The MESSAGE column above shows the text of the last sync point reached, which is the text to search for in the analysis output.

There is a BROWSE(B) function that can view the Analysis:
   B  - Browse the job input that is identified in the BPID column of the log entry via ISPF's browse facility.

   This displays the analysis output if the original analysis output still exists on the system. It may not be available if the analysis output has been deleted or the dataset(member) cannot be accessed for some reason. Managed output is stored in the PTDB database.

SYNC point 20 is this line in the DDL:
.SYNC 20        'CREATE INDEX AUTHID1.IIXROLE2'

This means that the execution stopped at this line having committed everything before that to Db2. A sync point both creates a restart record and commits the last changes to Db2.
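When the analysis output is long, finding the statement behind a sync point can be scripted against a saved copy of the text. The Python sketch below is a simplified illustration. It assumes that each sync point sits on its own line beginning with ".SYNC" and that the statement ends at the first line whose last character is ";". It does not handle a ";" inside a string literal or comment. The function name and the abbreviated sample text, including the sync point 15 and 25 lines, are hypothetical.

```python
def statement_after_sync(ddl_text: str, sync_number: int) -> str:
    """Return the text between '.SYNC <n>' and the next ';' terminator."""
    collecting = False
    collected = []
    for line in ddl_text.splitlines():
        tokens = line.split()
        if not collecting:
            # Look for the requested sync point marker.
            if len(tokens) >= 2 and tokens[0] == ".SYNC" and tokens[1] == str(sync_number):
                collecting = True
            continue
        if tokens and tokens[0] == ".SYNC":
            break  # reached the next sync point without seeing a terminator
        collected.append(line.rstrip())
        if line.strip().endswith(";"):
            break  # end of the statement
    return "\n".join(collected).strip()


# Abbreviated, partly hypothetical sample of an analysis output.
sample = """\
.SYNC 15        'CREATE TABLE AUTHID1.TBLROLE'
CREATE TABLE AUTHID1.TBLROLE (ROLE_ID INTEGER NOT NULL);
.SYNC 20        'CREATE INDEX AUTHID1.IIXROLE2'
CREATE  UNIQUE INDEX AUTHID1.IIXROLE4 ON AUTHID1.TBLROLEx
        ( ROLE_ID ASC
          )
    ;
.SYNC 25        'NEXT STEP'
"""

print(statement_after_sync(sample, 20))
```

An empty string is returned when the requested sync point is not present in the text.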

In the DDL, the statement that follows sync point 20 is a CREATE INDEX, so this is the statement that failed:

CREATE  UNIQUE INDEX AUTHID1.IIXROLE4 ON AUTHID1.TBLROLEx
        ( ROLE_ID ASC
          )
        USING STOGROUP SYSDEFLT
                                    ERASE NO
           FREEPAGE 0
           PCTFREE 10
           CLUSTER
           BUFFERPOOL BP0
           CLOSE NO
           PIECESIZE 2G
    ;

The table name TBLROLEx ends with an erroneous "x", which is a typo. Correct the table name, after which the job can be resubmitted.

After making the correction, return to the Execution Display Panel using PF3 and use the "S" line command to resubmit the job:

S  - Submit the job input that is identified in the BPID column of the log entry to the Batch Processor for execution.

    BPID                                                   Creator  Name       St  T  Syncpoint-#    Timestamp         Message
S_  AUTHID1-TBSROLE-AUTHID1-2018080703235518               AUTHID1  TBSROLE    NI  S           20    2020052901474077  CREATE INDEX AUTHID1.IIXROL2

This will display the normal "Batch Processor Interface" screen for submission of the job.

Because this job had already been started and the incorrect DDL has now been fixed, specify "RESTART        ===> Y" so that the job resumes from where it left off. It skips everything up to sync point 20 and begins at the CREATE INDEX that failed.

After the execution completes, the log record is updated. The status changes to "NC", which indicates "NC   Completed successfully." The Syncpoint-# column shows the last sync point processed by the job, which in this case is sync point 35. The message field shows "SYNCPOINT STATUS - NORMAL PROCESS - COMPLETE" and the timestamp reflects the date and time of completion.

   BPID                                                   Creator  Name      St   T  Syncpoint-#  Timestamp         Message
__ AUTHID1-TBSROLE-AUTHID1-2018080703235518               AUTHID1  TBSROLE   NC   S           35  2020053123055136  SYNCPOINT STATUS - NORMAL PROCESS - COMPLETE

The failed job has now been completed and all work has been committed to DB2.