Information Technology (IT) is the backbone of most organizations, and seamless, continuous processing is the goal of most data centers. Business computing has become very sophisticated and, at the same time, complex. When all goes as it should, IT is invisible except to those of us 'in the trenches'. However, when a data center must go into disaster mode, it is a well-thought-out plan of preparedness that allows the organization to continue seamlessly. Though there are many aspects of disaster recovery, this Knowledge Document is intended to help users of CA 7 restore batch processing in response to a disaster.
In a scheduled outage, CA 7 can easily be used for very effective recovery. Job/task scheduling and submission can be stopped to quiesce batch processing prior to system shutdown. CA 7 data can then be used to ensure that processing continues with less disruption of service.
In initial preparation for the outage, determine whether key CA 7 files can handle a possibly 'backed-up' workload. Start with an estimate of when and how long the system will be down, and forecast the workload for that time frame. In other words, if the system outage is estimated to be 4-6 hours, forecast that date and time span with the FJOB command in CA 7. Count the forecasted jobs and add the number of jobs that are historically already in the queue at that time. Multiply that number by 2.5 or 3 to get the number of tracks required for the Trailer queue. The Trailer queue holds JCL and requirements data for all jobs in the Request, Ready and Active queues and will require at least two tracks per job. Use the /DISPLAY,Q=ALL command to see the current allocations and use of the queue files. Also, check the allocation of the UCC7QDMP and CA7VDMP files for availability and space requirements.
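The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb in the text: the function name and parameter names are hypothetical, and the job counts must still come from the FJOB forecast and your own queue history.

```python
import math


def estimate_trailer_tracks(forecast_jobs: int, queued_jobs: int,
                            tracks_per_job: float = 2.5) -> int:
    """Estimate Trailer queue tracks for a backed-up workload.

    forecast_jobs  -- jobs forecasted (FJOB) for the outage window
    queued_jobs    -- jobs historically already in the queue at that time
    tracks_per_job -- multiplier from the text (2.5 or 3)
    """
    if forecast_jobs < 0 or queued_jobs < 0:
        raise ValueError("job counts cannot be negative")
    if tracks_per_job < 2:
        # The text says each job needs at least two tracks.
        raise ValueError("use at least two tracks per job")
    # Round up: a partial track still has to be allocated.
    return math.ceil((forecast_jobs + queued_jobs) * tracks_per_job)


if __name__ == "__main__":
    # Hypothetical numbers: 400 forecasted jobs plus 100 already queued.
    print(estimate_trailer_tracks(400, 100))     # multiplier 2.5
    print(estimate_trailer_tracks(400, 100, 3))  # multiplier 3
```

Compare the result with the Trailer queue allocation shown by /DISPLAY,Q=ALL to decide whether the file needs to be expanded before the outage.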
CA 7 makes it easy to stop and restart batch processing in an orderly manner. The STOP,Q=ALL command can be used to halt CA 7 automatic submission of jobs. Any jobs already submitted continue to process on the operating system and go through CA 7 completion processing when finished, but no new work will be submitted to JES. You can stop schedule scan with an SSCAN,TIME=0 command. Before CA 7 is shut down, issue an SSCAN command and note the NEXT SCAN PERIOD START TIME. On its next job scan, CA 7 would bring into the queue the jobs whose start time falls between that time and the calculated scan end time (see discussion later). This information will be important when readying the system to resume normal processing. Shut down CA 7 with a /SHUTDOWN,Z5 command. This does a fast shutdown of CA 7 and writes a copy of the queue files to the UCC7QDMP data set.
Be sure that you get the message:
CA-7.937 QUEUES SUCCESSFULLY UNLOADED
If you don't get this message, another message will be produced that explains the error. Look up the message(s) in the Message Reference Guide. After fixing the problem, start CA 7 with a WARM or ERST type start (don't do a MOVQ start) and then reissue the /SHUTDOWN,Z5. After a successful DMPQ shutdown, run the log dump job (program SASSHIS5) against both log files (UCC7LOGP and UCC7LOGS) as 'insurance' for a seamless recovery.
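If the shutdown is driven by automation, the check for the confirmation message can be scripted. The following is a minimal sketch, assuming the console or job log output has already been captured as lines of text; how that output is captured, and the function name, are hypothetical and not part of CA 7.

```python
# Confirmation text quoted from the procedure above.
UNLOAD_OK = "CA-7.937 QUEUES SUCCESSFULLY UNLOADED"


def queues_unloaded(console_lines):
    """Return True if the queue-unload confirmation appears in the output.

    console_lines -- iterable of strings captured from the shutdown
                     (the capture mechanism is an assumption).
    """
    return any(UNLOAD_OK in line for line in console_lines)


if __name__ == "__main__":
    # Placeholder lines for illustration only, not real CA 7 output.
    captured = ["SOME OTHER CONSOLE LINE", UNLOAD_OK]
    if queues_unloaded(captured):
        print("Queues unloaded; proceed with the log dump job.")
    else:
        print("No confirmation; look up the error message before continuing.")
```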
When it is time to restart CA 7, the time noted on the SSCAN output (NEXT SCAN PERIOD START TIME) is important. If the date/time of the restart is before that time, start CA 7 with a MOVQ option with schedule scan and the queues active. It's that easy. If the date/time of the restart is after that time, additional steps are needed to get back to the initial point of stoppage.
If the restart is past the noted NEXT SCAN PERIOD START TIME, CA 7 should be started with a MOVQ type start, with schedule scan deactivated and the queues stopped. To deactivate schedule scan at start up, use RUNOPT=NSTA on the INIT statement in the initialization file. To bring CA 7 up with the queues stopped, use STOPQ=YES on the SCHEDULE statement in the initialization file.
Be sure that you get the message:
CA-7.936 QUEUES SUCCESSFULLY RELOADED
If the queues do not get reloaded, a message will be produced and CA 7 will not start up.
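The restart decision described above comes down to one comparison of two timestamps. The sketch below is illustrative only: the RestartPlan type and plan_restart function are hypothetical names, and treating a restart at exactly the noted time as 'after' is a cautious assumption, since the text does not cover that case.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RestartPlan:
    start_type: str          # type of CA 7 start to use
    init_options: tuple      # initialization file settings to apply
    extra_steps_needed: bool # True if schedule scan must be reset afterwards


def plan_restart(restart_time: datetime,
                 next_scan_period_start: datetime) -> RestartPlan:
    """Choose restart settings from the NEXT SCAN PERIOD START TIME
    noted on the SSCAN output before shutdown."""
    if restart_time < next_scan_period_start:
        # Restart is before the noted time: MOVQ start with schedule
        # scan and the queues left active.
        return RestartPlan("MOVQ", (), False)
    # Restart is at or past the noted time (equality is an assumption):
    # MOVQ start with schedule scan deactivated and the queues stopped.
    return RestartPlan("MOVQ", ("RUNOPT=NSTA", "STOPQ=YES"), True)


if __name__ == "__main__":
    noted = datetime(2024, 3, 1, 14, 30)  # hypothetical noted time
    print(plan_restart(datetime(2024, 3, 1, 13, 0), noted))
    print(plan_restart(datetime(2024, 3, 1, 18, 0), noted))
```

RUNOPT=NSTA belongs on the INIT statement and STOPQ=YES on the SCHEDULE statement, as described above; the tuple only lists which settings to apply.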
Once CA 7 is restarted with schedule scan disabled and the queues stopped, ensure that the queue files are large enough to handle any backlog (use forecasts to determine whether the Trailer queue needs to be expanded, as discussed earlier). To resume CA 7 job submission, enter a START,Q=ALL command. Resuming schedule scan's automated facilities takes a series of commands. When CA 7 is started with a COLD type of start up (MOVQ is a COLD type of start up), schedule scan is set to the current time. To restart schedule scan, first issue an SSCAN,DATE=yyddd,PERSTART=hhmm command to set the NEXT SCAN PERIOD START TIME to the date/time noted in the SSCAN output from before the shutdown. If you do not have the SSCAN data from the prior shutdown, you can run a report from the previously dumped log data and look for the last SCN0-11 message that was produced before the shutdown. The SCN0-11 message has the last schedule scan settings, and from them you can extrapolate the times for the next scan.
After the NEXT SCAN PERIOD START TIME has been set, issuing SSCAN,SCAN=SCH wakes schedule scan immediately, and it looks for jobs with a start time that falls between the NEXT SCAN PERIOD START TIME and a calculated scan end time. The scan end time is calculated as current time (the time schedule scan started) plus SPAN plus QDWELL. If this time frame would bring in too many jobs, you can set schedule scan to work with 'intervals' by adding a PEREND keyword to the SSCAN command as follows:
SSCAN,DATE=yyddd,PERSTART=hhmm,PEREND=hhmm
The scan end time is then set to the value entered on PEREND. A word of caution: if you are using the PEREND parameter, be sure that you do not have the PERFORM=(…5) option on the INIT statement in the initialization file (see the Note below). Once the begin and end times for schedule scan have been set, issue an SSCAN,SCAN=SCH command. This wakes up schedule scan for one scan only. Automatic wake-up remains turned off until a scan is done without the PEREND parameter; if all goes well, normal processing then resumes.
Note: If using schedule scan to bring in intervals of work, and you are rescanning an already scanned time frame after the first wakeup, be sure that the PERFORM option of 5 on the INIT statement in the initialization file is not used. In other words, when rescanning an already scanned time frame, CA 7 will do duplicate checking on the first scan that occurs after a COLD type start. However, if rescanning occurs on subsequent schedule scans, duplicate work can be brought in if using PERFORM=5 on the INIT statement.
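The date/time handling in the schedule scan reset is easy to get wrong by hand, so a small helper may be useful. The sketch below formats the SSCAN,DATE=yyddd,PERSTART=hhmm command from a noted date/time and applies the scan end time formula from the discussion above (scan start plus SPAN plus QDWELL). The function names are hypothetical, and SPAN and QDWELL are passed as durations because their units are not stated here; check your own settings.

```python
from datetime import datetime, timedelta


def sscan_period_start_command(noted_start: datetime) -> str:
    """Build the SSCAN command that sets NEXT SCAN PERIOD START TIME.

    %y%j yields a two-digit year plus three-digit day of year (yyddd);
    %H%M yields the 24-hour time as hhmm.
    """
    return f"SSCAN,DATE={noted_start:%y%j},PERSTART={noted_start:%H%M}"


def default_scan_end(scan_started: datetime, span: timedelta,
                     qdwell: timedelta) -> datetime:
    """Scan end time when PEREND is not used:
    time schedule scan started + SPAN + QDWELL."""
    return scan_started + span + qdwell


if __name__ == "__main__":
    noted = datetime(2024, 3, 1, 14, 30)  # hypothetical noted time
    print(sscan_period_start_command(noted))
    # Hypothetical SPAN of 4 hours and QDWELL of 30 minutes.
    print(default_scan_end(noted, timedelta(hours=4), timedelta(minutes=30)))
```

Comparing the computed end time with a forecast for the same window gives a rough idea of how many jobs a single scan would bring in, and whether working in intervals with PEREND is the better choice.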
An unscheduled outage is the dread of all IT personnel. Though there are many other processes that must be taken into account, getting CA 7 back up and submitting batch workload is a high priority. If the outage is not DASD (queue) related, you may want to have a dormant copy of CA 7 (TYPE=DORM) automatically start on another LPAR in the sysplex. If the outage is such that you do not have queue files, the CA 7 log files are key to recovering under most circumstances. If possible, run the log dump job against both primary and secondary log files to ensure the historical data is as up-to-date as possible. When it is time for CA 7 to resume processing, start CA 7 with TYPE=FORM and with the initialization file settings that disable schedule scan and stop the queues (see earlier discussion). This is a COLD type start, which also formats the queue files (all queue data is lost).
With the history file created from the log file dump job, data is available for input to the CA 7 Recovery Aids program (see the Report Reference Guide). To run the Recovery Aids program, execute SASSHIS8 with a 50 control card. This program produces reports and a batch file. The reports include the output from an LQ command from the point of failure. The batch file contains DEMAND(H) commands for the jobs that were in the Request, Ready and Active queues at that time. After the restart of CA 7, run a batch terminal interface job with the commands produced by the Recovery Aids program. Jobs that were in CA 7 queues at the time of failure are DEMANDed into the Request queue and are ready for processing. Jobs that were in the Active queue or Ready queue and had already been submitted have DEMAND commands created with a TYPE=RES so that the jobs can either be restarted or force completed.
From this point, the course of action depends on whether normal processing is to be resumed or only a subset of the workload will run. If you want to resume normal processing, follow the directions for setting and turning on schedule scan described above for a scheduled outage. If only a subset of the normal workload will be run, there are two options. The queues can be started and the workload forecasted and then DEMANDed into the queues. Alternatively, issue a HOLD,Q=ALL command to put a HOLD on all work in the queues and issue SSCAN,SCAN=HLD so that all work that subsequently enters the queue has a hold requirement; then set and turn on schedule scan, and release or cancel jobs in the queues as needed.
CA 7 is a powerful tool in the recovery of IT processing after a disaster. To recap: