There are many ways to accomplish business goals, and to manage an application's database environment.
This article discusses several ideas to use as a starting point in establishing a methodology for backing up application data, spilling Log File (LXX) data to Recovery Files (RXX), and for using both of these when a Forward Recovery is needed.
Keep in mind that this is only one suggestion; every Datacom user will still need to review and adjust the procedure according to their site's practices and procedures.
z/OS
Datacom/AD
Datacom/DB
First, this note focuses specifically on backups taken for Forward Recovery; backups taken for other purposes might need to be handled somewhat differently. The key concept to understand in this process is that a Forward Recovery always starts with loading data from a Backup file containing the internal location data for each record (we call that the URI or RecID). With that basis, I will explain the Backup and Forward Recovery processes a little bit, and mix in related comments along the way.
Now, let's look at the backup. The nature of a hot or dirty backup is that it is taken while the file is open, and this means that it is not an all-encompassing, synchronized backup because updates can take place during the backup. According to the DBUTLTY Backup command description:
To perform a backup while processing continues, specify a physical sequence backup and UPDATE=NO in the command. The UPDATE=NO parameter allows the database to be updated or an update job might start during the backup. ... The forward and backward recovery operations require that record location information in the database exactly match the record location information that existed at the time the maintenance was performed. A reorganization of the database causes new record location information to be assigned. Therefore, any reorganization causes Recovery File(s) (RXX) created prior to the reorganization to be unusable.
To use forward recovery, you must restore the records in the database to the same locations at which they resided at the time the recovery records were logged. A backup containing the records that can be restored correctly can be obtained in one of two ways:
If you are using JCL modeled on the CA 7 job AL2DBHOT, then you are on the right track. This JCL issues the LOCK function, which prevents data rows from moving to a new location in the physical file during the backup. It also uses the BACKUP parameters RECID=YES and SEQ=PHYSICAL to ensure that the backup file holds the most current data at that time and that each record's RECID identifies the location of that row.
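As a rough sketch of such a job (the DBID, dataset names, and output DDNAME below are illustrative placeholders, not values taken from AL2DBHOT itself; check the LOCK and BACKUP documentation for the exact operands your release requires), the core steps might look like this:

//HOTBKUP  EXEC PGM=DBUTLTY,REGION=4M
//STEPLIB  DD DISP=SHR,DSN=datacomhlq.CUSLIB
//         DD DISP=SHR,DSN=datacomhlq.CAAXLOAD
//SYSPRINT DD SYSOUT=*
//BKUP001  DD DSN=YOUR.APPL.BACKUP(+1),DISP=(NEW,CATLG,DELETE),
//         UNIT=SYSDA,SPACE=(CYL,(500,100),RLSE)
//SYSIN    DD *
  LOCK   DBID=nnn
  BACKUP DBID=nnn,DDNAME=BKUP001,RECID=YES,SEQ=PHYSICAL,UPDATE=NO
/*

Here LOCK keeps rows anchored while updates continue, and the SEQ=PHYSICAL, RECID=YES, and UPDATE=NO combination is exactly what the documentation quoted above calls for. Use the actual AL2DBHOT job as your model; this fragment only illustrates the shape of the steps.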
In terms of backup frequency, I am not sure you would see a significant benefit for this CA 7 scheduling application from running a job every hour that takes 10-20 minutes to complete (assuming a very large database). On the other hand, if this is a small database whose backup completes in just a couple of minutes, the data volume does not justify more frequent backups as a time-saver: it would take more resources to manage many small backups than to run fewer, larger ones.
There are a couple of recommendations about this backup job, and I will note them below.
Some customers want both onsite and offsite backups and use two different BACKUP functions in the same job to accomplish this. However, running two backups is not the best plan: because the backups run one after the other while database updates are active, the onsite and offsite backups can never be in sync. Making a copy of the onsite file for offsite use is better for two reasons: the copy is an exact, in-sync duplicate of the onsite backup, and only one BACKUP pass runs against the active database.
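For example (dataset names here are placeholders), a simple IEBGENER step can produce the offsite copy from the onsite backup:

//OFFSITE  EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD DISP=SHR,DSN=YOUR.APPL.BACKUP(0)
//SYSUT2   DD DSN=YOUR.APPL.BACKUP.OFFSITE(+1),
//         DISP=(NEW,CATLG,DELETE),UNIT=TAPE
//SYSIN    DD DUMMY

SYSIN DD DUMMY requests a straight copy; any copy utility your site prefers works equally well, since the point is simply that both files come from the same single BACKUP pass.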
Next, we can consider the Forward Recovery (FR) process. I recommend reading the documentation section called Types of Recovery to understand this further. We also have some Knowledge Base articles on this topic that many customers have found helpful. You can easily search for these to find the ones that interest you.
As noted above, the FR process always starts with a reload of the data from a Backup file. In a best practice, you would have a log spill to RXX as the first step in the backup, to minimize the number of log records to read through before starting the recovery. Running a spill too often (such as every 15 minutes) doesn't provide any benefit if your backup is run hourly since you would still have to read all of the log records from at least the start of the backup.
Because you need to restore a backup as the first part of the FR process, there is no difference here whether you run the backup every half-hour or every 12 hours; each Backup and Load will take about the same amount of time.
Therefore, the real significance is in the FR process itself, and how much data is being managed. Since machines today are capable of processing a large volume of transactions in a short time, the real question about the amount of data kept on the RXX is whether the additional time and cycles spent taking multiple RXX files are offset by a quicker recovery time. My personal opinion is that it is not. For example, you might run 3 SPILL jobs to capture 2 million RXX records, when you only need 1 million for your recovery. The difference in FR processing time might be only a minute, but the additional time and machine cycles spent creating all these extra short-term backups could have been used to run more production work.
The perceived time savings would get lost, too, in the analysis to track down which backup and which RXX files are needed, and to confirm you have the correct ones matched up. In any effort like this, the human workload is always the slowest part and the most error-prone. By having fewer backup jobs and fewer RXX files, this analysis is much easier.
In a scenario where someone takes 6 backup files a day, a backup runs every four hours. The consideration here is how long to retain the backups. If there is a requirement to keep a week's worth, then processing today (Tuesday) might need to restore your application data to its state as of last Wednesday (at 8pm, for example) and then run FR up to the next backup or some point before it. You would then have to delete all subsequent backups and RXX files, because the database is now in a state other than the one those backups reflect, so they could no longer be used for FR.
Also, note that restoring the application databases to last Wednesday at 8pm does not affect other application data files; you would still need to find a way to make the other files reflect that same restore time (last Wednesday at 8pm, in our example) to roll them forward, or to set the data to some time after that.
Normally, FR is used to recover from a system or hardware failure, so the only backup file needed is the one most recent to the point of failure. As a result, having a methodology of running backups every four hours is fine, but the need for retention of 7 days' worth seems superfluous. Every application and site is different, though, and it might be possible to describe a scenario where you would need to go back several days to restore.
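As a rough outline of the FR job itself (the DBID, dataset names, and DDNAMEs are placeholders, and additional required operands are elided; the RECOVERY (Rebuild a Database) documentation describes the exact syntax, which varies by what is being recovered), the flow is a LOAD of the backup followed by RECOVERY against the RXX:

//FWDRCV   EXEC PGM=DBUTLTY,REGION=4M
//STEPLIB  DD DISP=SHR,DSN=datacomhlq.CUSLIB
//         DD DISP=SHR,DSN=datacomhlq.CAAXLOAD
//SYSPRINT DD SYSOUT=*
//BKUP001  DD DISP=SHR,DSN=YOUR.APPL.BACKUP(0)
//RXX      DD DISP=SHR,DSN=YOUR.APPL.SPILL.RXX(0)
//SYSIN    DD *
  LOAD DBID=nnn,DDNAME=BKUP001,...
  RECOVERY DBID=nnn,...
/*

Because the LOAD restores each record to its original RECID location, the RECOVERY step can then reapply the logged maintenance from the RXX in sequence, which is why the backup must have been taken with RECID=YES and SEQ=PHYSICAL.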
Since the RXX files are used in conjunction with those backups for recovery processing, running the Spill in the backup job and then as the LXX reaches a certain percentage will reduce the overall number of RXX files needed compared to running on a schedule of every 15 or 30 minutes. Our general recommendation is that you should not need more than four to six RXX files in 24 hours.
For example, here is what I would add to the top of the Backup job:
//SPILL EXEC PGM=DBUTLTY,REGION=4M
//STEPLIB DD DISP=SHR,DSN=datacomhlq.CUSLIB
// DD DISP=SHR,DSN=datacomhlq.CAAXLOAD          (use CABDLOAD for Datacom/DB)
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
COMM OPTION=CONSOLE,
     OPTION2='DB03601 *\/----------------------------------------\/*'
COMM OPTION=CONSOLE,
     OPTION2='DB03601 * About to start CA7 BACKUP Process *'
COMM OPTION=CONSOLE,OPTION2='WRITE_PENDS_LOG_STABLE'
SPILLOPT SPILL=MIN,OPTION2=FORCERXX
COMM OPTION=CONSOLE,
     OPTION2='DB03601 * SPILL complete *'
COMM OPTION=CONSOLE,
     OPTION2='DB03601 */\----------------------------------------/\*'
/*
//RXX DD DISP=(NEW,CATLG,DELETE),BUFNO=100,
// DSN=YOUR.APPL.SPILL.RXX(+1),
// UNIT=SYSDA,MGMTCLAS=IPCSDUMP,STORCLAS=IPCSDUMP,
// SPACE=(CYL,(200,50),RLSE)
//*
As noted before, these are only suggestions, and you might have other needs or circumstances that call for running your backup, recovery, and log spill processing a bit differently. There is no single right or wrong way to do this, but there may be better ways to optimize the processing for your needs. If you have questions or would like help reaching your business objectives for backup and recovery, please open a support case and we will be glad to review this with you.
For more information about database backups, refer to the DBUTLTY BACKUP (Create Backups) documentation.
For more information about the log spill processing, refer to the DBUTLTY SPILLOPT (Transfer Data to RXX Using MAX/MIN) documentation, and the general explanation in the documentation titled Using Logging (and the following two sections, too).
For more information about recovering data from the Log Files and Recovery Files, refer to the DBUTLTY documentation for RECOVERY (Rebuild a Database), and the general discussion section Using Recovery. There are also a number of different Knowledge Base articles that discuss different areas of the process. A good one to start with is Knowledge Base article 18722, titled "Overview of Datacom Forward and Backward Recovery".
As always, please contact Broadcom support for Datacom if you have further questions.