Calculate the SORTWK space needed on CA Datacom DBUTLTY LOAD and RETIX

Products

Datacom Datacom/DB Datacom/AD

Issue/Introduction

The customer is using DBUTLTY to LOAD very large tables with multiple keys. One table will have more than 980 million key entries, and the other will have more than 110 million entries.

Without the correct details, we might receive a S0C1 abend, and would see error messages from the SORT utility (IEC046A) and from DBUTLTY(DB02405I and DB02406I):

ICE046A 0 jobstepinfo  SORT CAPACITY EXCEEDED - RECORD COUNT #########
DB02405I - PXX START jobname ##### 13(062) ADD##-123-    D-1 DBUTLTY RETIX
DB02406I - PXX END - SORT IN LOAD/RETIX HAD AN ERROR

Is there a way to determine how much SORTWK space is needed to sort all these entries?

The JCL used was:

//DBUTIL  EXEC PGM=DBUTLTY,REGION=0M
//STEPLIB  DD  DISP=SHR,DSN=CA Datacom Loadlibs
//         DD  . . .
//CXX      DD  DISP=SHR,DSN=CXX for this MUF
//PXX      DD  DISP=SHR,DSN=FXX for this MUF
//LXX      DD  DISP=SHR,DSN=LXX for this MUF
//FXX      DD  DISP=SHR,DSN=PXX for this MUF
//SYSPRINT DD  SYSOUT=*
//SNAPER   DD  SYSOUT=*
//SYSUDUMP DD  SYSOUT=*
//BACKUPFL DD  DSN=The database Backup file,DISP=(OLD),
//             UNIT=(TAPE,,DEFER),LABEL=(1,SL),
//             VOL=(PRIVATE,RETAIN,,45,SER=123456),
//             DCB=(TRTCH=COMP)
//SORTWK01 DD  UNIT=(SYSDA,10),SPACE=(CYL,(06000,0500)),DSNTYPE=LARGE
//SORTWK02 DD  UNIT=(SYSDA,10),SPACE=(CYL,(06000,0500)),DSNTYPE=LARGE
   . . . continued down to
//SORTWK16 DD  UNIT=(SYSDA,10),SPACE=(CYL,(06000,0500)),DSNTYPE=LARGE
//SYSIN    DD  *
LOAD   DBID=111,AREA=ABC,FORMAT=BACKUP,DDNAME=BACKUPFL,
       LOADDUPS=YES,
       SORT=99999999999,SORTWK=16,OPTIMIZE=YES,OPTION1='(I)',KBYTES=999
/*

Environment

Component : CA DATACOM/DB
Component : CA DATACOM/AD

Resolution

There are a few things here that you can do to improve your sorting, and some things that should be corrected in your JCL.

First, it is better to specify SORT=1 and SORTDFLT=YES to allow DBUTLTY to create the best estimate of records for the sort. Note that as mentioned in the documentation, "When set as SORTDFLT=YES, the LOAD looks for record counts in the control records of the BACKUP input or the current CXX to use for the SORT= override. If the count in both places is zero (no records) for all tables to be loaded, the SORT= value is used and not overwritten." By specifying the 11 nines, you are telling SORT that there will be 99,999,999,999 records, and Sort will try to allocate the necessary work space to handle that many records. Based on the description of 980M keys (which is only 9 nines, or 1% of what you specified),the total size of the sort file is about 60GB, and as Sort segments the file and then reassembles it, this might not be too large if you have sufficient memory, since z/OS allows 64-bit addressing of memory up to 16 EB (exabytes).

Next, you specified multiple volumes for each of your SORTWK areas in the JCL. DFSORT will only use the first volume of any multi-volume specification, so having SUNIT=(SYSDA,10) does not accomplish anything. Also the issue of primary and secondary allocation is somewhat moot, since you will only use secondary allocation (if the WRKSEC parameter is turned on) for the first 12 SORTWK files. Therefore, there is no benefit to having (primary,secondary) space on SORTWK13 and above. This makes it confusing for others who review the JCL, and so the recommendation would be to drop secondary for all SORTWK files, if you use them at all.

Now, how would you optimize this sort?

The first thing I would do is run the LOAD with your step region size at the maximum. This does not necessarily mean REGION=0M/0K/0G, because your site could intercept this in the IEFUSI exit and reduce it. I suggest you contact your systems team to find out how to specify maximum memory size for your step. Along with this, specify MEMLIMIT=NOLIMIT on the step.

The next thing I would do is try to let SORT determine the best use of work space. Sort will try to use memory or Hiperspace/data space if possible, and can handle a very large amount of records there. Also, memory is much faster than SORTWK files, because every time the SORTWK file on disk is accessed, Sort starts a separate I/O and goes to sleep while the I/O is completed, and then it will be dispatched again. This takes a significant number of cycles that are not needed with in-memory sorts. This means a reduction in work space allocated to your job (that is now available to other jobs), reduction in I/O, reduction in EXCP and their associated waits, and consequently, a reduction in run time.

As a matter of fact, IBM have often advised not to use SORTWK files unless absolutely necessary. In your case, if you have decent memory available to your step, you might not need SORTWK DD statements at all. This differing use of memory is why it is not possible to calculate how much SORTWK is needed - this is not a deficiency in the CA Datacom tools, it is related to how much storage is available to each user at a given point in time.

According to IBM, the amount of intermediate storage required can vary depending on many factors including:

The amount of Hiperspace DFSORT is able to use at the time the sort is run
The amount of main storage available
The degree of randomness of the records to be sorted
The values specified (or defaulted) for options such as DYNALOC, DYNAUTO, DYNSPC, FILSZ/SIZE, AVGRLEN, or DYNALLOC.

Because your Sort defaults specify HIPRMAX=OPTIMAL,DSPSIZE=MAX, and MOSIZE=MAX, you are configured to make the most use of memory for your sorts. Sort will dynamically allocate multiple memory objects as needed, better managing general storage for all Sort tasks running concurrently.

Now, if you want to use SORTWK files, Sort will normally allocate these dynamically, based on the installation parms. Please ask your DFSORT administrator to tell you what the setting is for DYNAUTO, or run this job to list the installation defaults and look for DYNAUTO:

//LISTDEF  EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//SHOWDEF  DD SYSOUT=*
//TOOLIN   DD *
DEFAULTS LIST(SHOWDEF)
/*

If DYNAUTO=IGNWKDD, dynamic allocation takes precedence over JCL allocation. If you want the opposite result for selected applications, use the USEWKDD run-time option.
If DYNAUTO=YES, JCL allocation takes precedence over dynamic allocation. If you want the opposite result, remove all JCL allocation statements.
If DYNAUTO=NO, dynamic allocation of work data sets is not used unless you specify the DYNALLOC run-time option. JCL allocation takes precedence over dynamic allocation.

To wrap all this up, I would recommend:

Specify SORT=1,SORTDFLT=YES on the LOAD statement to allow DBUTLTY to specify the most accurate file size (version 15.0 and above);
Remove the SORTWK DD statements, and leave the SORTWK=16 (or any number up to 99) on the LOAD if you wish, to allow SORT to dynamically allocate what it needs;
Change the REGION= setting on the step to what your systems team recommends, and add ,MEMLIMIT=NOLIMIT;
Add a //SORTDIAG DD SYSOUT=* to get better tuning diagnostics from SORT;
Add ,BUFNO=150 - 250 to your BACKUPFL DD statement, to reduce the time to process the input file;
Change KBYTES to a value to support 1 cylinder of data buffers. Following the formula in the LOAD documentation, if you have a blksize of 4K, you would use 1440 for one cylinder;
Add a //SORTSTAT DD SYSOUT=* and LOAD parameter SORTDD=SORTSTAT to capture the output messages from SORT in its own file.

Here is how I would code your step, based on the above recommendations - changes are highlighted:

//DBUTIL  EXEC PGM=DBUTLTY,REGION=0M,MEMLIMIT=NOLIMIT
//STEPLIB  DD  DISP=SHR,DSN=CA Datacom Loadlibs
//         DD  . . .
//CXX      DD  DISP=SHR,DSN=CXX for this MUF
//PXX      DD  DISP=SHR,DSN=FXX for this MUF
//LXX      DD  DISP=SHR,DSN=LXX for this MUF
//FXX      DD  DISP=SHR,DSN=PXX for this MUF
//SYSPRINT DD  SYSOUT=*
//SNAPER   DD  SYSOUT=*
//SYSUDUMP DD  SYSOUT=*
//BACKUPFL DD  DSN=The database Backup file,DISP=(OLD),
//             UNIT=(TAPE,,DEFER),LABEL=(1,SL),
//             VOL=(PRIVATE,RETAIN,,45,SER=123456),
//             DCB=(TRTCH=COMP),BUFNO=200
//SYSIN    DD  *
LOAD   DBID=111,AREA=ABC,FORMAT=BACKUP,DDNAME=BACKUPFL,
       LOADDUPS=YES,SORT=1,SORTDFLT=YES,SORTWK=20,SORTDD=SORTSTAT,
       OPTIMIZE=YES,OPTION1='(I)',KBYTES=1440
/*
//SORTDIAG DD  SYSOUT=*
//SORTSTAT DD  SYSOUT=*

As an example, I have a DB with 2 single-table areas, and there are 99M + 62M rows, with 1 key each. I specified SORTDFLT=YES, no SORTWK parameter and no SORTWK files. Sort processed 160,493,710 entries of 36 bytes, and its DYNALLOC processing allocated 62190 tracks, after using 5526M memory for sortwork space. The job ran for 57 minutes using VTAPE for the backup. SORT defaulted to DYNALLOC=(SYSDA,03).

Additional Information

For more information on tuning your DFSORT processing, please refer to the z/OS DFSORT Tuning Guide (link for V2R4).

As always, please contact Broadcom support for CA Datacom if you have further questions.