Datacom LOAD, MASSADD, REPLACE Performance Tips

Products

Datacom Datacom/AD Datacom/DB

Issue/Introduction

When loading a very large number of records to Datacom tables, which DBUTLTY function is fastest: LOAD, MASSADD, or REPLACE?

Are there any options that will increase the performance of these functions?

Resolution

MASSADD adds one record at a time and runs in the MUF with logging enabled, so it's not a good option when loading a large number (millions) of records. Even splitting the job to run multiple MASSADDs concurrently will cause contention as they are all updating the same area. MASSADD is best used when you need to add records to a table that is open for processing.

LOAD is faster than MASSADD as it adds records by updating data blocks in the batch address space (not in the MUF) without logging. Also by using the OPTIMIZE and SORT/SORTDFLT options, it sorts the index entries separately and performs the index update after all the data has been loaded which is more efficient.

See the documentation for Loading Multiple Tables Into a Data Area for options to load multiple tables in a single area (multi-table area). Note that LOAD can only be used for the first table in a multi-table area as it overwrites any existing data in the area. The LOAD function should be used first for the table that contains the most number of records.

For very large tables with millions of records, consider either separating the tables into their own areas and databases or partitioning large tables to split them into smaller and more easily managed tables. Multiple LOAD and other DBUTLTY functions can then be run concurrently against each partition. With partitioning, no changes are needed to the programs.

For the other tables in a multi-table area, use REPLACE rather than MASSADD. REPLACE replaces all the data in a single table without affecting any other tables. REPLACE adds records by updating data blocks in the MUF address space but does not perform logging. Index records are sorted in the DBUTLTY batch address space.

Performance recommendations for running the LOAD or REPLACE function

Pre-sort the EXTRACT file in native key sequence before doing the LOAD/REPLACE.
On the input file DD statement, specify BUFNO=150 (or up to 255) to increase the system buffers used for I/O.
Be sure to specify a large REGION size (or REGION=0M) on the EXEC statement to provide the maximum memory for the job step.
It is always best to run the job on the same LPAR as MUF.
Use a JOBCLASS and WLM setting to get the best performance from the processor.
(LOAD only) An INIT of the data area and the IXX should be performed before doing the LOAD.
Add SET OPTION1= statements to the DBUTLTY run like the example below to increase the number of buffers used:
SET OPTION1=DATANO=5000;DXXNO=2000;IXXNO=1000.
Use DBUTLTY options KBYTES= (LOAD only), OPTIMIZE=YES (LOAD only), SORT=, SORTWK=, SORTDFLT=YES, and OPTION1='(I)'.
- The formula to calculate KBYTES= is blocksize * #_of_blocks_per_track * #_of_tracks * 2 / 1024
- The formula to calculate SORT= is #_of_keys_in_table * #_of_records * 1.3
  Note: SORTDFLT=YES will calculate and override the SORT= value only when using a BACKUP input file or if the CXX currently has record counts for the area.
For more information about using SORTWK=, please refer to Knowledge Base article 144382, titled "Calculate the SORTWK space needed on Datacom DBUTLTY LOAD and RETIX"
Specify attributes DUPE-MASTER-KEY=NO and CHNG-MASTER-KEY=NO in the table definition to reduce horizontal DXX searches and reduce index processing time.
Eliminate any unnecessary alternate keys in the table definition.
Block the data area in 4K increments to save excess paging and I/O.
DATANO and DATANO2 buffer specifications should be specified in 4K increments.

Here is a sample DBUTLTY LOAD job with recommended options:

//LOAD     EXEC PGM=DBUTLTY,REGION=0M
//STEPLIB  DD DISP=SHR,DSN=DATACOM.CUSLIB
//         DD DISP=SHR,DSN=DATACOM.CABDLOAD  (CAAXLOAD for Datacom/AD)
//CXX      DD DISP=SHR,DSN=DATACOM.CXX       (optional for Datacom/AD)
//INFILE   DD DSN=input.datset.name,BUFNO=150,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(500))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(500))
. . .
//SORTWKnn DD UNIT=SYSDA,SPACE=(CYL,(500))
//SYSPRINT DD SYSOUT=*
//SORTMSG  DD SYSOUT=*
//SYSIN    DD *
 SET OPTION1=DATANO=5000;DXXNO=2000;IXXNO=1000 
 ACCESS DBID=nnn,STATUS=OFF,USERS=FAIL
 COMM   DBID=nnn,OPTION=CLOSE
 INIT   DBID=nnn,AREA=IXX
 INIT   DBID=nnn,AREA=xxx
 LOAD   DBID=nnn,AREA=xxx,FORMAT=EXTRACT,CLUSTER=DENSE,
        DDNAME=INFILE,KBYTES=nnnn,OPTIMIZE=YES,
        SORTDFLT=YES,SORT=99999999999,SORTWK=nn,OPTION1='(I)',
        SORTDD=SORTMSG
 ACCESS DBID=nnn,STATUS=WRITE
/*

Sample DBUTLTY REPLACE job with recommended options:

//REPLACE  EXEC PGM=DBUTLTY,REGION=0M
//STEPLIB  DD DISP=SHR,DSN=DATACOM.CUSLIB
//         DD DISP=SHR,DSN=DATACOM.CABDLOAD  (CAAXLOAD for Datacom/AD)
//CXX      DD DISP=SHR,DSN=DATACOM.CXX       (optional for Datacom/AD)
//INFILE   DD DSN=input.datset.name,BUFNO=150,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(500))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(500))
. . .
//SORTWKnn DD UNIT=SYSDA,SPACE=(CYL,(500))
//SYSPRINT DD SYSOUT=*
//SORTMSG  DD SYSOUT=*
//SYSIN    DD *
 SET OPTION1=DATANO=5000;DXXNO=2000;IXXNO=1000
 REPLACE DBID=nnn,AREA=xxx,MULTUSE=YES,DDNAME=INFILE,
         SORTDFLT=YES,SORT=99999999999,SORTWK=nn,OPTION1='(I)',
         SORTDD=SORTMSG
/*

Additional Information

As always, please contact Broadcom support for Datacom if you have further questions.