When loading a very large number of records to Datacom tables, which DBUTLTY function is fastest: LOAD, MASSADD, or REPLACE?
Are there any options that will increase the performance of these functions?
MASSADD adds one record at a time and runs in the MUF with logging enabled, so it's not a good option when loading a large number (millions) of records. Even splitting the job to run multiple MASSADDs concurrently will cause contention as they are all updating the same area. MASSADD is best used when you need to add records to a table that is open for processing.
LOAD is faster than MASSADD as it adds records by updating data blocks in the batch address space (not in the MUF) without logging. Also by using the OPTIMIZE and SORT/SORTDFLT options, it sorts the index entries separately and performs the index update after all the data has been loaded which is more efficient.
See the documentation for Loading Multiple Tables Into a Data Area for options to load multiple tables in a single area (multi-table area). Note that LOAD can only be used for the first table in a multi-table area as it overwrites any existing data in the area. The LOAD function should be used first for the table that contains the most number of records.
For very large tables with millions of records, consider either separating the tables into their own areas and databases or partitioning large tables to split them into smaller and more easily managed tables. Multiple LOAD and other DBUTLTY functions can then be run concurrently against each partition. With partitioning, no changes are needed to the programs.
For the other tables in a multi-table area, use REPLACE rather than MASSADD. REPLACE replaces all the data in a single table without affecting any other tables. REPLACE adds records by updating data blocks in the MUF address space but does not perform logging. Index records are sorted in the DBUTLTY batch address space.
Here is a sample DBUTLTY LOAD job with recommended options:
//LOAD EXEC PGM=DBUTLTY,REGION=0M
//STEPLIB DD DISP=SHR,DSN=DATACOM.CUSLIB
// DD DISP=SHR,DSN=DATACOM.CABDLOAD (CAAXLOAD for Datacom/AD)
//CXX DD DISP=SHR,DSN=DATACOM.CXX (optional for Datacom/AD)
//INFILE DD DSN=input.datset.name,BUFNO=150,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(500))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(500))
. . .
//SORTWKnn DD UNIT=SYSDA,SPACE=(CYL,(500))
//SYSPRINT DD SYSOUT=*
//SORTMSG DD SYSOUT=*
//SYSIN DD *
SET OPTION1=DATANO=5000;DXXNO=2000;IXXNO=1000
ACCESS DBID=nnn,STATUS=OFF,USERS=FAIL
COMM DBID=nnn,OPTION=CLOSE
INIT DBID=nnn,AREA=IXX
INIT DBID=nnn,AREA=xxx
LOAD DBID=nnn,AREA=xxx,FORMAT=EXTRACT,CLUSTER=DENSE,
DDNAME=INFILE,KBYTES=nnnn,OPTIMIZE=YES,
SORTDFLT=YES,SORT=99999999999,SORTWK=nn,OPTION1='(I)',
SORTDD=SORTMSG
ACCESS DBID=nnn,STATUS=WRITE
/*
Sample DBUTLTY REPLACE job with recommended options:
//REPLACE EXEC PGM=DBUTLTY,REGION=0M
//STEPLIB DD DISP=SHR,DSN=DATACOM.CUSLIB
// DD DISP=SHR,DSN=DATACOM.CABDLOAD (CAAXLOAD for Datacom/AD)
//CXX DD DISP=SHR,DSN=DATACOM.CXX (optional for Datacom/AD)
//INFILE DD DSN=input.datset.name,BUFNO=150,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(500))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(500))
. . .
//SORTWKnn DD UNIT=SYSDA,SPACE=(CYL,(500))
//SYSPRINT DD SYSOUT=*
//SORTMSG DD SYSOUT=*
//SYSIN DD *
SET OPTION1=DATANO=5000;DXXNO=2000;IXXNO=1000
REPLACE DBID=nnn,AREA=xxx,MULTUSE=YES,DDNAME=INFILE,
SORTDFLT=YES,SORT=99999999999,SORTWK=nn,OPTION1='(I)',
SORTDD=SORTMSG
/*
As always, please contact Broadcom support for Datacom if you have further questions.