When loading a very large number of records into the tables of a multi-table CA Datacom area, which DBUTLTY function performs the load fastest: LOAD, MASSADD, or REPLACE?
What options should be used to increase the performance of the LOAD and REPLACE functions?
MASSADD adds one record at a time and runs under the MUF with logging enabled, so it is not a good option when loading a large number (millions) of records. Even splitting the work into multiple MASSADD jobs running concurrently will cause contention, as they are all updating the same area.
A LOAD is much faster than MASSADD because it adds records by updating data blocks in the batch address space without logging. In addition, with the OPTIMIZE= and SORT= options, it sorts the index entries and performs the index update after all the data has been loaded, which is more efficient.
The DocOps section Loading Multiple Tables Into a Data Area lists the options for loading multiple tables.
LOAD can only be used for the first table in the area as it overwrites any existing data in the area.
The LOAD function should be used first, for the table that contains the largest number of records.
For very large tables with millions of records, consider separating each table into its own area and database, and also partitioning the large tables to split them into smaller tables. Multiple LOAD and other DBUTLTY functions can then be run concurrently against each partition. With partitioning, no changes to the application programs are needed.
For the other tables in the area, use REPLACE rather than MASSADD. Unlike LOAD, REPLACE replaces all the data in a single table without affecting the other tables in the area. REPLACE adds records by updating data blocks in the MUF address space but does not perform logging. Index records are sorted in the DBUTLTY batch address space.
Pre-sort the EXTRACT file into Native Key sequence before doing the LOAD (a sample pre-sort step is shown after this list of recommendations).
On the input file DD statement in the JCL, specify BUFNO=150 (it can be set as high as 255) to increase the number of system buffers.
On the EXEC statement in the JCL, specify a large REGION size, or REGION=0M, to give the job the maximum amount of memory.
It is always best to run the job on the same LPAR as the MUF.
Use a JOBCLASS and a WLM setting that allow the job to get the best performance from the processor.
An INIT of the data area and the IXX should be performed before doing the LOAD.
Add SET OPTION1= statements to the DBUTLTY run to increase the number of buffers: DATALN=nnnn, DATANO=5000, DXXNO=2000, IXXNO=1000, and IXXLN=nnnn.
Use the options KBYTES=, OPTIMIZE=YES, SORT=, SORTWK=, SORTDFLT=YES and OPTION1=(I) in the LOAD.
The formula to calculate KBYTES= is: blocksize * #_of_blocks_per_track * #_of_tracks * 2 / 1024.
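For example (hypothetical values): with a 4,096-byte block size on a 3390 device (12 blocks per track) and 1,500 tracks allocated, KBYTES = 4096 * 12 * 1500 * 2 / 1024 = 144,000, so specify KBYTES=144000.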
The formula to calculate SORT= is: #_of_keys_in_table * #_of_records * 1.3
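For example (hypothetical values): a table with 3 keys and 10,000,000 records gives SORT = 3 * 10,000,000 * 1.3 = 39,000,000, so specify SORT=39000000.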
Note: SORTDFLT=YES will calculate and override the SORT= value only when using a BACKUP input file or if the CXX currently has record counts for the area.
Specify attributes DUPE-MASTER-KEY=NO and CHNG-MASTER-KEY=NO in the table definition to reduce horizontal DXX searches and reduce index processing time.
Eliminate any unnecessary alternate key in the table definition.
Block the data area in 4K increments to avoid excess paging and I/O.
DATANO and DATANO2 buffer specifications should also be made in 4K increments.
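The following is a minimal sketch of the pre-sort step mentioned above, assuming DFSORT (or a compatible sort product) and hypothetical data set names. The position (pp) and length (ll) of the Native Key within the extract record depend on your table definition and must be verified before use:
//PRESORT  EXEC PGM=SORT,REGION=0M
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DISP=SHR,DSN=input.extract.file
//SORTOUT  DD DSN=sorted.extract.file,DISP=(NEW,CATLG),
//            UNIT=SYSDA,SPACE=(CYL,(500,100),RLSE)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(500,500))
//SYSIN    DD *
* pp,ll = position and length of the Native Key in the extract record
  SORT FIELDS=(pp,ll,CH,A)
/*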
Sample DBUTLTY LOAD job with recommended options:
//LOAD     EXEC PGM=DBUTLTY,REGION=0M
//STEPLIB  DD DISP=SHR,DSN=DATACOM.CUSLIB
//         DD DISP=SHR,DSN=DATACOM.CABDLOAD        (or CAAXLOAD for AD)
//CXX      DD DISP=SHR,DSN=DATACOM.CXX
//INFILE   DD DSN=input.dataset.name,BUFNO=150,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(500,500))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(500,500))
..
//SORTWKnn DD UNIT=SYSDA,SPACE=(CYL,(500,500))
//SYSPRINT DD SYSOUT=*
//SORTMSG  DD SYSOUT=*
//SYSIN    DD *
SET OPTION1=DATALN=4096;DATANO=5000
SET OPTION1=DXXNO=2000;IXXNO=1000;IXXLN=4096
COMM OPTION=CLOSE,DBID=nnn
ACCESS STATUS=OFF,DBID=nnn,USERS=FAIL
INIT AREA=IXX,DBID=nnn
INIT AREA=xxx,DBID=nnn
LOAD AREA=xxx,DBID=nnn,FORMAT=EXTRACT,CLUSTER=DENSE,
   DDNAME=INFILE,KBYTES=nnnn,OPTIMIZE=YES,
   SORT=99999999999,SORTWK=nn,OPTION1=(I)
ACCESS STATUS=WRITE,DBID=nnn
/*
Pre-sort the EXTRACT file into Native Key sequence before doing the REPLACE (see the sample pre-sort step in the LOAD recommendations above).
On the input file DD statement in the JCL, specify BUFNO=150 (it can be set as high as 255) to increase the number of system buffers.
On the EXEC statement in the JCL, specify a large REGION size, or REGION=0M, to give the job the maximum amount of memory.
It is always best to run the job on the same LPAR as the MUF.
Use a JOBCLASS and a WLM setting that allow the job to get the best performance from the processor.
Use the options SORT=, SORTWK=, and OPTION1=(I) in the REPLACE.
The formula to calculate SORT= is: #_of_keys_in_table * #_of_records * 1.3
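For example (hypothetical values): a table with 2 keys and 5,000,000 records gives SORT = 2 * 5,000,000 * 1.3 = 13,000,000, so specify SORT=13000000.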
Specify attributes DUPE-MASTER-KEY=NO and CHNG-MASTER-KEY=NO in the table definition to reduce horizontal DXX searches and reduce index processing time.
Eliminate any unnecessary alternate key in the table definition.
Block the data area in 4K increments to avoid excess paging and I/O.
DATANO and DATANO2 buffer specifications should also be made in 4K increments.
Sample DBUTLTY REPLACE job with recommended options:
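This is a sketch modeled on the LOAD job above, not a verified job. Because REPLACE runs through the MUF, no COMM CLOSE, ACCESS OFF, or INIT statements are included. The DD layout and the REPLACE statement parameters shown (TBLNAM= in particular) are assumptions based on the options listed in this section; verify the exact syntax against the DBUTLTY REPLACE documentation:
//REPLACE  EXEC PGM=DBUTLTY,REGION=0M
//STEPLIB  DD DISP=SHR,DSN=DATACOM.CUSLIB
//         DD DISP=SHR,DSN=DATACOM.CABDLOAD        (or CAAXLOAD for AD)
//CXX      DD DISP=SHR,DSN=DATACOM.CXX
//INFILE   DD DSN=input.dataset.name,BUFNO=150,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(500,500))
..
//SORTWKnn DD UNIT=SYSDA,SPACE=(CYL,(500,500))
//SYSPRINT DD SYSOUT=*
//SORTMSG  DD SYSOUT=*
//SYSIN    DD *
REPLACE DBID=nnn,TBLNAM=xxx,FORMAT=EXTRACT,
   DDNAME=INFILE,SORT=99999999999,SORTWK=nn,OPTION1=(I)
/*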
See DocOps sections Loading Multiple Tables Into a Data Area, LOAD, MASSADD, and Table Partitioning.