CA Datacom Performance LOAD, MASSADD, REPLACE

Article ID: 127157
Products

CA Datacom - DB, CA Datacom, CA Datacom - AD, CA Datacom - Server, CA Datacom/AD

Issue/Introduction



When loading a very large number of records into tables in a multi-table CA Datacom area, which DBUTLTY function performs the load fastest: LOAD, MASSADD, or REPLACE?

What options should be used to increase the performance of the LOAD and REPLACE functions?

Environment

Release:
Component: DB

Resolution

MASSADD adds one record at a time and runs under the MUF with logging enabled, so it is not a good option when loading a large number (millions) of records. Even splitting the job to run multiple MASSADD jobs concurrently causes contention, because they all update the same area.

A LOAD is much faster than MASSADD because it adds records by updating data blocks in the batch address space, without logging. In addition, when the OPTIMIZE= and SORT= options are used, it sorts the index entries and performs the index update after all the data has been loaded, which is more efficient.

The DocOps section Loading Multiple Tables Into a Data Area lists the options for loading multiple tables.

LOAD can only be used for the first table loaded into the area, because it overwrites any existing data in the area.

The LOAD function should therefore be used first, for the table that contains the largest number of records.

For very large tables with millions of records, consider separating each table into its own area and database, and also partitioning the large tables to split them into smaller tables. Multiple LOAD and other DBUTLTY functions can then be run concurrently against each partition. With partitioning, no changes are needed to the application programs.

For the other tables in the area, use REPLACE rather than MASSADD. REPLACE replaces all the data in a single table without affecting the other tables in the area, unlike LOAD. REPLACE adds records by updating data blocks in the MUF address space but does not perform logging. Index records are sorted in the DBUTLTY batch address space.

Performance recommendations for running the LOAD function

  • Pre-sort the EXTRACT file in native key sequence before running the LOAD.

  • On the input file DD statement in the JCL, specify BUFNO=150 (up to a maximum of 255) to increase the number of system buffers.

  • On the EXEC statement in the JCL, specify a large REGION size, or REGION=0M to give the job the maximum amount of memory.

  • It is always best to run the job on the same LPAR as the MUF.

  • Use a JOBCLASS and WLM setting for the job to allow it to get the best performance from the processor. 

  • An INIT of the data area and the IXX should be performed before doing the LOAD.

  • Add SET OPTION1= statements to the DBUTLTY run to increase the number of buffers, using the options DATALN=nnnn, DATANO=5000, DXXNO=2000, IXXNO=1000 and IXXLN=nnnn.

  • Use the options KBYTES=, OPTIMIZE=YES, SORT=, SORTWK=, SORTDFLT=YES and OPTION1=(I) in the LOAD. 

    • The formula to calculate KBYTES= is:   blocksize * #_of_blocks_per_track * #_of_tracks * 2 / 1024.  

    • The formula to calculate SORT= is: #_of_keys_in_table * #_of_records * 1.3 
      Note: SORTDFLT=YES will calculate and override the SORT= value only when using a BACKUP input file or if the CXX currently has record counts for the area.

  • Specify attributes DUPE-MASTER-KEY=NO and CHNG-MASTER-KEY=NO in the table definition to reduce horizontal DXX searches and reduce index processing time.

  • Eliminate any unnecessary alternate keys in the table definition.

  • Block the data area in 4K increments to avoid excess paging and I/O.

  • DATANO and DATANO2 buffer specifications should be made in 4K increments.
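As a rough illustration, the KBYTES= and SORT= formulas above can be worked through with sample numbers. The device geometry, track count, key count, and record count below are hypothetical values chosen for the example, not figures from this article:

```python
# Worked example of the KBYTES= and SORT= formulas (hypothetical inputs).
# Assumed: an area of 4096-byte blocks, 12 blocks per track (typical for
# 4K blocks on a 3390 device), allocated on 3,000 tracks; a table with
# 3 keys and 10,000,000 records.

blocksize = 4096
blocks_per_track = 12
tracks = 3000

# KBYTES= : blocksize * #_of_blocks_per_track * #_of_tracks * 2 / 1024
kbytes = blocksize * blocks_per_track * tracks * 2 // 1024
print(f"KBYTES={kbytes}")        # KBYTES=288000

keys = 3
records = 10_000_000

# SORT= : #_of_keys_in_table * #_of_records * 1.3
sort_value = round(keys * records * 1.3)
print(f"SORT={sort_value}")      # SORT=39000000
```

Remember that with SORTDFLT=YES the utility can override the SORT= value itself when the input is a BACKUP file or the CXX has current record counts, so the hand calculation matters most for EXTRACT input against a freshly initialized area.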


Sample DBUTLTY LOAD job with recommended options:

//LOAD     EXEC PGM=DBUTLTY,REGION=0M
//STEPLIB  DD DISP=SHR,DSN=DATACOM.CUSLIB
//         DD DISP=SHR,DSN=DATACOM.CABDLOAD (or CAAXLOAD for AD)
//CXX      DD DISP=SHR,DSN=DATACOM.CXX 
//INFILE   DD DSN=input.dataset.name,BUFNO=150,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(500,500))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(500,500))
..
//SORTWKnn DD UNIT=SYSDA,SPACE=(CYL,(500,500))
//SYSPRINT DD SYSOUT=* 
//SORTMSG  DD SYSOUT=*
//SYSIN    DD *
SET OPTION1=DATALN=4096;DATANO=5000 
SET OPTION1=DXXNO=2000;IXXNO=1000;IXXLN=4096 
COMM OPTION=CLOSE,DBID=nnn
ACCESS STATUS=OFF,DBID=nnn,USERS=FAIL
INIT AREA=IXX,DBID=nnn
INIT AREA=xxx,DBID=nnn
LOAD AREA=xxx,DBID=nnn,FORMAT=EXTRACT,CLUSTER=DENSE,
     DDNAME=INFILE,KBYTES=nnnn,OPTIMIZE=YES,
     SORT=99999999999,SORTWK=nn,OPTION1=(I)
ACCESS STATUS=WRITE,DBID=nnn
/*


Performance recommendations for running the REPLACE function

  • Pre-sort the EXTRACT file in native key sequence before running the REPLACE.

  • On the input file DD statement in the JCL, specify BUFNO=150 (up to a maximum of 255) to increase the number of system buffers.

  • On the EXEC statement in the JCL, specify a large REGION size, or REGION=0M to give the job the maximum amount of memory.

  • It is always best to run the job on the same LPAR as the MUF.

  • Use a JOBCLASS and WLM setting for the job to allow it to get the best performance from the processor.

  • Use the options SORT=, SORTWK= and OPTION1=(I) in the REPLACE.

    • The formula to calculate SORT= is: #_of_keys_in_table * #_of_records * 1.3 

  • Specify attributes DUPE-MASTER-KEY=NO and CHNG-MASTER-KEY=NO in the table definition to reduce horizontal DXX searches and reduce index processing time.

  • Eliminate any unnecessary alternate keys in the table definition.

  • Block the data area in 4K increments to avoid excess paging and I/O.

  • DATANO and DATANO2 buffer specifications should be made in 4K increments.

Sample DBUTLTY REPLACE job with recommended options:

//REPLACE  EXEC PGM=DBUTLTY,REGION=0M
//STEPLIB  DD DISP=SHR,DSN=DATACOM.CUSLIB
//         DD DISP=SHR,DSN=DATACOM.CABDLOAD (or CAAXLOAD for AD)
//CXX      DD DISP=SHR,DSN=DATACOM.CXX 
//INFILE   DD DSN=input.dataset.name,BUFNO=150,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(500,500))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(500,500))
..
//SORTWKnn DD UNIT=SYSDA,SPACE=(CYL,(500,500))
//SYSPRINT DD SYSOUT=* 
//SORTMSG  DD SYSOUT=*
//SYSIN    DD *
REPLACE AREA=xxx,DBID=nnn,DDNAME=INFILE,MULTUSE=YES,
        SORT=99999999999,SORTWK=nn,OPTION1=(I)
/*
 

Additional Information

See the DocOps sections Loading Multiple Tables Into a Data Area, LOAD, MASSADD and Table Partitioning.