It is time to reorganize your data. Do you simply back up and reload the data or do you initialize (INIT) the Index Area (IXX) before you reload the data? The choice you make can affect how applications on your system perform. Understanding a little about how the IXX is built will help you decide if you need to INIT it before a LOAD.
Performance Considerations
Contrary to what many clients think, it is not required to INIT the IXX prior to a LOAD. In many cases, the index INIT followed by the LOAD will run significantly faster than the LOAD without the INIT. This is because, for full database reorganizations (native BACKUP/LOAD), where there are large numbers of index entries, the LOAD process time can be improved by initializing the index before starting the LOAD. However, the decision to INIT or not INIT is not based on performance alone. There are some other considerations that need to be reviewed.
Historical Information for CBS
LOAD and RETIX always rebuild the index entries for the table(s) in the selected area(s). However, certain information used by the index (maintained by CBS) is not rebuilt/reset during the LOAD or RETIX unless the index has been initialized prior to the LOAD. CBS stores information in the IXX concerning the average number of entries per index block for up to 4 or 5 levels of the index. CBS also stores information concerning the average number of rows we can read from a single block without having to go to another block in sequence by a particular key. In addition, for each of these numbers, CBS keeps track of the standard deviation so that we know how accurate this ‘average’ is.
This information is rebuilt when a LOAD or RETIX is done if the IXX is initialized prior to the LOAD or RETIX. This information is also updated whenever CBS does population counting. Resetting the CBS historical index statistics information may affect application processing. For example, if you notice that CBS selects a different key to use after a reorganization it may have been because the CBS historical information had been reset and CBS is now beginning its decision making process "afresh". As time goes on, the CBS historical statistics will be rebuilt.
Compressing Index Space
When CA-Datacom/DB builds an IXX block, space is left in the block for additional entries. Our code uses different algorithms to determine how much space needs to be left when it builds the index block. Which algorithm we select to use is determined by several factors, the most important being where records are added in the file when an ADDIT is done The decision you make to INIT the IXX or not, will affect how much space we allow in each block for additional entries.
For example, if you INIT the INDEX, then do a LOAD with SORT, assuming there are no shared keys across different areas, then at LOAD time CA-Datacom/DB will build the index entries utilizing about 90% of each index block. This will provide a tightly compressed index with a small amount of free space in the "in-use" blocks. As time goes on, the index processing (adding and deleting keys) balances the percentage of free space in the index blocks.
If you don’t do an INIT of the IXX, then we will not change how the blocks are utilized ("packed") when the LOAD or RETIX is done. The balanced free-space percentage for the "in-use" blocks remains unchanged.
Balancing the Index
During index maintenance, CA-Datacom/DB uses an algorithm to predict where future maintenance will go. We use this information to decide how to perform IXX block splits. For example, if you have a sequential key that is a counter that is always incremented, adds would always be high. In this case, we would split the blocks 90/10. If you had a sequential key and the adds were always low, we might decide to split the blocks 10/90. If, on the other hand, ADDITs are totally random, then we can’t predict as easily where new entries will go and we would probably split the blocks 50/50. ADDITs provide the most important information used in the algorithms that determine how blocks are split. When you INIT the index, any information gathered during ADDIT processing is reset, and the index split processing reverts to the 90/10 default algorithm. In some cases, this may not be the most efficient choice.
Making the Decision
It is difficult to make a recommendation about whether an INIT of the IXX should be done in all cases. The decision as to whether the IXX should be initialized depends on the application and how you access and maintain that particular data. Let’s say that you have a lot of CBS queries that do population counting. Would population counting be more efficient or more likely to select an efficient key if you do an INIT of the IXX before a RETIX or LOAD, or would you be better off if you didn’t INIT the IXX? You have to look at how you do adds to the table involved to make that decision.
When we do population counting, CA-Datacom/DB doesn’t look at the entire index (this would be too time consuming). Instead, we look at some number of blocks at the beginning of the IXX and assume that these blocks reflect a representation of the entire index. If most of the ADDITS go to blocks that are at the end of the index, then the first "n" number of blocks we look at to do population counting would be representative of the index. In a case like this, you might benefit from doing an INIT of IXX before doing a LOAD or RETIX.
If the adds to that table are mostly random, the index processing would determine to split the blocks 50/50. If an INIT of the index is done, this statistical information is reset. As a result, when you do a LOAD or RETIX, the index would be rebuilt and the blocks split 90/10. A population count done immediately after the index was initialized and loaded in this case might not select the most efficient path
Understanding your application and how adds and deletes are done, as well as whether your key values are reused or not, to the table during regular processing will help you determine what needs to be done when it comes time to do a LOAD or RETIX of that area. In some cases, you will benefit from doing an INIT of the IXX, in others you may not. Knowing how maintenance to the table is done by the applications using it will help you decide what you need to do when, or if, you need to load or rebuild an index.