Database tuning in a run-time environment usually focuses on the definition of the I/O subsystem environment. Unfortunately little attention is given to the database structure once the initial design is complete.
This document will examine the statistics that are available and how they correlate run-unit performance to the database design.
Establishing the test environment
The purpose of run-time analysis is to evaluate the level of database activity generated by application logic given the physical structure of the database. For the analysis to be meaningful there are four major items that must be considered.
The evaluation should be focused on the most critical transactions.
During the online day the most critical transactions will typically be the highest volume transactions. It is certainly more important to achieve sub-second response times for transactions that will run millions of times during the online window than for a transaction that will typically be executed 100 times. In the batch environment long running jobs that are critical to the completion of the batch window should be the focus of the analysis as opposed to small or infrequently run batch jobs. Hopefully the physical database design was created with these same critical transactions in mind.
Results must be compared against anticipated standards.
These standards may be developed by accumulating the desired statistics before a planned change such as a database design or application modification, type of DASD, CA IDMS Release level, or operating system level. For new applications it may be a matter of manually walking through an application's logic and estimating the number of records accessed, reads/writes, and other statistics that may be relevant based on the portions of the database that will be processed. This will require reasonably accurate estimates of the number of record occurrences within the database, the expected lengths of sets, and the anticipated page sizes to determine cluster overflow. For example if a program will walk a single occurrence of a VIA set owned by a CALC record and you expect each occurrence of this type of set to contain 10 members you might expect a transaction to access 11 records and generate one read if you anticipate all 10 member records to be able to fit on a single page. However if the same set was a non-VIA chained set you would estimate that the program would access 11 records and generate 11 pages read. Having these standards will allow you to determine whether the statistics generated during your testing are reflecting positive or negative results.
Test environments must simulate true production environments.
Although it may not be practical for a test database to be as large as a really huge production version the general characteristics of the databases should be the same. If your test environment is only 10% of production, reduce your test data to 10% of the expected production volume. However if you expect the average length of a set to be 100 members in production then occurrences of the same set in the test environment should also contain an average of 100 members. Page sizes should also be the same between the two environments so that clustering in test will accurately mimic production and the percentage of space used in the test database should be similar to that in production. By keeping these two environments similar you will generate statistics that will be accurate on a transaction basis and can be reasonably extrapolated to estimate overall run-times for things like long running batch jobs.
Do not allow buffering to mask potential problems.
Physical I/O is probably the most costly resource that one can consume. If you create buffer pools in your test environment that are too large then you have the potential to lock more pages into buffers than would be expected in a production environment. Doing something like walking long non-VIA sets, which would be expected to generate large numbers of reads, may be masked by the database becoming resident within the buffer pool.
The following list contains the standard database or run-unit statistics that CA IDMS accumulates for every active run-unit. Those statistics marked with an '*' are the fields that will be considered during a typical run-time analysis.
<Please see attached file for image>
DB statistics are accumulated and reported at the system, task, and run-unit levels and it is important to understand what is being counted at each level. The highest level of statistic is at the system level which is accumulated for an entire Central Version (CV). System level statistics can be displayed online using the 'DCMT DISPLAY STATISTICS SYSTEM' command or generated by running batch report SREPORT 03. These statistics are a total of all of the database activity that has occurred during the life of the CV and are relatively worthless when attempting to analyze the performance characteristics of a single task or program.
Task level reports provide access to the second level of DB statistics. Installations with CA IDMS Performance Monitor installed have the ability to look at these numbers online using the Application Monitor (PMAM) or in batch by running reports PMARPT17 and 18. Sites without Performance Monitor would typically use batch reports SREPORT 5, 6,7,8, or 9. These reports take their input from the CA IDMS DCLOG and the CV on which the testing occurs must have 'STATISTICS TASK WRITE' specified within its sysgen. Task level DB statistics contain the accumulation of all of the run-units that have been executed during the life of the reported task. This will include all run-units for which the application program(s) issued a BIND RUN-UNIT as well as any run-unit activity performed for queue, program load, and message activity. Task level statistics will provide a good representation of the overall database work required to complete the entire task, but the inclusion of non-application generated database activity still makes it difficult to analyze the performance of the application's database design.
For the purposes of analyzing the performance of an application against its database structure the lowest level of statistic provides the most meaningful values. Run-unit statistics will reflect the database activity for a single run-unit since the run-unit's BIND RUN-UNIT command until the time at which the statistics were requested. Run-unit statistics can be requested by the application program by issuing the ACCEPT FROM IDMS-STATISTICS command but it is then the program's responsibility to output them in a usable manner. This can be useful when trying to isolate the database activity generated within a particular section of a program's code. Run-unit statistics are also written to journal checkpoint records and can be accessed from archive journal files using journal reports JREPORT 2, 3, and 4. Many sites suppress the journaling of retrieval run-units to reduce the amount of data generated so it may be necessary to temporarily change the CV's sysgen to specify JOURNAL RETRIEVAL on its SYSTEM statement so that critical retrieval programs can be reviewed.
Since physical I/O is the most expensive resource used by a task the two most important statistical fields provided within the DB statistics block are PAGES-READ and PAGES-WRITTEN. Even if a site does not have specific guidelines as to how many physical I/Os a task is permitted, especially in an online environment, it is necessary for a database designer to have an idea of how many I/Os are to be expected per transaction and how many are really being generated by an application. PAGES-READ identifies the number of times a physical I/O operation was required to bring a page into the buffer pool. This is a number that can easily be skewed when testing is performed with a buffer pool that is too large for the test database size and task volume relative to those components in the production environment. PAGES-WRITTEN reflects the number of times that an update of a page results in the page's 'must-write' switch being turned on. The actual physical I/O may be experienced by this task but could actually be performed under the control of another task if the second task has a need to reuse the space in the buffer before the updating task has had an opportunity to issue the physical I/O. However it is a value that will provide an accurate estimate of the number of physical write I/Os that a particular program may require the CA IDMS system to issue.
The two major causes of excessive reads or writes are navigating non-VIA or CALC-to-CALC sets. When these structures are involved one must frequently make the decision to reduce the I/O requirements of the affected critical transactions and possibly increase the needed I/O for those transactions deemed less critical.
<Please see attached file for image>
Figure 2 represents a database where the ITEM record is a member that participates in both the ORDER-ITEM and PRODUCT-ITEM sets. Specification of which set will be used as the VIA set will have a significant impact on the number of physical I/Os needed to process related transactions. If the transactions that navigate the ORDER-ITEM set are executed frequently and those that walk the PRODUCT-ITEM set are rare it is desirable to select the ORDER-ITEM set as the VIA set . This would allow the tasks walking the ORDER-ITEM set to get all of their required data in an average of 1 I/O if the projected number of ITEMS within the set of 5 is accurate. Tasks needing to walk the entire PRODUCT-ITEM set could invoke up to 1000 I/Os. If the amount of I/O needed to walk the PRODUCT-ITEM set is unacceptable then further design work on the physical implementation of the ITEM records would be in order.
Sometimes the logical design of the database and the need to directly reference two record types directly by a key value leads to two CALC records being related through a set relationship. Since CALC records are randomized across a database area the navigation of this set will typically result in one physical I/O for every member record accessed. Figure 3 provides an example of this type of relationship.
<Please see attached file for image>
Sometimes there may be multiple critical transactions that use the records involved in this type of structure and it may be necessary to inflict a small overhead on one type of transaction to create a larger benefit for another transaction. If transaction 'A' needs to access a CUSTOMER and all of its associated ORDER record occurrences that task will incur one physical I/O for each ORDER in the set. However transaction 'B' needs to directly access an ORDER based on that ORDER's calckey. Both are considered critical transactions but testing of transaction 'A' shows that it is generating too many physical reads. A solution may be to redefine the ORDER record as a VIA record and cluster it around its owning CUSTOMER record occurrence. The direct access through a key value for transaction 'B' can be supported by adding an index to the ORDER records. Although this will add some I/O to transaction B it should be a manageable amount that may still allow those transactions to be processed in the allowed amount of time.
PAGES-REQUESTED is a statistic that identifies the number of times that the I/O sub-system is called to access a database page. If the page must be physically read the I/O sub-system will increment the value of PAGES-READ. As a result PAGES-REQUESTED and PAGES-READ have a direct correlation to each other. This relationship will be discussed when various performance related ratios are examined. However it is important to note that the execution of the exact same test in a CV environment will result in a larger value for PAGES-REQUESTED than will be generated in a local mode test. This is due to the CV releasing internal buffer locks at the completion of each DML command to provide for better throughput of run-units. This is not a consideration in local mode since there is only one active run-unit in the CA IDMS environment.
The OVERFLOW and NO-OVERFLOW values report on the number of times that records could not be stored on the page to which they were initially targeted. These statistics are kept separately for CALC and VIA record types. To understand what these numbers mean one has to understand how CA IDMS determines a record's target page. For a CALC record the value of the record occurrence's calckey is hashed to calculate the target page. For VIA records the target page is typically that of the record occurrence that is current of the record type's VIA set. For example, if a program accesses the owner of a set and then stores the member record the member's target page will be the page on which the owner resides. When a second member is then stored into the database, the page on which the previous record was stored will be the target page for the second occurrence. If each of these records is successfully stored on their target page the VIA NO-OVEFLOW field would be incremented for each store operation. However if the first record occurrence did not fit on its target page the VIA OVERFLOW field is incremented and the page on which the record was finally stored will be the target page of the second record. Assuming there is space on this page for the second record the VIA NO-OVERFLOW field is incremented.
The overflow statistics represent the number of overflowed record occurrences generated by the run-unit being measured and do not represent the total overflowed occurrences within the entire database area. The IDMSDBAN utility can be used to give you a better idea of the overall condition of the database. However consistent numbers of overflowed record occurrences over what your pre-test estimates might suggest as expected should be examined. High values for CALC overflows might suggest hotspots within the database or that the database is starting to run short on available space. Hot spots may be eliminated by changing the page range of the CALC record's area slightly which can alter the distribution of the CALC records within the area.
Higher than expected numbers for VIA overflows usually indicates some type of problem with decisions that were made regarding record clustering. Clustering is the practice of storing occurrences of one record type in close physical proximity to occurrences of another record type to reduce I/O requirements. Perhaps the greatest impact on clustering efficiency is the selection of an area's page size. If your initial analysis of the database calls for a cluster size of 10,000 bytes the selection of a 4K page size should be expected to result in overflowed record occurrences across multiple pages. You might consider selecting a page size closer to 12K to reduce the number of overflows encountered.
The most common problem encountered with VIA clustering is that the clusters for owner records are just too large regardless of the page size selected. This may be due to the fact that there are just too many member occurrences of a single member record type per owner to fit on a page or that there are too many VIA record types related to the owner. In these cases you might consider moving some of the member record types to a different area than the owner. This provides the ability to define a member area with a much larger page size than you might want to define for the owner area. Many sites will move the less frequently used record types into the new areas and retain the frequently used record types in the same area as the owner record.
When calculating your expected cluster size for an owner record type, do not forget to include the amount of space needed to maintain any user-owned indexes. The SR8 records making up the index structure for a user-owned index are clustered with the occurrences of the set's owner record. Large index occurrences can have a devastating impact on the efficiency of an area's data clusters. When large user-owned indexes exist for a record type, you may wish to convert the indexes into a single system-owned index which can be moved to its own area. This may require the inclusion of the owner's key within each member record occurrence to provide the ability to differentiate between the member records for each owner occurrence. Figure 4 gives an example of such a modification.
<Please see attached file for image>
RECORDS-CURRENT and RECORDS-REQUESTED provide an insight into how much work the DBMS must do to service a DML request once the required data has been brought into the CA IDMS bufferpools. RECORDS-CURRENT roughly equates to the number of functional DML commands issued by the run-unit. For example if the run-unit issued 21 OBTAIN commands between the BIND RUN-UNIT and FINISH commands and all of the OBTAINs successfully returned a record to the program one would expect RECORDS-CURRENT to be reported as 21. RECORDS-REQUESTED reports on the number of record occurrences the run-unit had to access to accomplish the functional request of these DML commands.
To properly use these statistics the person analyzing the tests should be familiar with the type of DML commands being issued by the application program and the structures being navigated. For example a program that does a single OBTAIN CALC would be expected to ideally have a RECORDS-CURRENT value of 1 and a RECORDS-REQUESTED value of 2. The 2 for RECORDS-REQUESTED could be attributed to one access of the SR1 record on the data record's target page and one access for the actual data record. Higher values for RECORDS-REQUESTED would be caused by multiple data records with a location mode of CALC targeting to the same database page. Very high values could indicate a hot spot on the database that might be alleviated by a change in the number of pages in the area which would change the CALC record distribution.
The processing of long set relationships will frequently be the cause of inflated numbers for both the RECORDS-CURRENT and RECORDS-REQUESTED statistics. When only a subset of the member records of a set relationship are needed it may be helpful to add an additional set relationship between the two record types to reduce the number of records that must accessed by higher priority transactions. Figure 5 shows the addition of an OPEN-ORDER set between the CUSTOMER and ORDER records to reduce the number of records that must be accessed by those transactions only interested in orders that have not yet been filled for a particular customer. This should reduce both the RECORDS-CURRENT and RECORDS-REQUESTED counts for those transactions.
<Please see attached file for image>
Long sorted chain sets can often result in large numbers of RECORDS-REQUESTED while the value of RECORDS-CURRENT is relatively small when random reads are made against the set using the set's sort key. With a sorted chain set CA IDMS must walk on average half the set's members to locate the desired record. Each record checked results in the RECORDS-REQUESTED field being incremented to reflect the amount of work performed by the DBMS. If the bulk of the DML requests will be expected to be satisfied by records at the end of the set consider reversing the sequence of the set so that the desired records will be found at the set's beginning. However if the records are expected to be found throughout the set, consider changing the set from a chain set to a user-owned index set. Though this means that you must always access an index structure to locate the desired record occurrence this can frequently result in fewer RECORDS-REQUESTED than having to walk each data record preceding the desired occurrence.
Finally a high RECORDS-REQUESTED value can result from the lack of proper PRIOR or OWNER pointers on a set relationship. Missing PRIOR pointers can result in extra record occurrences being accessed for DISCONNECT and ERASE verbs or MODIFY verbs where the sort key of a set is being changed. Execution of FIND/OBTAIN OWNER verbs can also cause extra record occurrences to be access if the set involved has not been defined with OWNER pointers. If any of these verbs are used by the program being analyzed the missing pointer types should be added to the involved sets.
FRAGMENTS-STORED reports the number of variable length record fragments that were stored by the monitored run-unit. A variable length record is any record that is defined with data compression or whose element structure contains an OCCURS DEPENDING ON clause. As variable length records are modified such that their length increases it can be expected that some fragments will be generated. However if the run-unit only stores new occurrences of the variable length records or if the value of the FRAGMENTS-STORED field seems to be excessive, there may be a problem with the database.
As the record's database area fills the number of fragments that are generated can be expected to increase. If the area is greater than 70% utilized the solution may be to enlarge the record's area. However the problem is frequently in the record's schema definition. To minimize fragmentation every variable length record's schema definition should contain the clauses MINIMUM ROOT IS RECORD LENGTH and MINIMUM FRAGMENT IS RECORD LENGTH. In addition each area containing variable length records should specify a PAGE RESERVE to reserve space on each database page for the expansion of any variable length records residing on the page. A PAGE RESERVE will not eliminate fragmentation but will delay its onset.
You might also review the variable length record to see if defining it as variable length is warranted. If the record is initially stored using only a small number of the repeating group occurrences but will typically grow to use close to the maximum number of repeating occurrences you may be better off defining the record as a fixed length record. In a similar manner a record that is defined with data compression may be better defined as uncompressed if it initially contains long strings of repeating characters but over the life of the record occurrence will always be replaced by other non-repeating values.
The meaning of the RECS-RELOCATED statistic is dependent on whether the database is a non-SQL database or an SQL defined database. When monitoring a non-SQL database a non-zero value indicates that relocated records or variable length fragments are being brought back to their associated record's original target page. Relocated records in a non-SQL database can only be created as a result of running the RESTRUCTURE SEGMENT utility to expand the length of a record. Although a positive indicator that relocated records or variable length fragments are being cleaned up it is also an indicator that they exist and are causing additional processing overhead for the run-unit. An UNLOAD/RELOAD or REORG execution may be in order to remove these types of records.
When non-zero values for RECS-RELOCATED are found while monitoring processing against an SQL database it means that record occurrences are being relocated away from their original target pages. When the number of columns in an SQL table's definition is increased the actual size of the rows within the database are not increased until the next time that row is updated. If there is insufficient space on the database page for the increased space requirement the database record is relocated. Seeing consistent non-zero values for RECS-RELOCATED may indicate space shortages and an UNLOAD/RELOAD or REORG execution may be advisable to reduce the overhead associated with relocated records.
Many of the database statistics that have been discussed are used together as ratios to aid in their interpretation and the analysis of the run-unit environment.
To this point the subject of buffer pools has been avoided but the first ratio to be considered provides a measure of the effectiveness of the buffer utilization. PAGES-REQUESTED/PAGE-READ should be used along with an understanding of the type of processing being performed by the run-unit. A run-unit that does primarily CALC retrieval is likely to see a 1:1 ratio of pages requested to the number of pages read due to the random nature of the process. However if the application is expected to read a CALC record and walk a VIA set that on average has 5 members, a value of 6:1 would be expected. In general large values are desired. A decrease in this ratio may indicate the introduction of some form of new buffer contention such as a new application being added to the environment of an increase in overflow conditions within one of the areas involved with the run-unit.
RECORDS-REQUESTED/PAGES-READ should be reviewed for those run-units that walk VIA sets and is a measure of the clustering efficiency of the database. The values of this ratio should be as large as possible but you must still view them with a knowledge of the expected VIA clusters. If the number of records per cluster is expected to be 20 and the member records are small you might expect them to fit on a single page. A value of 20:1 would be anticipated. However if the member records are very large the initial design may have expected the set to span two database pages. In that case the expected value of the ration would be 10:1. Decreases in the value over time would be an indication that overflow conditions may be increasing.
The amount of work performed by the DBMS on a per DML basis can be estimated using the ratio of RECORDS-REQUESTED/RECORDS-CURRENT. The values of this ratio should be as low as possible. Run-units that perform key -valued retrievals against sorted chain or index sets, store records into these types of sets, or modify the fields used as the sort keys should be expected to have larger ratio values than those processes that walk entire sets using FIND/OBTAIN NEXT commands. Increasing values for the ratio could be an indication that these sets are becoming longer than anticipated and alternate structures may eventually be warranted. Increases for run-units that primarily perform CALC could be an indication that hot spots are developing in the associated areas.
The ratio of NO-OVERFLOW/(OVERFLOW +NO-OVERFLOW) should be reviewed for those run-units that store record occurrences into the database. A value of 1 for this value is an indication no overflow has occurred while values less than 1 indicate some level of overflow. As stated earlier some overflow is to be expected. If ratio values become smaller over a period of time it may be an indication that space utilization problems are developing. The overflow statistics are maintained separately for CALC and VIA overflow conditions. Pay particular attention to the ratio for the CALC overflows as they can have a greater impact on database performance as a CALC overflow may impact direct retrieval of all CALC records within the area as opposed to a single cluster of data.
The database's physical structure has the greatest influence on the number of I/O operations and the amount of CPU required by the DBMS to process an application's requests. Unfortunately it is usually the aspect of performance that is the least evaluated. Database statistics provide a means with which to measure the interaction of application programs and the database structure. However to do this successfully requires a knowledge of the application's critical transactions so the numbers can be properly interpreted. There should also be a set of standards established by application so that over a period of time you are able to spot statistical trends that may be a signal of developing performance problems.