The HISTFILE retains job statistics on a long-term basis and is the source for history reports. Because a new record is written for every job run, the file grows quickly, so you must archive data from the history data sets to tape or disk. Running the HISTFILE archive job periodically (weekly or monthly) is recommended.
Reason for critical impact
The archive job must turn tracking off before it starts and turn it back on after it completes. While TRACKING is set to NOSTORE, tracking requests accumulate in the checkpoint data set, which is very small and runs out of space quickly. Because tracking data cannot be processed during this time, job status is not updated.
If the archive job runs too long or fails, you will notice:
- Job status in CSF or the GUI is not updated; new jobs show "Queued for Submission".
- Error messages beginning with "INSUFFICIENT CHECKPOINT" (for example ESP1156E, ESP1157E, ESP335E) appear in JESMSGLG. These errors may mean loss of schedule data, job tracking data, and monitor notification data.
- The HISTFILE archive job did not run its last step, so the following commands were not issued:
OPER HISTFILE HIST1 OPEN
OPER TRACKING STORE
Important! Since TRACKING is turned off, the failure of the archive job can NOT be reported automatically.
How to avoid
- The HISTFILE should have few or no extents, enough free space, and a large enough secondary allocation to stay in good shape even in exceptional conditions. The recommended secondary allocation is 20% of the primary allocation; use your judgment when the primary allocation is very large.
- When the HISTFILE size is increased, increase the archive data set and temporary data set sizes accordingly (for example, ESP.HISTARCH and ESP.HISTTEMP); otherwise the JCL will abend with B37. Reference: HISTTEMP.
- Choose a time of low system activity to run the archive job.
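The 20% guideline above is simple arithmetic, and the following minimal sketch only illustrates it. The function name, the unit (cylinders), and the optional cap for very large primary allocations are assumptions made for illustration; none of them are product settings.

```python
from typing import Optional


def suggested_secondary(primary_cyl: int,
                        percent: int = 20,
                        cap_cyl: Optional[int] = None) -> int:
    """Suggest a secondary allocation (in cylinders) for a given primary.

    percent: share of the primary allocation (20 follows the guideline above).
    cap_cyl: hypothetical upper limit, reflecting the advice to use judgment
             when the primary allocation is very large.
    """
    if primary_cyl <= 0:
        raise ValueError("primary allocation must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    secondary = (primary_cyl * percent + 99) // 100
    if cap_cyl is not None:
        secondary = min(secondary, cap_cyl)
    return max(secondary, 1)


if __name__ == "__main__":
    print(suggested_secondary(500))                  # 20% of 500 cylinders
    print(suggested_secondary(10000, cap_cyl=500))   # capped for a large primary
```

The cap is a judgment call in this sketch: a strict 20% of a very large primary can request more space per extent than the volume is likely to have free.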
What to do if it occurs
- Restore tracking by issuing "OPER TRACKING STORE" from page mode to avoid delays in active jobs. Note: if many schedules are queued and all CLASS(0) EICLASS are in use, issue "OPER QUIESCE" to stop schedule processing and "OPER EICLASS SET CLASS(0) MPL(16)" to increase the MPL available for processing tracking data.
- Check whether the HISTFILE has free space left; ESP277E appears in JESMSGLG if it has none:
- If it does, issue "OPER HISTFILE HIST1 OPEN" so the workload can catch up, and run the archive job later, after the error is corrected.
- If it does not, choose one of the following:
1. Run the workload without an open HISTFILE. Real-time scheduling runs fine, but activity is not stored in the HISTFILE and is therefore not available for history reports.
2. Correct the archive job and rerun it. If this takes a long time, some critical jobs may miss their SLO.
3. Allocate a new HISTFILE and use it temporarily:
Use IDCAMS to define a new HISTFILE.
Rename the existing HISTFILE to .OLD, and rename the new HISTFILE to the existing name.
Issue "OPER HISTFILE HIST1 OPEN"
Processing normally takes 5-30 minutes to resume. If "LISTCKPT" issued from page mode shows no difference in "HIGHEST ADDRESS USED" and "BYTES IMBEDDED FREE SPACE", a COLD start is needed to reformat the checkpoint file. Start ESP with PARM=COLD.
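The LISTCKPT check above can be sketched as a comparison of two readings. This sketch assumes that "no difference" means the two values are unchanged between two LISTCKPT displays taken some minutes apart; that reading is an interpretation, not something the text states outright. The class and function names are hypothetical, and the values are expected to be copied by hand from the display, since the sketch does not parse LISTCKPT output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CkptReading:
    """Two values copied manually from one LISTCKPT display (hypothetical holder)."""
    highest_address_used: int
    bytes_imbedded_free_space: int


def cold_start_indicated(earlier: CkptReading, later: CkptReading) -> bool:
    """Return True when neither value changed between the two readings.

    Assumption: unchanged values mean the checkpoint backlog is not being
    worked off, which is the condition the text ties to a COLD start.
    """
    return (earlier.highest_address_used == later.highest_address_used
            and earlier.bytes_imbedded_free_space == later.bytes_imbedded_free_space)


if __name__ == "__main__":
    first = CkptReading(highest_address_used=4096, bytes_imbedded_free_space=0)
    second = CkptReading(highest_address_used=4096, bytes_imbedded_free_space=0)
    print(cold_start_indicated(first, second))
```

Waiting out the normal 5-30 minute recovery window before taking the second reading keeps the comparison from flagging a checkpoint that is simply slow to recover.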