VMHSYS104E/REACHED JOB LIMIT

Products

VM:Batch, Mainframe VM Product Manager, VM:Manager Suite for Linux on Mainframe, VM:Manager Suite for z/VM, VM SUITE

Issue/Introduction

VM:Batch goes into a disabled state and stops accepting jobs.

VMUSER1 82 VMHJOB271S JOB disk is full; VM:Batch disabled.
VMUSER1 82 VMHSYS104E VM:Batch disabled; job cannot be submitted.

The message text says that VM:Batch is unable to allocate a new job file block because the disk on which the job file blocks reside (the JOB disk) is full. That seems erroneous, since the disk was only 46% full.

The JOB disk (1B0) was only 46% full (50 cyls), so increasing it to 75 cyls did not help. However, there were 10,000 files on the disk. Changing RETAIN from 168 to 120 appeared to remove over 9,000 files from the disk and allowed VMBATCH to accept jobs again, so the application could continue processing.

After the initial failure, we found the 1B0 at 50 cyls. Link and access showed 46% used with 10,000 job files, and RETAIN was 192.

1st attempt to recover:
Increased 1B0 to 75 cyls, changed RETAIN from 192 to 168.
VMBATCH still failed and was disabled after restart. With 1B0 at 75 cyls, link/access showed 30% used with 9936 job files.

2nd attempt to recover:
1B0 still at 75 cyls, changed RETAIN from 168 to 120.
VMBATCH restart was successful and processing resumed. While we monitored it, the job file count went as low as 1000 files.

After the restart (with a 75-cyl 1B0 and RETAIN 168), the VM:Batch console showed these messages:
VMHJIN305W Invalid sequence of jobs detected; jobs being reordered
VMHJIN320I Reordering of jobs completed.
VMHJIN273I There are 9982 jobs in the system.

Is there a 10,000 job limit?

We believe that the RETAIN value can be changed dynamically using the ADMIN -> CONFIGURE function. If so, and the value is reduced, does VMBATCH immediately remove the jobs that are older than the value specified? If not, when does the removal process run, and is there a way to force it to run?

What alerts/messages are produced when JOB disk space utilization approaches 100% or when the number of JOB files approaches 10,000?

As part of our efforts to reduce the number of jobs, we manually removed jobs via the VMBATCH OPERATOR LIST, but only 20 at a time. What do you recommend for removing more than 20 at a time? In addition, many of the jobs have the same name, and it was not easy to identify the date each job ran so that we could remove them en masse.
What do you recommend for identifying jobs with the same name but different dates for some "mass processing"?

Resolution

Is there a job limit?
There is a 9999 job number limit in VM:Batch.

What is the best way to monitor service virtual machine disk fullness in VM:Batch?
To monitor actual disk fullness, you can use the MONITOR user exit to periodically call the CHECKDISK macro we supply with VM:Batch. The CHECKDISK macro checks the different disks and sends messages to the Operator when a disk reaches the threshold (80% by default).

You can modify the CHECKDISK macro to suit your needs (local modifications are permitted), for example to change how full a disk must be before a message is sent to the Operator. You should also check the number of files on the JOB disk to see whether you are approaching the 9999 job number limit. VM:Batch gives no warning as that limit gets close, and CHECKDISK currently has no code to check the number of jobs, so you would have to add that check as a local modification, as suggested in the documentation on the CHECKDISK macro.
For more information, see the MONITOR user exit and CHECKDISK macro topics in the VM:Batch Administrator Guide.
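
The logic of such a combined check can be sketched as follows. This is an illustration in Python only, not the CHECKDISK macro or any part of VM:Batch. The function name, the 90% job-count warning ratio, and the message wording are assumptions made for the example; the 80% disk threshold and the 9999 limit come from the text above.

```python
JOB_NUMBER_LIMIT = 9999  # job number limit stated in this article


def job_disk_warnings(percent_used, job_count, pct_threshold=80, count_warn_ratio=0.9):
    """Return warning strings based on disk space used and on job count.

    count_warn_ratio is a hypothetical setting: warn when the job count
    reaches this fraction of the job number limit.
    """
    warnings = []
    if percent_used >= pct_threshold:
        warnings.append(f"JOB disk is {percent_used}% full (threshold {pct_threshold}%)")
    count_threshold = int(JOB_NUMBER_LIMIT * count_warn_ratio)
    if job_count >= count_threshold:
        warnings.append(f"{job_count} jobs on JOB disk; limit is {JOB_NUMBER_LIMIT}")
    return warnings


# The situation in this case: disk only 46% full, but 9982 jobs in the system.
for line in job_disk_warnings(percent_used=46, job_count=9982):
    print(line)
```

With the values from this case, a space-only check stays silent, while the job-count check reports a problem.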

When does a changed RETAIN take effect?
VM:Batch has dynamic reconfiguration, so RETAIN changes should take effect right away. Whether a completed job is still retained is calculated from the job completion time and the current RETAIN value.
So, lowering RETAIN makes more completed jobs eligible to be cleared out. There is a CONFIGUR selection in the VMBATCH ADMIN screens. The only records you cannot change during a VMBATCH reconfiguration are the ACCESS and DIRECT records. However, the DIRECT record was eliminated with VM:Batch 1.4 (due to the use of the directory reader diagnose), so ACCESS is now the only record you cannot change.

By default, the RETAIN removal process runs every hour. If you have specified a retain interval less than an hour, then that is how often the removal process will run. However, once you change it, you do have to wait for the next RETAIN process to run in order for the new retain time to be used when considering jobs for removal. So, you may have to do manual removal in the meantime. There is no way to force the automatic removal process to run.
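
As a rough illustration of why lowering RETAIN frees more jobs at the next removal run, the sketch below compares job completion times against two RETAIN values. It is not VM:Batch code. It assumes RETAIN is expressed in hours (this article does not state the unit), and the function name, the reference time, and the completion times are made-up sample data.

```python
from datetime import datetime, timedelta


def eligible_for_removal(completed_at, retain_hours, now):
    """True once a completed job has been kept for at least retain_hours.

    Assumption: RETAIN is a number of hours counted from job completion.
    """
    return now - completed_at >= timedelta(hours=retain_hours)


# Hypothetical sample: five jobs that completed 24 to 170 hours ago.
now = datetime(2024, 1, 10, 12, 0)
completed = [now - timedelta(hours=h) for h in (24, 100, 130, 150, 170)]

for retain in (168, 120):
    n = sum(eligible_for_removal(c, retain, now) for c in completed)
    print(f"RETAIN {retain}: {n} of {len(completed)} jobs eligible at the next removal run")
```

In this sample, one job is eligible with RETAIN 168 and three are eligible with RETAIN 120. The jobs are still only removed when the next removal process runs.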

Is there a better way to remove jobs en masse?
Keep in mind that a job cannot be removed until it has completed, so jobs that have not finished cannot be removed this way.
The OPERATOR REMOVE command allows pattern matching to specify jobs to remove using various criteria such as the user ID that submitted the job. You can also specify ALL to remove all completed jobs. Again, no job is removed unless it has completed.

You could also use the VMBATCH OPERATOR LIST command to list all jobs that are ALLDONE and put the result in an output file. You could then use that file to automate removal of the jobs that meet whatever selection criteria you choose.

The LIST output can be put in a file and will have a line for each job as follows:
JOB1788 1788 USER001 A waiting position 0011 NOHOLD
JOB1790 1790 USER001 A completed; Return Code: 0 END

If you listed only ALLDONE jobs, you would see just job 1790, which had completed.
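
Once the LIST output is in a file, selecting jobs from it can be sketched as below. This is a Python illustration of the selection step only, run wherever the file is available. It assumes each line is whitespace-separated as job name, job number, submitting user ID, a one-character field (treated here as a class, which is a guess), and then free-form status text, based solely on the two sample lines above. The function names are hypothetical, and no removal command is issued.

```python
from collections import defaultdict

# The two sample LIST lines shown above.
SAMPLE = """\
JOB1788 1788 USER001 A waiting position 0011 NOHOLD
JOB1790 1790 USER001 A completed; Return Code: 0 END
"""


def parse_list_line(line):
    """Split one LIST line into name, number, user, class and status text."""
    name, number, user, job_class, status = line.split(None, 4)
    return {"name": name, "number": int(number), "user": user,
            "class": job_class, "status": status.strip()}


def completed_by_name(text):
    """Group the job numbers of completed jobs by job name."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        job = parse_list_line(line)
        if job["status"].startswith("completed"):
            groups[job["name"]].append(job["number"])
    return {name: sorted(nums) for name, nums in groups.items()}


print(completed_by_name(SAMPLE))
```

Jobs that share a name are told apart here by job number, the only distinguishing field visible in the sample lines. Whether job numbers reflect the order in which jobs ran is not stated in this article, so verify that before using the number as a stand-in for the run date.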