When 16 Vtape jobs run simultaneously, they occasionally appear to hang shortly after submission. The only way to resolve the hang was to cancel one of the jobs, and no error messages appeared in the SYSLOG while the jobs were stuck. What is causing this hang?
The Vtape Display Active command shows output such as:
/SVTS D A
137 00000090 Scratch Virtual Volumes:
137 00000090 Pool1=17950 Pool2=n/a Pool3=n/a
137 00000090 Pool5=n/a Pool6=n/a Pool7=n/a
137 00000090 P Rmt Volser Cua ----Status---- Ds Type Jobname #I/O Last
137 00000090 1 n/a Privat 0100 I / O 0 349S JOBXXX1 4
137 00000090 1 n/a Privat 0101 I / O 0 349S JOBXXX2 4
137 00000090 1 n/a Privat 0102 I / O 0 349S JOBXXX3 4
137 00000090 1 n/a Privat 0103 I / O 0 349S JOBXXX4 4
137 00000090 1 n/a Privat 0104 I / O 0 349S JOBXXX5 4
137 00000090 1 n/a Privat 0105 I / O 0 349S JOBXXX6 4
137 00000090 1 n/a Privat 0106 I / O 0 349S JOBXXX7 4
137 00000090 1 n/a Privat 0107 I / O 0 349S JOBXXX8 4
137 00000090 1 n/a Privat 0108 I / O 0 349S JOBXXX9 4
137 00000090 1 n/a Privat 0109 I / O 0 349S JOBXXX10 4
137 00000090 1 n/a Privat 010A I / O 0 349S JOBXXX11 4
137 00000090 1 n/a Privat 010B I / O 0 349S JOBXXX12 4
137 00000090 1 n/a Privat 010C I / O 0 349S JOBXXX13 4
137 00000090 1 n/a Privat 010D I / O 0 349S JOBXXX14 4
137 00000090 1 n/a Privat 010E I / O 0 349S JOBXXX15 4
137 00000090 1 n/a Privat 010F I / O 0 349S JOBXXX16 4
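In the display above, devices 0100 through 010F are each in use by a different job (JOBXXX1 through JOBXXX16). When a display is long, that pattern can be checked with a short script. This is a minimal sketch that assumes the column layout shown above, with the device address (Cua) as the sixth whitespace-separated token and the job name as the second-to-last token; `parse_active_drives` and `each_job_holds_one` are hypothetical helpers, not part of Vtape.

```python
# Sample lines copied from the SVTS D A display above (abbreviated to three devices).
SAMPLE = """\
137 00000090 Pool1=17950 Pool2=n/a Pool3=n/a
137 00000090 P Rmt Volser Cua ----Status---- Ds Type Jobname #I/O Last
137 00000090 1 n/a Privat 0100 I / O 0 349S JOBXXX1 4
137 00000090 1 n/a Privat 0101 I / O 0 349S JOBXXX2 4
137 00000090 1 n/a Privat 0102 I / O 0 349S JOBXXX3 4
"""

HEX_DIGITS = "0123456789ABCDEF"


def parse_active_drives(display: str) -> dict[str, str]:
    """Map device address (Cua) to job name for each busy drive line.

    Assumes the layout shown above: Cua is the 6th token and Jobname is the
    second-to-last token. Pool summary and header lines are skipped.
    """
    drives = {}
    for line in display.splitlines():
        tokens = line.split()
        if len(tokens) < 12:
            continue  # pool summary or other short line
        cua = tokens[5]
        # Data lines carry a 4-digit hex device address; the header has "Cua".
        if len(cua) == 4 and all(c in HEX_DIGITS for c in cua):
            drives[cua] = tokens[-2]
    return drives


def each_job_holds_one(drives: dict[str, str]) -> bool:
    """True when no job appears on more than one busy drive."""
    return len(set(drives.values())) == len(drives)


drives = parse_active_drives(SAMPLE)
print(len(drives), "busy drives; one per job:", each_job_holds_one(drives))
# -> 3 busy drives; one per job: True
```

Run against the full display, the same check would report 16 busy drives, each held by a different job.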
The Vtape IPCS Logger shows entries such as:
12/24/2022 05:37:38.097008 SIC8 1 FSTATUS SVT1V1 09 SVTSVTU 008D4D90 0
...... 0108 MOUNT Mounting volume
008D4D90 SVT1V0460I 0108,*SCRT*,GRPTS=0,G#=4,P#=1n
12/24/2022 05:37:38.108588 SIC8 1 FSTATUS SVT1V1 02 SVTSVTU 008D6128 0
...... 0101 MOUNT Mounting volume
008D6128 SVT1V0460I 0101,*SCRT*,GRPTS=0,G#=4,P#=1n
12/24/2022 05:37:38.151870 SIC8 1 FSTATUS SVT1V1 10 SVTSVTU 008D4AE8 0
...... 0109 MOUNT Mounting volume
008D4AE8 SVT1V0460I 0109,*SCRT*,GRPTS=0,G#=4,P#=1n
12/24/2022 05:37:38.160737 SIC8 1 FSTATUS SVT1V1 13 SVTSVTU 008D42F0 0
...... 010C MOUNT Mounting volume
The z/OS Display Active command output in the SYSLOG shows entries such as:
/D A,L
214 00000090 HXXXXAA SXXXAA SYYYA002 NSW J HXXXXMA STZH
214 00000090 HXXXXKA SXXXKA SYYYK002 NSW J HXXXXBA STZH
214 00000090 HXXXXDA SXXXDA SYYYD002 NSW J HXXXXGA STZH
214 00000090 HXXXXQA SXXXQA SYYYQ002 NSW J HXXXXTA STZH
214 00000090 HXXXXPA SXXXPA SYYYP002 NSW J HXXXXFA STZH
214 00000090 HXXXXJA SXXXJA SYYYJ002 NSW J HXXXXEA STZH
214 00000090 HXXXXSA SXXXSA SYYYS002 NSW J HXXXXRA STZH
214 00000090 HXXXXIA SXXXIA SYYYI002 NSW J HXXXXCA STZH
Release: 12.6
There were not enough tape drives available to handle all of the simultaneous processing. Additional drives were needed so that at least one of the jobs could complete successfully and release its drives to the other Vtape jobs.
Each of the 16 jobs had allocated one tape drive and was waiting for a second. No job could obtain its second drive until one of the jobs was cancelled; at that point, one of the waiting jobs allocated the drive it needed, ran to completion, and freed its drives for the other jobs. In short, this is a resource problem: more drives must be available for the 16 jobs to run simultaneously. The hang occurs only when every job obtains its first drive before any job obtains its second, which explains why the problem is intermittent and unlikely to occur often (unless the system remains severely constrained in tape resources).

Besides making more tape drives available when the jobs run simultaneously, the jobs can also be staggered so that they do not all run at the same time, which greatly reduces device contention.
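The hang described above is a hold-and-wait deadlock: each job keeps the drive it has while waiting for another, so no job can finish and release anything. The article does not say exactly how the drives are allocated; this sketch assumes that each job takes its drives one at a time and keeps what it holds while waiting for the rest. Under that assumption, if N jobs each need k drives, then N(k-1) + 1 drives guarantee that at least one job can always complete, because in the worst case every job holds k-1 drives and one drive is still free. The function names below are hypothetical, and the round-robin simulation is one pessimistic interleaving used for illustration.

```python
def min_drives_deadlock_free(jobs: int, drives_per_job: int) -> int:
    """Smallest drive count that guarantees at least one job can always finish."""
    # Worst case: every job holds (drives_per_job - 1) drives and waits for one more.
    # One extra drive lets some job complete and release what it holds.
    return jobs * (drives_per_job - 1) + 1


def max_safe_jobs(drives: int, drives_per_job: int) -> int:
    """Largest number of concurrent jobs that cannot all block each other."""
    if drives_per_job < 2:
        raise ValueError("jobs needing one drive cannot deadlock on drives")
    return (drives - 1) // (drives_per_job - 1)


def simulate_round_robin(jobs: int, drives: int, drives_per_job: int) -> bool:
    """Jobs take one drive each in turn (assumed incremental allocation).

    Returns True if every job eventually completes, False if all remaining
    jobs are stuck waiting for a drive (deadlock).
    """
    free = drives
    held = [0] * jobs
    done = [False] * jobs
    while not all(done):
        progress = False
        for j in range(jobs):
            if done[j]:
                continue
            if held[j] == drives_per_job:
                # The job has everything it needs: it runs and frees its drives.
                free += held[j]
                held[j] = 0
                done[j] = True
                progress = True
            elif free > 0:
                # Take one more drive and keep holding the ones already acquired.
                free -= 1
                held[j] += 1
                progress = True
        if not progress:
            return False
    return True


print(min_drives_deadlock_free(16, 2))  # -> 17
print(max_safe_jobs(16, 2))             # -> 15
print(simulate_round_robin(16, 16, 2))  # -> False (every job holds one drive and waits)
print(simulate_round_robin(16, 17, 2))  # -> True
```

For illustration, with 16 jobs that each need two drives, 16 drives can deadlock while 17 cannot. Staggering the jobs corresponds to keeping the number of concurrent jobs at or below `max_safe_jobs` for the drives actually available.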