What is the meaning of a 0966 error status?
A 0966 error status means that the IDMS "in use" flags on each of the database files are set 'ON'.
Dispatch runs with an IDMS database. When a batch job or the Dispatch system itself starts, it turns on the "in use" flag of each of its database files. If you try to run another batch job or start Dispatch while the flags are already set as "in use", you can receive a 0966 error status.
There is a utility called DSEXPFIX that can be used to turn these "in use" flags 'OFF'.
* However, if you receive a 0966 error and the Dispatch started task is down, you need to consider WHY you received the 0966. DSEXPFIX does not perform any type of database file recovery before resetting the flags, so running it can corrupt the database files!
Did a batch job abend? Did Dispatch not come down clean? Was there some other problem that was not resolved correctly?
If the Dispatch system itself is canceled or abends, IDMS checks whether the "in use" flags are on when you bring it back up. It then uses the journals to roll back any database updates that were in flight. This recovery process is known as a warmstart. Under most circumstances the warmstart is successful, and you would not see a 0966 error.
When the maintenance jobs are run with Dispatch up, updates to the database are done by IDMS and written to the journals. If there is a problem, then the updates (run units) are rolled out by IDMS and broken chains are prevented. So again, you would not see the 0966 error status.
Running batch jobs with Dispatch up does use more system overhead, and the jobs take longer to execute because of this journaling.
If you run batch maintenance with Dispatch down, the first step of the job takes a temporary backup of the affected areas. Our batch maintenance jobs are designed to automatically restore the affected areas if the job abends or ends with some other non-zero return code. If the job is canceled for some reason, you can restore the affected areas manually by setting the RESTFI symbolic to RESTFI=LT and resubmitting that same canceled job.
You should NEVER run the DSEXPFIX job as your only recovery process after a problem with a batch maintenance job. There can be partial updates on the database, and running DSEXPFIX can cause broken chains. A restore is usually the only way to maintain the integrity of the database files.
There are, however, some circumstances where DSEXPFIX is OK to run. For instance, after specific JCL errors or certain space problems, the databases may be open and the "in use" flags set to ON even though no processing has occurred.
If you have any doubt about whether or not running DSEXPFIX will cause broken chains or database corruption, call support to see if execution of DSEXPFIX is safe.
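The guidance above for a batch maintenance job that failed with Dispatch down can be summarized as a small decision helper. This is only an illustrative sketch in Python, not part of Dispatch: the names Action and recommend_action and both input flags are hypothetical, and the helper simply restates the rules described in this article.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical labels for the recovery paths described in this article."""
    RESUBMIT_WITH_RESTFI_LT = "Resubmit the canceled job with RESTFI=LT to restore the affected areas"
    RESTORE = "Restore the affected areas; do not rely on DSEXPFIX alone"
    DSEXPFIX_THEN_DSEXDBAN = "Run DSEXPFIX, then run DSEXDBAN immediately afterwards"
    CALL_SUPPORT = "Call support before running DSEXPFIX"


def recommend_action(job_was_canceled: bool,
                     processing_occurred: Optional[bool]) -> Action:
    """Suggest a recovery path after a 0966 following batch maintenance
    that was run with Dispatch down.

    processing_occurred is None when you are not sure whether the job
    updated the database before it failed.
    """
    if processing_occurred is None:
        # Any doubt: check with support before resetting the flags.
        return Action.CALL_SUPPORT
    if not processing_occurred:
        # For example a JCL error: flags are ON but nothing was updated.
        return Action.DSEXPFIX_THEN_DSEXDBAN
    if job_was_canceled:
        # A canceled job is not restored automatically.
        return Action.RESUBMIT_WITH_RESTFI_LT
    # Partial updates may exist, so a restore is needed.
    return Action.RESTORE
```

For example, recommend_action(job_was_canceled=True, processing_occurred=True) returns Action.RESUBMIT_WITH_RESTFI_LT. The helper always prefers a restore or a call to support whenever updates may have reached the database.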
*** If you do decide to run DSEXPFIX, you should ALWAYS follow it with a run of the DSEXDBAN job immediately after DSEXPFIX completes ***
The DSEXDBAN job is used to analyze the integrity of the database files and identify if there are broken chains. To determine if you have broken chains, check the output of DSEXDBAN and do a FIND for any messages that contain the characters "not found". Any "not found" messages could indicate that broken chains do exist within the database files and you should contact Support for verification.
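If the DSEXDBAN output has been saved to a text file, the "not found" check can also be scripted. The Python sketch below is an assumption-laden illustration, not a Dispatch utility: the function names are hypothetical, it assumes the output is available as plain text, and it matches "not found" case-insensitively, which goes slightly beyond the literal FIND described above.

```python
def find_not_found_lines(report_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines containing "not found"."""
    hits = []
    for number, line in enumerate(report_text.splitlines(), start=1):
        # Case-insensitive match is an assumption made for this sketch.
        if "not found" in line.lower():
            hits.append((number, line.rstrip()))
    return hits


def scan_report_file(path: str) -> list[tuple[int, str]]:
    """Read a saved copy of the DSEXDBAN output and scan it."""
    # errors="replace" keeps the scan going if the file has odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_not_found_lines(handle.read())
```

An empty list means no "not found" messages were seen in that file. Any hits should still be verified with Support, as described above.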
If you truly do have broken chains, then a RESTORE of the corrupted database file to a time when the broken chains did not exist is likely the only acceptable resolution.