Problem
On Nov 26 during SSM SHUTSYS, task TSCXXXX was marked DOWN_DOWN when it ended.
However, SSM still issued the STOP command for TSCXXXX, which caused a TIMEOUT.
I believe this is a TIMING issue: SSM did not recognize in time that the task was already down.
On Dec 3 during SSM SHUTSYS, it worked fine.
The condensed log below shows what happened on Nov 26.
26NOV 18:00:11 ssm shutsys sysz
26NOV 18:00:34 OPS7914O SSM AUDIT: STCTBL.TSCXXXX UPDATED by OPSOSF SHUTSYS2 CURRENT_STATE=UNKNOWN
26NOV 18:00:34 OPS7914O SSM AUDIT: STCTBL.TSCXXXX UPDATED by OPSOSF SHUTSYS2 DESIRED_STATE=DOWN
26NOV 18:00:34 OPS7914O SSM AUDIT: STCTBL.TSCXXXX UPDATED by OPSOSF SHUTSYS2 MODE=ACTIVE
26NOV 18:00:42 OPS7902H STATEMAN ACTION FOR STCTBL.TSCXXXX: UNKNOWN RULE=SSMSTATE TABLE(STCTBL) NAME(TSCXXXX) TYPE(TSCXXXX_SYSZ) JO
26NOV 18:00:42 OPS7914O SSM AUDIT: STCTBL.TSCXXXX UPDATED by OPSMAIN STATESET CURRENT_STATE=UP
At this point TSCXXXX has CURRENT_STATE=UP and DESIRED_STATE=DOWN.
26NOV 18:01:02 STC TSCXXXX
26NOV 18:01:02 OPS7914O SSM AUDIT: STCTBL.TSCXXXX UPDATED by *MASTER* *DYNAMIC.ESSMEZ CURRENT_STATE=DOWN
TSCXXXX ended here. CURRENT_STATE and DESIRED_STATE now match: DOWN_DOWN.
The task was brought down externally by a non-SSM task.
26NOV 18:01:03 OPS7902H STATEMAN ACTION FOR STCTBL.TSCXXXX: UP_DOWN RULE=SSMCMDS STCTBL.TSCXXXX DOWN {STOP TSCXXXX} 120 {CANCEL WPA
This action should not have executed, because CURRENT_STATE was already DOWN, not UP. The UP_DOWN in the message above shows that the SSM engine still "thinks" the task is UP.
OPS/MVS
There will always be a timing issue for ACTIVE-mode resources that are shut down outside of SSM control. Set the mode of such resources to PASSIVE so that the SSM action does not fire in these cases.
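A minimal sketch of that recommendation, written in the style of the SSMSHUT2 sample and assuming the same STCTBL table and shutrequest variable. External_List is a hypothetical variable introduced only for illustration; PASSIVE is the mode named above. Because the sample's SHUTSYS update sets MODE = 'ACTIVE' for every resource not in Exclude_List, this update would have to run after that one, or it would be overwritten.

```rexx
/*--------------------------------------------------------------------*/
/* Put resources that are shut down outside of SSM control into       */
/* PASSIVE mode so SSM does not fire its own STOP action for them.    */
/* External_List is HYPOTHETICAL: quoted, comma-separated names.      */
/* Run this AFTER the MODE = 'ACTIVE' update, or it is overwritten.   */
/*--+----1----+----2----+----3----+----4----+----5----+----6----+----7*/
External_List = "'TSCXXXX'"
if shutrequest = 'SHUTSYS' then
  do
    address SQL
    "Update STCTBL set MODE = 'PASSIVE' where NAME in ("External_List")"
  end
```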
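We further recommend updating our sample SSMSHUT2 code so that it does not set CURRENT_STATE to UNKNOWN when it sets DESIRED_STATE to DOWN. Resetting the state only adds overhead to the shutdown, because SSM must first rediscover the state of each resource (as in the UNKNOWN action at 18:00:42), and it masks issues where CURRENT_STATE is not set correctly.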
/*--------------------------------------------------------------------*/
/* Set DESIRED_STATE = 'DOWN' if this is a SHUTSYS request. */
/*--+----1----+----2----+----3----+----4----+----5----+----6----+----7*/
if shutrequest = 'SHUTSYS' then
  do
    address SQL
    "Update STCTBL set CURRENT_STATE='UNKNOWN' DESIRED_STATE='DOWN' ",
      "MODE = 'ACTIVE' where NAME not in ("Exclude_List")"
  end
/*--------------------------------------------------------------------*/
/* Set DESIRED_STATE = 'DOWN' for all SSM resources except JES2,      */
/* NET (VTAM), TSO, and TCPIP if this is a SHUTMAINT request.         */
/*--+----1----+----2----+----3----+----4----+----5----+----6----+----7*/
if shutrequest = 'SHUTMAINT' then
  do
    address SQL
    "Update STCTBL set CURRENT_STATE='UNKNOWN' DESIRED_STATE='DOWN' ",
      "MODE = 'ACTIVE' WHERE NAME NOT IN ",
      "("Exclude_List",'JES2','NET','TSO','TCPIP')"
  end
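Applied to the sample above, the recommendation amounts to dropping CURRENT_STATE='UNKNOWN' from both updates. The sketch below shows the revised SSMSHUT2 fragment; everything else, including the SQL statement syntax and Exclude_List, is kept as in the sample, so it assumes the same OPS/MVS environment.

```rexx
/*--------------------------------------------------------------------*/
/* Set DESIRED_STATE = 'DOWN' if this is a SHUTSYS request.           */
/* CURRENT_STATE is not touched, so SSM keeps the last known state.   */
/*--+----1----+----2----+----3----+----4----+----5----+----6----+----7*/
if shutrequest = 'SHUTSYS' then
  do
    address SQL
    "Update STCTBL set DESIRED_STATE='DOWN' MODE = 'ACTIVE' ",
      "where NAME not in ("Exclude_List")"
  end
/*--------------------------------------------------------------------*/
/* Set DESIRED_STATE = 'DOWN' for all SSM resources except JES2,      */
/* NET (VTAM), TSO, and TCPIP if this is a SHUTMAINT request.         */
/* CURRENT_STATE is not touched here either.                          */
/*--+----1----+----2----+----3----+----4----+----5----+----6----+----7*/
if shutrequest = 'SHUTMAINT' then
  do
    address SQL
    "Update STCTBL set DESIRED_STATE='DOWN' MODE = 'ACTIVE' ",
      "WHERE NAME NOT IN ",
      "("Exclude_List",'JES2','NET','TSO','TCPIP')"
  end
```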