
STOP Process triggered even though TASK is DOWN_DOWN


Article ID: 230426



Products

OPS/MVS Event Management & Automation

Issue/Introduction

Problem

On Nov 26 during SSM SHUTSYS, the task WPAGNTZ was marked DOWN_DOWN when the task ended.

However, SSM still ran the STOP process for the WPAGNTZ task, which caused a TIMEOUT situation.

I believe it is a timing issue: SSM did not recognize in time that the task was actually down.

 

On Dec 3 during SSM SHUTSYS, it worked fine.

 

 

Here is some condensed information to assist in explaining what happened on Nov 26.

 

26NOV 18:00:11 ssm shutsys sysz                                                                                                      

 

26NOV 18:00:34 OPS7914O SSM AUDIT: STCTBL.WPAGNTZ UPDATED by OPSOSF SHUTSYS2 CURRENT_STATE=UNKNOWN                                

26NOV 18:00:34 OPS7914O SSM AUDIT: STCTBL.WPAGNTZ UPDATED by OPSOSF SHUTSYS2 DESIRED_STATE=DOWN                                   

26NOV 18:00:34 OPS7914O SSM AUDIT: STCTBL.WPAGNTZ UPDATED by OPSOSF SHUTSYS2 MODE=ACTIVE                                          

26NOV 18:00:42 OPS7902H STATEMAN ACTION FOR STCTBL.WPAGNTZ: UNKNOWN RULE=SSMSTATE TABLE(STCTBL) NAME(WPAGNTZ) TYPE(WPAGNTZ_SYSZ) JO

26NOV 18:00:42 OPS7914O SSM AUDIT: STCTBL.WPAGNTZ UPDATED by OPSMAIN STATESET CURRENT_STATE=UP                                    

At this point WPAGNTZ has CURRENT_STATE=UP and DESIRED_STATE=DOWN.

 

26NOV 18:01:02 STC WPAGNTZ

26NOV 18:01:02 OPS7914O SSM AUDIT: STCTBL.WPAGNTZ UPDATED by *MASTER* *DYNAMIC.ESSMEZ CURRENT_STATE=DOWN                          

WPAGNTZ ended here. CURRENT_STATE and DESIRED_STATE now match (DOWN_DOWN).

The task was brought down externally by a non-SSM task.

 

26NOV 18:01:03 OPS7902H STATEMAN ACTION FOR STCTBL.WPAGNTZ: UP_DOWN RULE=SSMCMDS STCTBL.WPAGNTZ DOWN {STOP WPAGNTZ} 120 {CANCEL WPA

This STOP action should not have executed, because CURRENT_STATE was already DOWN rather than UP. Per the message above, the SSM engine still "thinks" the task is UP.

Environment

Release : 13.5

Component : OPS/MVS

Resolution

There will always be a timing issue for resources that are shut down outside of SSM control while they are in ACTIVE mode. Set the MODE of such resources to PASSIVE so that the SSM action does not fire in these cases.
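As a minimal sketch of that recommendation, the statement below puts one resource into PASSIVE mode. It assumes the default STCTBL resource table and uses WPAGNTZ from this case as the example name. Where it runs (for example, just before the external shutdown starts) is an assumption and depends on your own shutdown procedure.

```rexx
/* Sketch only: put a resource that is stopped outside of SSM into    */
/* PASSIVE mode so that SSM does not drive its UP_DOWN action.        */
/* STCTBL and WPAGNTZ are taken from this article; adjust both to     */
/* match your own resource table and task name.                       */
address SQL
"Update STCTBL set MODE='PASSIVE' where NAME='WPAGNTZ'"
```

Note that the sample SHUTSYS code sets MODE = 'ACTIVE' for every resource whose NAME is not in Exclude_List, so a resource that must stay PASSIVE would also need to be in that list. Its DESIRED_STATE would then have to be handled separately.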

We further recommend updating our sample SSMSHUT2 code so that it does not set CURRENT_STATE to UNKNOWN when setting DESIRED_STATE to DOWN. Doing so only adds overhead to the shutdown and masks issues where the CURRENT_STATE is not set correctly. The section of the sample that currently sets CURRENT_STATE to UNKNOWN is:

/*--------------------------------------------------------------------*/
/* Set DESIRED_STATE = 'DOWN' if this is a SHUTSYS request.           */
/*--+----1----+----2----+----3----+----4----+----5----+----6----+----7*/
if shutrequest = 'SHUTSYS' then                                         
  do                                                                    
    address SQL                                                         
    "Update STCTBL set CURRENT_STATE='UNKNOWN' DESIRED_STATE='DOWN' ",  
              "MODE = 'ACTIVE' where NAME not in ("Exclude_List")"      
  end                                                                   
                                                                        
/*--------------------------------------------------------------------*/
/* Set DESIRED_STATE = 'DOWN' for all SSM resources except JES2, NET, */
/* TSO, and TCPIP if this is a SHUTMAINT request.                     */
/*--+----1----+----2----+----3----+----4----+----5----+----6----+----7*/
if shutrequest = 'SHUTMAINT' then                                       
  do                                                                    
    address SQL                                                         
    "Update STCTBL set CURRENT_STATE='UNKNOWN' DESIRED_STATE='DOWN' ",  
     "MODE = 'ACTIVE' WHERE NAME NOT IN ",                              
     "("Exclude_List",'JES2','NET','TSO','TCPIP')"                      
  end
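
The recommended change could look like the following sketch, which is the same sample with only the CURRENT_STATE='UNKNOWN' assignment removed. It is not the shipped code. It assumes that shutrequest and Exclude_List are set earlier in SSMSHUT2 exactly as in the sample above, and it keeps the sample's own SQL statement form unchanged.

```rexx
/*--------------------------------------------------------------------*/
/* Sketch only: set DESIRED_STATE = 'DOWN' for a SHUTSYS request      */
/* without resetting CURRENT_STATE to 'UNKNOWN'.                      */
/*--------------------------------------------------------------------*/
if shutrequest = 'SHUTSYS' then
  do
    address SQL
    "Update STCTBL set DESIRED_STATE='DOWN' ",
              "MODE = 'ACTIVE' where NAME not in ("Exclude_List")"
  end

/*--------------------------------------------------------------------*/
/* Sketch only: same change for a SHUTMAINT request.                  */
/*--------------------------------------------------------------------*/
if shutrequest = 'SHUTMAINT' then
  do
    address SQL
    "Update STCTBL set DESIRED_STATE='DOWN' ",
     "MODE = 'ACTIVE' WHERE NAME NOT IN ",
     "("Exclude_List",'JES2','NET','TSO','TCPIP')"
  end
```

With this form, each resource keeps its last known CURRENT_STATE during shutdown, so a resource whose CURRENT_STATE is wrong remains visible instead of being reset to UNKNOWN and re-evaluated.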