Reacting to ENDED_LOST status

Article ID: 225226

Products

CA Automic Workload Automation - Automation Engine

Issue/Introduction

ENDED_LOST is a status that tasks can receive when an Agent goes down unexpectedly, without the Automation Engine being informed, while the Agent is still in the middle of running OS jobs.  This can happen for a number of reasons, such as network issues or server outages.

This article shows a couple of ways to react to this status.

These are by no means the definitive methods.  If help is required to put them in place, please reach out to your account team or take a look at https://expert.broadcom.com/ to find someone who can help implement a solution.  Another recommendation is to reach out to the Broadcom community to see how other customers implement monitoring and workarounds for these situations.

Environment

Release : 12.3

Component :

Resolution

It is as designed that a job goes to ENDED_LOST - ended undefined if an agent stops unexpectedly without the Automation Engine noticing.  If this happens to a job in a workflow, it can cause downstream issues in the JOBP.  How the following tasks react depends on how their Dependencies are set up.  If the second task's dependencies place no requirement on the status of the first task, the second task will run no matter what.

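Usually, however, the dependency requires a status such as ENDED_OK or ANY_OK, or some other status that does not include ENDED_UNDEFINED.  In that case the dependency is not satisfied when the first task ends lost.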
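The dependency behavior described above can be illustrated with a small Python model.  This is only a sketch for reasoning about the logic, not Automic code: the function name successor_runs and the idea of representing a dependency as a set of accepted status names are assumptions made for this example.

```python
from typing import Optional, Set


def successor_runs(predecessor_status: str,
                   required_statuses: Optional[Set[str]]) -> bool:
    """Hypothetical model of a workflow dependency check.

    required_statuses is None when the successor has no requirement
    on the predecessor's end status.
    """
    if required_statuses is None:
        # Nothing is configured, so the successor runs no matter what.
        return True
    # Otherwise the predecessor's status must be one of the accepted ones.
    return predecessor_status in required_statuses


if __name__ == "__main__":
    # A lost job is treated as ENDED_UNDEFINED in conditions.
    print(successor_runs("ENDED_UNDEFINED", None))
    print(successor_runs("ENDED_UNDEFINED", {"ENDED_OK"}))
```

In this model, a successor without a status requirement starts even after its predecessor ended lost, while one that requires ENDED_OK does not.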

Here is one way to respond when it is known that the agent of a specific task might go down and leave the job in ENDED_LOST.  It has to be added on a case-by-case basis and requires some understanding of Post Conditions.  There is no easy way to add it to all workflows.

The status code for ENDED_LOST - ended undefined is 1815, and conditions treat it as ENDED_UNDEFINED.  This means the following is possible:
In a workflow, on a task that might end with ENDED_LOST (you may be able to predict which tasks these are if it happens often), go to Post Conditions
Add a condition with CHECK STATUS:
    IF task ended with status ENDED_UNDEFINED
        FINALLY restart task in x minutes (determine how long it usually takes for the agent to be back up and running)
    ELSE
        Take some action here - you can always take a "dummy" action of set variable dummy# to n/a

This way, if task 1 ends lost, it is restarted after the amount of time you believe the agent needs to be back up and running.  If it does not end lost, nothing of consequence is done.
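The Post Condition logic above can be sketched in Python to make the two branches explicit.  This is an illustrative model only, not something that runs inside the Automation Engine: the names Action, post_condition, RESTART_TASK and SET_VARIABLE_DUMMY are invented for this example, and only the status code 1815 comes from this article.

```python
from dataclasses import dataclass

ENDED_LOST_CODE = 1815  # status code for ENDED_LOST given in this article


@dataclass
class Action:
    """Hypothetical description of what the Post Condition would do."""
    name: str
    delay_minutes: int = 0


def post_condition(status_code: int, restart_delay_minutes: int) -> Action:
    """Model of the CHECK STATUS condition with its IF and ELSE branches."""
    if status_code == ENDED_LOST_CODE:
        # IF branch: restart the task once the agent is expected to be back.
        return Action("RESTART_TASK", restart_delay_minutes)
    # ELSE branch: a harmless "dummy" action, such as setting a variable.
    return Action("SET_VARIABLE_DUMMY")


if __name__ == "__main__":
    print(post_condition(1815, restart_delay_minutes=10))
    # 1900 is used here only as an example of a code other than 1815.
    print(post_condition(1900, restart_delay_minutes=10))
```

The delay is passed in as a parameter because the right value depends on how long the agent usually takes to come back up in a given environment.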

 

There is also a way to send a notification specifically for JOBS objects that end in ENDED_LOST within a client, using an SLO object.  Here is one example:

1) Create a Service Level Objective (SLO) object
2) In the "Service Level Object" tab, be sure that "Monitor this Service Level Objective" is checked (and likely choose "Permanently")
3) On the "Service Selection" tab, add a criterion:
     Object Type
     equals
     JOBS
4) In the "Fulfillment Criteria" tab, check "End with specific end status" and choose "ENDED_UNDEFINED"
     Under "Actions" in the same tab, check "on fulfillment" and put in the notification object that should be sent out

With this in place, the notification runs whenever any job in the client ends with ENDED_LOST.
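The effect of the SLO settings above can be pictured with a short Python model.  It is a sketch of the selection and fulfillment logic only, not an Automic API: the function check_executions, the dictionary keys, and the sample object names are all hypothetical.

```python
from typing import Callable, Dict, List


def check_executions(executions: List[Dict[str, str]],
                     notify: Callable[[str], None]) -> List[str]:
    """Hypothetical model of the SLO: select JOBS objects and notify
    for each one that ended with ENDED_UNDEFINED."""
    notified = []
    for execution in executions:
        # Service Selection: Object Type equals JOBS
        is_job = execution["object_type"] == "JOBS"
        # Fulfillment Criteria: End with specific end status
        ended_undefined = execution["end_status"] == "ENDED_UNDEFINED"
        if is_job and ended_undefined:
            # Action on fulfillment: run the notification
            notify(execution["name"])
            notified.append(execution["name"])
    return notified


if __name__ == "__main__":
    sample = [
        {"name": "JOBS.EXAMPLE.A", "object_type": "JOBS", "end_status": "ENDED_UNDEFINED"},
        {"name": "JOBS.EXAMPLE.B", "object_type": "JOBS", "end_status": "ENDED_OK"},
        {"name": "JOBP.EXAMPLE.C", "object_type": "JOBP", "end_status": "ENDED_UNDEFINED"},
    ]
    check_executions(sample, lambda name: print("notify for", name))
```

Only the first sample entry satisfies both the object type filter and the end status criterion, so it is the only one that triggers the notification in this model.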

 

Additional Information

More information on SLO objects can be found here: https://docs.automic.com/documentation/webhelp/english/ALL/components/DOCU/12.3/Automic%20Automation%20Guides/help.htm#AWA/Objects/obj_SLO.htm
More information on Post Conditions can be found here: https://docs.automic.com/documentation/webhelp/english/ALL/components/DOCU/12.3/Automic%20Automation%20Guides/help.htm#AWA/Workflows/wf_PropertiesPane_CondTab.htm