How to troubleshoot Jobs going into a Launch Error

Article ID: 282515

Products

CA Automic Applications Manager (AM)

Issue/Introduction

One or more Jobs are going into a Launch Error status. How can we troubleshoot Jobs going into Launch Error?

Resolution

A Launch Error can occur for many reasons, including but not limited to:

  • A long-running SQL query holding up a Java thread within the Java process
  • Delays in reading from or writing to the output directory, caused by a large number of existing files or a lack of disk space
  • Too many Jobs running on the Agent concurrently
  • Lack of Agent resources
  • Missing or incorrect configuration or environment variables
  • A hold on the startJob thread caused by errors

Depending on the root cause, a Launch Error may behave differently:

  • The Job goes into Launch Error right away with no output generated
  • The Job goes into Launch Error after 5 minutes with no output files generated
  • The Job goes into Launch Error after 5 minutes, but an output file may be generated a few minutes later (this indicates the Job tried to run after the 5 minutes had passed)
  • The Job goes into Launch Error after x minutes and may or may not have output files generated

Because Launch Errors can behave differently and have different causes, it is recommended to open a case with Broadcom Support once the debug information below is available.

Information required to review Launch Errors:

  • RMI logs from the master with debug enabled, covering from about 15 minutes before the first Launch Error occurs to 15 minutes after. If the Launch Error occurs or is reproducible right after a restart of Applications Manager and the Agent, only the RMI logs generated from the restart to about 15 minutes after the Launch Error are required. This 30-minute timespan can cover one or more log files.
  • Agent logs with debug enabled from the Agent the Job runs against, covering from about 15 minutes before the first Launch Error occurs to 15 minutes after. If the Launch Error occurs or is reproducible right after a restart of Applications Manager and the Agent, only the Agent logs generated from the restart to about 15 minutes after the Launch Error are required. This 30-minute timespan can cover one or more log files.
  • The Job Name and JobID of the first Job that went into Launch Error
  • The time the Job entered the Backlog and the time the Launch Error occurred
  • A screenshot of the Comments tab in the Task Details of the Job in Launch Error
  • Any output files generated. It is recommended to check for output files 3 to 5 minutes after the Launch Error occurs, in case the Job tries to launch a few minutes after the Launch Error.
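The 30-minute log window described above can be picked out with a short script. The following is a minimal sketch, not a Broadcom-supplied tool: the function name `logs_covering_window` is hypothetical, and it assumes that each log file's modification time reflects the last entry written to it and that rotated files share the prefix of the active log (for example RmiServer or AgentService).

```python
from datetime import datetime, timedelta
from pathlib import Path


def logs_covering_window(log_dir, prefix, error_time, margin=timedelta(minutes=15)):
    """Return log files likely to cover error_time - margin .. error_time + margin.

    Heuristic (an assumption, not product behavior): a file's modification
    time marks the last entry written to it.
    """
    start = (error_time - margin).timestamp()
    end = (error_time + margin).timestamp()

    # Oldest first, so rotated logs are visited in the order they were written.
    candidates = sorted(
        Path(log_dir).glob(f"{prefix}*.log"),
        key=lambda p: p.stat().st_mtime,
    )

    selected = []
    for path in candidates:
        mtime = path.stat().st_mtime
        if mtime < start:
            continue  # last written before the window opened
        selected.append(path)
        if mtime >= end:
            break  # this file was still being written when the window closed
    return selected
```

For example, `logs_covering_window("/path/to/AW_HOME/log", "RmiServer", datetime(2024, 1, 1, 12, 0))` would list the RMI logs to attach for a Launch Error at 12:00; the path and the timestamp here are placeholders. The heuristic can include one file more than strictly needed, which is harmless for a Support case.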

How to enable debug:

RMI debug can be enabled by:
1. Stop all processes with the stopso all command or by stopping the service
2. Edit the AW_HOME/site/awenv.ini file, add the line "debug=true" to the [default] section, then save and close the file.
3. Start all processes with the startso all command or by starting the service
4. Debug is now enabled and will write additional information into the RmiServer.log and RmiServer<timestamp>.log files in the AW_HOME/log directory
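The edit to awenv.ini in step 2 can also be scripted. The sketch below is illustrative only: the function name `enable_debug` is hypothetical, and it assumes the setting is written in the usual key=value INI form (debug=true). It works on the file's text rather than through an INI parser so that existing comments and ordering are left untouched.

```python
def enable_debug(ini_text, section="default", setting="debug=true"):
    """Return ini_text with `setting` present in `[section]`.

    Assumption: the setting uses key=value form; pass a different
    `setting` string if your installation expects another spelling.
    """
    lines = ini_text.splitlines()
    header = f"[{section}]"

    # Locate the section header (case-insensitive match on the whole line).
    start = next(
        (i for i, line in enumerate(lines) if line.strip().lower() == header.lower()),
        None,
    )
    if start is None:
        # Section is missing: append it together with the setting.
        return "\n".join(lines + [header, setting]) + "\n"

    # The section ends at the next "[...]" header or at the end of the file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].strip().startswith("[")),
        len(lines),
    )

    # Leave the text unchanged if the setting is already there.
    wanted = setting.replace(" ", "").lower()
    if any(line.replace(" ", "").lower() == wanted for line in lines[start + 1:end]):
        return ini_text

    lines.insert(start + 1, setting)
    return "\n".join(lines) + "\n"
```

A typical use would be to read AW_HOME/site/awenv.ini, keep a backup copy, write the returned text back, and then restart the processes as in step 3. Running it a second time makes no further change.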

Agent debug should be enabled on the server side through the awenv.ini file, and it requires a restart of the Agent process. Agent debug can be enabled by:
1. Stop the Agent process or service
2. Edit the AW_HOME/site/awenv.ini file, add the line "debug=true" to the [default] section, then save and close the file.
3. Start the Agent process or service
4. Debug is now enabled and will write additional information into the AgentService.log and AgentService<timestamp>.log files in the AW_HOME/log directory

After running a Job and allowing some time to pass after the Launch Error, or once some time has passed since the first Launch Error following a restart of the Agent and RMI processes, provide Support with the information listed in the section "Information required to review Launch Errors".
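To keep the logs, output files, and Job details together when attaching them to the case, they can be packed into one archive. This is a minimal sketch using Python's standard zipfile module; the function name `bundle_for_support`, the archive layout, and the summary.txt file are assumptions made for illustration and are not a format required by Support.

```python
import zipfile
from pathlib import Path


def bundle_for_support(archive_path, files, job_name, job_id,
                       backlog_time, launch_error_time):
    """Write the given files plus a short summary.txt into one zip archive."""
    summary = (
        f"Job Name: {job_name}\n"
        f"JobID: {job_id}\n"
        f"Entered Backlog: {backlog_time}\n"
        f"Launch Error: {launch_error_time}\n"
    )
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("summary.txt", summary)
        for path in map(Path, files):
            # Store each file under its base name; files sharing a base name
            # would need distinct archive names.
            archive.write(path, arcname=path.name)
    return Path(archive_path)
```

The screenshot of the Comments tab can be passed in the `files` list like any other file.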