OS Jobs fail with FAULT_OTHER and User-Service Communication Errors | U02000495 / U02000186

Article ID: 418247

Updated On:

Products

CA Automic Workload Automation - Automation Engine, CA Automic One Automation, Automic SaaS

Issue/Introduction

Linux and UNIX Java-based Agents running versions prior to 24.4.2+HF3 may sporadically fail to start jobs, which then end with a FAULT_OTHER status. The Agent trace (tcp/ip=9) shows a User-Service shutdown coinciding with a new job start request (USOPEN or USLOGON), followed by a java.io.IOException when the Agent attempts to open the job report file. This is a timing-based defect related to the User-Service's idle-timeout period.

When this issue occurs, the following symptoms are observed. The log messages appear almost simultaneously (within the same millisecond):

  • Jobs fail with a FAULT_OTHER ("Start not possible. Other error.") status.
  • The agent log records the following error messages sporadically:
    • U02000495 Communication to UserService 'UserServiceDriver [user=..., pid=...]' for request 'USOPEN' failed, reason: 'null'.
      • The above may also show USLOGIN instead of USOPEN
    • U02000496 User-Service 'UserServiceDriver [user=..., pid=...]' shutdown has been initiated.
    • U02003083 User-Service for User '...' with PID: '...' ended.
    • U02000186 Report file '/path/to/report.TXT' for Job '...' cannot be opened. Error: 'java.io.IOException Failed to open file: /path/to/report.TXT'.
      • The above may also show Error: 'java.io.IOException Failed to logon child driver for user '...''
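
The message numbers above can be used to search an Agent log for occurrences of this pattern. The following is a minimal sketch, not an official tool. It assumes only that each log line contains its message number as plain text (for example U02000495); the function name find_suspect_lines and the window parameter are hypothetical, and the actual log layout may differ.

```python
import re
from typing import Iterable, List

# Message numbers taken from the symptom list in this article.
COMM_ERROR = "U02000495"
RELATED = {"U02000496", "U02003083", "U02000186"}

# Assumption: a message number is "U" followed by eight digits.
MSG_NO = re.compile(r"\bU\d{8}\b")


def find_suspect_lines(lines: Iterable[str], window: int = 5) -> List[int]:
    """Return 1-based line numbers of U02000495 entries that have at least
    one related shutdown or report-file message within `window` lines."""
    codes = []
    for line in lines:
        match = MSG_NO.search(line)
        codes.append(match.group(0) if match else None)

    hits = []
    for pos, code in enumerate(codes):
        if code != COMM_ERROR:
            continue
        start = max(0, pos - window)
        neighbours = set(codes[start:pos + window + 1])
        if neighbours & RELATED:
            hits.append(pos + 1)
    return hits
```

To use it, pass an open log file, for example find_suspect_lines(open(path)), where path points to the Agent log to be checked. A hit only indicates that the message numbers appear close together; the timestamps should still be compared manually to confirm that the messages occurred within the same millisecond.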

This typically affects jobs that run frequently (e.g., every 30 minutes, which matches the default idle timeout). The failure occurs only sporadically and cannot be reproduced on demand.

Environment

All versions of the Java-based OS Agent (Linux/UNIX) prior to 24.4.2+HF3.

Cause

Defect ID: DE176564

This is caused by a timing issue within the Agent's User-Service handling. The Agent is unable to start a new I/O operation (such as a job start request) on a User-Service process that has reached its idle timeout (default 30 minutes) and is in the process of shutting down.

If a job start request (USOPEN) arrives in the same millisecond that the User-Service initiates its shutdown (U02000496), the request fails with a communication error (U02000495). The Agent logic fails to handle this shutdown scenario gracefully, meaning it does not automatically re-route the job start request to a new User-Service process, resulting in the job failing with FAULT_OTHER.
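
The difference between the failing behavior and graceful handling can be illustrated with a toy model. The sketch below is not the Agent's implementation: the classes ToyUserService and ToyAgent and their methods are hypothetical, time is passed in as plain seconds, and the real millisecond-wide race window is simplified to "the request arrives at or after the idle timeout".

```python
from typing import Optional


class ServiceShuttingDown(Exception):
    """Raised when a request reaches a service that is already stopping."""


class ToyUserService:
    """Hypothetical stand-in for a per-user User-Service process."""

    def __init__(self, idle_timeout_s: float, now: float) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.last_used = now

    def handle(self, request: str, now: float) -> str:
        # Simplification: once the idle timeout has elapsed, the service
        # counts as shutting down and rejects the request.
        if now - self.last_used >= self.idle_timeout_s:
            raise ServiceShuttingDown(request)
        self.last_used = now
        return f"{request} ok"


class ToyAgent:
    """Hypothetical agent that forwards job starts to a user service."""

    def __init__(self, idle_timeout_s: float = 1800.0) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.service: Optional[ToyUserService] = None
        self.services_started = 0

    def _ensure_service(self, now: float) -> ToyUserService:
        if self.service is None:
            self.service = ToyUserService(self.idle_timeout_s, now)
            self.services_started += 1
        return self.service

    def start_job_without_retry(self, now: float) -> str:
        # Models the defect as described: the request is not re-routed.
        try:
            return self._ensure_service(now).handle("USOPEN", now)
        except ServiceShuttingDown:
            self.service = None
            return "FAULT_OTHER"

    def start_job_with_retry(self, now: float) -> str:
        # Models graceful handling: start a new service and send again.
        try:
            return self._ensure_service(now).handle("USOPEN", now)
        except ServiceShuttingDown:
            self.service = None
            return self._ensure_service(now).handle("USOPEN", now)
```

In this model, a job started at second 0 and again at second 1800 fails with FAULT_OTHER through start_job_without_retry, while start_job_with_retry transparently starts a second service and succeeds.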

Resolution

This defect has been resolved in the following Agent versions:

  • Agent Version 24.4.2 + Hotfix 3 (24.4.2+HF3)

  • Agent Version 24.4.3

Customers should upgrade the affected Linux/UNIX Agents to version 24.4.2+HF3 or later to ensure the Agent correctly handles User-Service shutdown and starts a new User-Service process for incoming job requests.
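
When many Agents need to be checked, the version comparison can be scripted. The following is a minimal sketch under stated assumptions: it assumes a version string of the form major.minor.patch with an optional hotfix suffix such as +HF3 or +hf.3, which may not match how a given Agent reports its version. The names parse_agent_version and is_affected are hypothetical, and the comparison simply follows this article's statement that all versions prior to 24.4.2+HF3 are affected.

```python
import re
from typing import Tuple

# First fixed level according to this article: 24.4.2 + Hotfix 3.
FIXED_IN = (24, 4, 2, 3)

# Assumption: "major.minor.patch" optionally followed by "+hf<N>" or "+hf.<N>".
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\+hf\.?(\d+))?", re.IGNORECASE)


def parse_agent_version(text: str) -> Tuple[int, int, int, int]:
    """Parse a version string into (major, minor, patch, hotfix)."""
    match = _VERSION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, hotfix = match.groups()
    # A missing hotfix suffix is treated as hotfix level 0.
    return int(major), int(minor), int(patch), int(hotfix or 0)


def is_affected(text: str) -> bool:
    """Return True if the version is lower than 24.4.2+HF3."""
    return parse_agent_version(text) < FIXED_IN
```

For example, is_affected("24.4.2+HF2") returns True, while is_affected("24.4.2+HF3") and is_affected("24.4.3") return False.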