Users report intermittent "Could not connect to machine" errors and CHASE alarms.

Article ID: 440911

Updated On:

Products

Workload Automation Agent

Issue/Introduction

Users report intermittent "Could not connect to machine" errors and CHASE alarms on the jobs.

  • Error Message: <Could not connect to machine: [HOSTNAME] The machine or the network must be down.>
  • Alarm: CAUAJM_I_40245 EVENT: ALARM ALARM: CHASE
  • Observed Behavior: The agent appears to freeze for extended periods (e.g., 10–40 minutes), during which time it stops responding to the manager and fails to write to simple_health_monitor.log at the standard 1-minute intervals.

Cause

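Memory Thrashing: In simple_health_monitor.log, free heap rises sharply in the first entry written after the freeze, showing a massive reclamation (more than 2 GB in the excerpt below: free heap grows from about 1377 MB to about 3404 MB while total allocated heap stays at about 3449 MB).
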
04/29/2026 21:58:23.245-0400 5 main.SimpleHealthMonitor.CybSimpleHealthMonitor.run[:286] - Agent health information
                                                                                           ------------------------
                                                                                           Maximum allocateable: 27305 MB    0 KB    0 B
                                                                                           Total allocated heap: 3449 MB    0 KB    0 B
                                                                                           Currently free heap:  1377 MB  758 KB  720 B
..

..

04/29/2026 22:42:59.758-0400 5 main.SimpleHealthMonitor.CybSimpleHealthMonitor.run[:286] - Agent health information
                                                                                           ------------------------
                                                                                           Maximum allocateable: 27305 MB    0 KB    0 B
                                                                                           Total allocated heap: 3449 MB  512 KB    0 B
                                                                                           Currently free heap:  3404 MB  409 KB  520 B
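To confirm the pattern on a given agent, the health entries can be scanned for gaps longer than the normal 1-minute interval. The following Python sketch assumes the entry layout shown in the excerpt above (a timestamped "Agent health information" header followed by a "Currently free heap:" line). The function names and the 120-second threshold are hypothetical choices for illustration, not part of the agent.

```python
import re
import sys
from datetime import datetime

# Header line of each health entry, e.g.
# "04/29/2026 21:58:23.245-0400 5 main.SimpleHealthMonitor... - Agent health information"
HEADER_RE = re.compile(
    r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}) .*Agent health information"
)
# Free-heap line, e.g. "Currently free heap:  1377 MB  758 KB  720 B"
FREE_RE = re.compile(r"Currently free heap:\s+(\d+) MB\s+(\d+) KB\s+(\d+) B")


def parse_samples(lines):
    """Return (timestamp, free_heap_bytes) pairs in log order."""
    samples = []
    current_ts = None
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            current_ts = datetime.strptime(header.group(1), "%m/%d/%Y %H:%M:%S.%f%z")
            continue
        free = FREE_RE.search(line)
        if free and current_ts is not None:
            mb, kb, b = (int(g) for g in free.groups())
            samples.append((current_ts, mb * 1024 * 1024 + kb * 1024 + b))
            current_ts = None  # one sample per header
    return samples


def find_freezes(samples, max_gap_seconds=120):
    """Yield (start, end, gap_seconds, free_heap_gained_bytes) for gaps above the threshold."""
    for (t0, free0), (t1, free1) in zip(samples, samples[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > max_gap_seconds:
            yield t0, t1, gap, free1 - free0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical): python3 scan_health.py /path/to/simple_health_monitor.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for start, end, gap, gained in find_freezes(parse_samples(fh)):
            print(f"{start} -> {end}: {gap / 60:.1f} min without an entry, "
                  f"free heap changed by {gained / 1024**2:.0f} MB")
```

A long gap followed by a jump of roughly 2 GB or more in free heap matches the symptom described in this article; short gaps with small changes are normal GC activity.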


OS Configuration: Incorrect kernel swappiness settings cause the host OS to aggressively swap the Java heap out to disk. When Java Garbage Collection (GC) scans the heap, the swapped-out pages must be read back from disk, which produces heavy I/O wait and causes a cleanup that normally completes within the 1-minute interval to take several minutes.
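One way to check whether the agent's heap has actually been pushed to swap is to read the VmSwap and VmRSS fields that Linux reports in /proc/<pid>/status. The sketch below is a hypothetical helper, not an agent tool; it assumes you supply the PID of the agent's Java process, and the "swapped fraction" it prints is only an approximation (swapped kB divided by swapped plus resident kB).

```python
import sys


def parse_status_kb(status_text, field):
    """Return a field such as 'VmSwap' or 'VmRSS' from /proc/<pid>/status, in kB, or None."""
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            return int(rest.split()[0])  # Linux reports these fields as "<n> kB"
    return None


def swapped_fraction(status_text):
    """Approximate share of the process's memory that is in swap: VmSwap / (VmSwap + VmRSS)."""
    swap = parse_status_kb(status_text, "VmSwap") or 0
    rss = parse_status_kb(status_text, "VmRSS") or 0
    total = swap + rss
    return swap / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) == 2:
    pid = sys.argv[1]  # PID of the agent's Java process (supplied by you)
    with open(f"/proc/{pid}/status") as fh:
        text = fh.read()
    print(f"VmSwap: {parse_status_kb(text, 'VmSwap')} kB, "
          f"approx. {swapped_fraction(text):.1%} of the process memory is swapped out")
```

A large VmSwap value on the agent's JVM while the agent is unresponsive is consistent with the swapping behavior described above.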

Resolution

Work with your Unix/Linux administrator to change the host OS kernel parameters and to tune the agent's JVM memory footprint.

1. OS Kernel Tuning (Linux)

Adjust the kernel parameters to prevent aggressive swapping of the agent's memory.

    • Incorrect Setup:
          vm.force_cgroup_v2_swappiness = 0
          vm.swappiness = 10
    • Correct Setup:
          vm.force_cgroup_v2_swappiness = 1
          vm.swappiness = 1
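Before and after the change, the live values can be read from /proc/sys. This Python sketch compares them with the settings recommended above. The helper names are hypothetical, and the sketch assumes that vm.force_cgroup_v2_swappiness may be absent on kernels that do not provide it, in which case it reports None for that setting.

```python
from pathlib import Path

# Values recommended in this article.
RECOMMENDED = {
    "vm.swappiness": 1,
    "vm.force_cgroup_v2_swappiness": 1,
}


def read_sysctl(name, root="/proc/sys"):
    """Read an integer sysctl by dotted name, or return None if it is not available."""
    path = Path(root, *name.split("."))  # "vm.swappiness" -> /proc/sys/vm/swappiness
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None  # not exposed by this kernel, unreadable, or not an integer


def check(root="/proc/sys"):
    """Return {name: (current, recommended)} for every setting that differs."""
    mismatches = {}
    for name, wanted in RECOMMENDED.items():
        current = read_sysctl(name, root)
        if current != wanted:
            mismatches[name] = (current, wanted)
    return mismatches


if __name__ == "__main__":
    for name, (current, wanted) in check().items():
        print(f"{name}: current={current}, recommended={wanted}")
```

An empty result means the host already matches the recommendation. To make the values persistent across reboots, administrators typically place them in a file under /etc/sysctl.d/ and load them with `sysctl --system`; confirm the exact procedure for your distribution.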

2. Agent JVM Tuning

Limit the JVM heap size in agentparm.txt so that it fits within physical RAM and gives the GC less memory to scan. For example, to configure the agent with a 2 GB maximum heap, add the following parameter (an agent restart is required for the change to take effect):

oscomponent.jvm.x.options=-Xmx2048m;-Xms512m
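A quick sanity check is that -Xmx stays well below the host's physical memory. The sketch below parses the parameter line shown above, assuming options are separated by semicolons as in the example and that heap sizes use the standard JVM suffixes k, m, or g. The function names are hypothetical. It reads physical memory with os.sysconf, which is available on Linux and most Unix systems.

```python
import os
import re

# JVM heap flags such as -Xmx2048m, -Xms512m, -Xmx2g (optional k/m/g suffix, any case).
HEAP_FLAG_RE = re.compile(r"^-(Xmx|Xms)(\d+)([kKmMgG]?)$")
UNIT_BYTES = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_heap_options(line):
    """Parse 'oscomponent.jvm.x.options=-Xmx2048m;-Xms512m' into {'Xmx': bytes, 'Xms': bytes}.

    Options other than -Xmx/-Xms are ignored.
    """
    key, _, value = line.partition("=")
    if key.strip() != "oscomponent.jvm.x.options":
        raise ValueError(f"unexpected key: {key!r}")
    heap = {}
    for option in value.split(";"):  # assumed separator, as in the example above
        m = HEAP_FLAG_RE.match(option.strip())
        if m:
            flag, number, unit = m.groups()
            heap[flag] = int(number) * UNIT_BYTES[unit.lower()]
    return heap


def physical_ram_bytes():
    """Total physical memory on a POSIX host."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    heap = parse_heap_options("oscomponent.jvm.x.options=-Xmx2048m;-Xms512m")
    ram = physical_ram_bytes()
    print(f"-Xmx = {heap['Xmx'] // 1024**2} MB, physical RAM = {ram // 1024**2} MB")
    if heap["Xmx"] >= ram:
        print("Warning: the maximum heap does not fit in physical RAM")
```

Keeping a comfortable margin between -Xmx and physical RAM leaves room for the JVM's non-heap memory and for other processes on the host.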