AutoSys and WCC Performance Tuning Best Practices

Article ID: 438851

Products

Autosys Workload Automation, Workload Automation Agent

Issue/Introduction

One or more of the following symptoms can be observed:

  • A performance degradation when working with the system.
  • Response times in the Web UI (WCC) or CLI that are much slower than usual.
  • Delayed job processing, lag, or throughput issues.
  • High latency during database retrieval operations.
  • Extended agent inventory evaluation times.

This article deals with the most common and successful measures to tune component performance, optimize database connections, streamline security policies, and maintain system objects to restore or improve performance.

Environment

AutoSys Workload Automation, AutoSys Web UI (WCC), CA Embedded Entitlements Manager (EEM)

Resolution

General information

AutoSys Workload Automation supports high scalability. When running on high-end computers, you can tune the Scheduler, Application Server, and Web UI to make efficient use of CPU, memory, and database connections. However, overall system performance is heavily dependent on the sheer volume of defined objects, historical data retention, and the complexity of external security evaluations.

1. Keeping Database Clean / Archive & Purge (AE & WCC)

A well-maintained database is the foundation of a performant AutoSys and WCC environment. Unnecessary historical data significantly degrades database performance, increasing query times and slowing down the Event Processor.

AutoSys (AE)

    • archive_events Utility:

      • Purpose: Moves processed events and alarms from active tables (ujo_event, ujo_proc_event, ujo_alarm, ujo_job_runs, ujo_extended_jobrun_info, ujo_audit_info, ujo_audit_msg) to archive tables, or deletes them.
      • Best Practice: Run archive_events daily. During peak loads, consider running it more frequently. Retain only the minimum required events in the active tables (typically 1 to 3 days, depending on event volume).
    • Database Maintenance (DBMaint):

      • Purpose: Cleans up historical job runs, audit trails, and job versions. DBMaint is the correct wrapper utility that kicks off several archive_events runs with different options (jobs, events, audit tables, etc.).
      • Best Practice:  Adjust retention periods (e.g., 30, 60, or 90 days) based on business compliance requirements. Ensure that DBMaint is configured correctly in your environment to maintain database health.
    • Database Level Maintenance:

      • Indexes and Statistics: Work with your DBA to implement scheduled jobs that rebuild fragmented indexes and update database statistics. This is exceptionally critical for high-traffic tables such as ujo_job, ujo_job_runs, and ujo_event. Note that DBMaint also handles dbstatistics as part of its execution.
      • Index Rebuilding: We offer the reindex.pl script which can be used as a reference or executed to rebuild indexes. However, DBAs can also perform online index rebuilding natively within the database to avoid downtime.
      • Tablespace Monitoring: Proactively monitor tablespace usage to prevent the database from hanging due to lack of space.
      • Physical Storage / I/O Contention: To achieve optimal database performance, ensure your DBA stores the database data files and transaction log repositories on entirely different physical devices (LUNs). This prevents severe I/O contention during heavy job workloads and database purges.

WCC

    • WCC Database Maintenance:
      • Purpose: WCC stores reporting data, monitoring views, and forecast data, which can grow rapidly.
      • Best Practice: Access the WCC Configuration tab (wcc_config) and set aggressive but compliant retention periods for the WCC databases (Reporting, Monitoring, etc.).
      • Scheduling: Ensure the internal WCC database maintenance tasks run daily.

2. Keeping EEM Policies Up to Date & Optimized

Embedded Entitlements Manager (EEM) is critical for WCC and AE security. Poor EEM performance directly impacts UI responsiveness and command-line execution times.

  • Policy Structure (Groups vs. Users):
    • Best Practice: Assign policies to EEM Groups (or Active Directory/LDAP groups mapped into EEM) rather than individual Users. Evaluating policies against thousands of individual users causes severe CPU spikes on the EEM server and latency in WCC.
    • Nested LDAP Groups: Be extremely cautious when using deeply nested LDAP groups (Global User Groups). EEM must evaluate and unroll all nested groups to determine an individual user's exact permissions. Deeply nested structures can result in significant slowness and latency during EEM authorizations. Flatten the group structure mapped to EEM wherever possible.
    • Resolve Direct Groups: Where possible, adjust the EEM settings to resolve only direct groups rather than nested groups.
    • LDAP group filtering: Create a custom map that includes group filtering and apply it to your User Store. See: Custom Map and Filters
  • Synchronization and Caching:
    • Best Practice: Tune the EEM SDK cache in WCC and AE. Increase the cache timeout settings if policy changes are infrequent. This reduces the number of round-trips from WCC/AE to the EEM server for repeated authorization requests.
  • Maintenance & Backup:
    • Best Practice: Regularly back up EEM policies using the safex utility.
    • Cleanup: Periodically review and remove orphaned or obsolete policies. Over time, accumulated unused policies slow down the authorization engine.
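
To illustrate why nested groups cost more than direct groups, the following minimal sketch counts directory lookups for both modes. It is not EEM's actual resolution algorithm; the `member_of` data and the `resolve_groups` function are hypothetical and exist only for this illustration.

```python
# Hypothetical membership data: principal or group -> groups it belongs to.
member_of = {
    "alice": ["team_a"],
    "team_a": ["dept_x", "project_1"],
    "dept_x": ["division_1"],
    "project_1": ["division_1"],
    "division_1": [],
}


def resolve_groups(user, graph, nested=True):
    """Return (resolved groups, number of membership lookups performed)."""
    resolved = set()
    pending = list(graph.get(user, []))  # one lookup for the user itself
    lookups = 1
    while pending:
        group = pending.pop()
        if group in resolved:
            continue
        resolved.add(group)
        if nested:
            # Each newly found group needs its own lookup to unroll parents.
            pending.extend(graph.get(group, []))
            lookups += 1
    return resolved, lookups


print(resolve_groups("alice", member_of, nested=False))
print(resolve_groups("alice", member_of, nested=True))
```

In this toy data set, direct resolution needs a single lookup, while nested resolution needs one lookup per group in the hierarchy. Flattening the mapped group structure reduces that count.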

3. Proactive Performance Analysis (autoaggr)

Before diving into configuration changes, it is crucial to establish a performance baseline and identify actual bottlenecks.

  • Aggregate Statistics:
    • Best Practice: Run the autoaggr utility regularly to generate aggregate statistical data from the ujo_job_runs and scheduler tables. AggregateStatistics must be enabled for these statistics to be collected.
    • Identifying Symptoms: Analyzing these statistics helps you understand job run patterns, scheduler throughput, and event processing performance. This aggregated data is invaluable for proactively identifying symptoms of latency, architectural bottlenecks, or degraded database performance before they cause critical outages. Use this data to guide your tuning efforts in the following sections.
  • Understanding Performance Metrics:
    • When analyzing AutoSys performance (via autoaggr or autorep), it is essential to understand the key latency metrics:
      • Average Lag Time: The average internal processing time required for events. This is calculated from when the scheduler begins processing the event until completion (excluding the time it takes to fetch the event from the database).
      • Average Latency (Horizontal Latency): The average time an event waits in the database before being picked up and processed by the scheduler.
      • Vertical Latency: The time difference between a job's STARTING event and its subsequent RUNNING event.
      • Job Latency: The time difference between a STARTJOB event and the first STARTING event processed.
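
As a worked example of the last two definitions, the sketch below derives Job Latency and Vertical Latency from event timestamps. The event records are invented for illustration, and the helper names are hypothetical; this is not how autoaggr computes its figures internally.

```python
from datetime import datetime

# Hypothetical event records for one job run: (event name, timestamp).
events = [
    ("STARTJOB", datetime(2024, 1, 1, 2, 0, 0)),
    ("STARTING", datetime(2024, 1, 1, 2, 0, 4)),
    ("RUNNING", datetime(2024, 1, 1, 2, 0, 7)),
]


def first_timestamp(records, name):
    """Return the earliest timestamp recorded for the given event name."""
    return min(ts for event, ts in records if event == name)


def job_latency_seconds(records):
    """Time from the STARTJOB event to the first STARTING event."""
    start_job = first_timestamp(records, "STARTJOB")
    starting = first_timestamp(records, "STARTING")
    return (starting - start_job).total_seconds()


def vertical_latency_seconds(records):
    """Time from the STARTING event to the subsequent RUNNING event."""
    starting = first_timestamp(records, "STARTING")
    running = min(
        ts for event, ts in records if event == "RUNNING" and ts >= starting
    )
    return (running - starting).total_seconds()


print(job_latency_seconds(events))       # 4.0
print(vertical_latency_seconds(events))  # 3.0
```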

4. Threading, Memory, and Database Connection Tuning

Optimizing resource allocation allows AE and WCC to handle higher workloads concurrently.

AutoSys (AE)

    • Database Connections (DB_CONNECTIONS):
      • Best Practice: The default database connections setting is often too low for enterprise environments. This is configured via the DB_CONNECTIONS environment variable in the $AUTOUSER/autosys* profile scripts (e.g., autosys.sh.$HOSTNAME). Monitor database wait times. If the Schedulers or Application Servers are waiting on DB connections, raise these values. Ensure the backend database (Oracle/SQL Server) is configured to handle the maximum sum of connections from all AE instances.
    • Threading (SCHED_SCALE):
      • Best Practice: Adjust the maximum thread count for the Scheduler using the SCHED_SCALE environment variable in the $AUTOUSER/autosys* profile scripts. A higher thread count allows parallel processing of events, but setting it too high can cause context-switching overhead. Tune this based on CPU core availability and event throughput. For more details, see Scheduler Settings and Application Server Settings.
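
The "maximum sum of connections" check for DB_CONNECTIONS is simple arithmetic, sketched below. The process names, the per-process values, the database limit, and the `reserved` allowance for other clients are all made-up numbers for illustration; substitute the figures from your own environment.

```python
# Hypothetical DB_CONNECTIONS value for each AE process in the environment.
ae_processes = {
    "scheduler_primary": 16,
    "scheduler_shadow": 16,
    "appserver_1": 32,
    "appserver_2": 32,
}


def connection_headroom(processes, db_max_connections, reserved=0):
    """Connections left once every AE process opens its maximum.

    A negative result means the database limit is too low for the
    configured DB_CONNECTIONS values.
    """
    return db_max_connections - reserved - sum(processes.values())


headroom = connection_headroom(ae_processes, db_max_connections=150, reserved=20)
print(headroom)  # 150 - 20 - 96 = 34
```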

WCC (Tomcat)

    • Tomcat Connector Tuning (server.xml):
      • maxThreads: Increase the default maxThreads (often 200) to 400 or 800 if you have many concurrent WCC users experiencing UI hangs.
      • acceptCount: Increase acceptCount to queue more incoming requests when all threads are busy, preventing immediate "Connection Refused" errors.
      • connectionTimeout: Set a reasonable connection timeout (e.g., 20000ms) to drop stale connections.
    • Data Source Connection Pools:
      • Best Practice: In the WCC configuration, tune the database connection pool parameters (maxActive, maxIdle, maxWait). Ensure maxActive is high enough to support the Tomcat maxThreads during peak reporting or monitoring usage.
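
One way to review these settings together is to read the Connector attributes from server.xml and compare them with the pool size. The sketch below does this for an illustrative XML fragment (the values are examples, not WCC defaults). Treating "maxThreads greater than maxActive" as a condition worth reviewing is an assumption made for this example, since not every request thread needs a database connection at the same time.

```python
import xml.etree.ElementTree as ET

# Illustrative server.xml fragment; values are examples only.
SERVER_XML = """
<Server>
  <Service name="Catalina">
    <Connector port="8080" protocol="HTTP/1.1"
               maxThreads="400" acceptCount="200" connectionTimeout="20000"/>
  </Service>
</Server>
"""

TUNING_ATTRIBUTES = ("maxThreads", "acceptCount", "connectionTimeout")


def connector_settings(xml_text):
    """Return the tuning attributes of each Connector as integers."""
    root = ET.fromstring(xml_text)
    return [
        {
            name: int(connector.get(name))
            for name in TUNING_ATTRIBUTES
            if connector.get(name) is not None
        }
        for connector in root.iter("Connector")
    ]


def connectors_exceeding_pool(settings, max_active):
    """List connectors whose maxThreads is larger than the pool's maxActive."""
    return [s for s in settings if s.get("maxThreads", 0) > max_active]


parsed = connector_settings(SERVER_XML)
print(parsed)
print(connectors_exceeding_pool(parsed, max_active=100))
```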

5. Java Tuning (WCC and AE Web Services)

Proper JVM tuning prevents "Out of Memory" errors and reduces UI pauses caused by Garbage Collection.

  • Memory Allocation (wrapper.conf):
    • Best Practice: For WCC and AE Web Services, adjust wrapper.java.initmemory (-Xms) and wrapper.java.maxmemory (-Xmx). Set them to the same value (e.g., 4096M or 8192M) to prevent the JVM from constantly resizing the heap. Ensure the host OS has sufficient physical RAM to avoid swapping.
  • Garbage Collection (GC):
    • Best Practice: Use the G1 Garbage Collector by adding -XX:+UseG1GC to the wrapper configuration. G1GC is optimized for large heaps and minimizes application pause times, making the WCC UI feel much more responsive.
  • Diagnostics:
    • Best Practice: Add -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=<path> to capture memory dumps for root cause analysis if a crash occurs.
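
The three recommendations above can be checked mechanically. The following sketch parses an illustrative wrapper.conf fragment and reports which of them are not met. It assumes that extra JVM flags are supplied through numbered wrapper.java.additional.N entries, which is the usual Java Service Wrapper convention; confirm the exact property names against your own wrapper.conf.

```python
# Illustrative wrapper.conf fragment; the values are examples only.
WRAPPER_CONF = """
wrapper.java.initmemory=4096
wrapper.java.maxmemory=8192
wrapper.java.additional.1=-XX:+UseG1GC
"""


def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def review_jvm_settings(props):
    """Return a list of recommendations from this section that are not met."""
    findings = []
    if props.get("wrapper.java.initmemory") != props.get("wrapper.java.maxmemory"):
        findings.append("initial and maximum heap differ")
    # Assumption: JVM flags live in wrapper.java.additional.N entries.
    flags = [v for k, v in props.items() if k.startswith("wrapper.java.additional.")]
    if "-XX:+UseG1GC" not in flags:
        findings.append("G1 collector not enabled")
    if "-XX:+HeapDumpOnOutOfMemoryError" not in flags:
        findings.append("heap dump on OutOfMemoryError not enabled")
    return findings


findings = review_jvm_settings(parse_properties(WRAPPER_CONF))
print(findings)
```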

6. Old Machine, Job, and Calendar Cleanup

Stale configuration data acts as dead weight in the system.

  • Jobs:
    • Best Practice: Institute a quarterly review to identify jobs that haven't run in more than 6 months. Decommission and delete them. AutoSys caches job definitions; fewer jobs mean a smaller memory footprint and faster cache loading times.
  • Machines:
    • Best Practice: Remove retired or decommissioned agents from the database. If a machine is offline but still defined, the Scheduler and Application Server will periodically waste CPU cycles and incur network timeouts trying to communicate with it.
  • Calendars:
    • Best Practice: Identify and delete unused standard and extended calendars. Complex, heavily populated extended calendars that are no longer referenced unnecessarily bloat the database and impact scheduling performance.
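
The quarterly job review can be reduced to a date comparison once you have each job's last run date. The sketch below assumes that data has already been extracted into a dictionary; the job names and dates are invented, and treating never-run jobs as cleanup candidates is a choice made for this example rather than a product rule.

```python
from datetime import date, timedelta

# Hypothetical job name -> last run date (None means the job never ran).
last_runs = {
    "daily_load": date(2024, 6, 28),
    "legacy_extract": date(2023, 9, 15),
    "one_off_fix": None,
}


def stale_jobs(runs, today, max_idle_days=183):
    """Return jobs whose last run is older than roughly six months."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name for name, last in runs.items() if last is None or last < cutoff
    )


print(stale_jobs(last_runs, today=date(2024, 7, 1)))
# ['legacy_extract', 'one_off_fix']
```

Review the resulting list with the job owners before deleting anything, since some jobs legitimately run only once or twice a year.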

7. Object Definitions and Data Backups

Regular backups of your scheduling configuration and audit data ensure you can quickly recover from accidental modifications without needing a full database restore.

  • Job, Machine, and Calendar Definitions:
    • Best Practice: Run daily automated exports of your definitions using commands like autorep -J ALL -q > jobs.jil and autorep -M ALL -q > machines.jil (as well as calendar exports). Storing these JIL dumps in a version control system (e.g., Git) provides a reliable, trackable history of your environment's configuration.
  • Autotrack Data:
    • Best Practice: Ensure your autotrack audit data is being backed up regularly. This data is vital for compliance and troubleshooting (tracking who changed what and when), and should be preserved even if older records are purged from the live database.
  • EEM Policies and configuration: Back up the EEM policies for each application via the EEM UI or safex. Also back up the $EIAM_HOME/config/server/server.xml file, as it contains your User Store bind details and custom mapping.
  • WCC configuration settings, Users, Server and Views: Use $CA_WCC_INSTALL_LOCATION/bin/wcc_config.sh and $CA_WCC_INSTALL_LOCATION/bin/wcc_monitor.sh to back up WCC's details. For more details, see: wcc_config and wcc_monitor
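
The daily definition export could be automated along the following lines. This is a minimal sketch that wraps only the two autorep commands quoted above; the dated directory layout, the backup root, and the function names are assumptions for illustration, and calendar exports are left out.

```python
import subprocess
from datetime import date
from pathlib import Path

# autorep invocations taken from the text above; file names are illustrative.
EXPORTS = {
    "jobs.jil": ["autorep", "-J", "ALL", "-q"],
    "machines.jil": ["autorep", "-M", "ALL", "-q"],
}


def export_path(backup_root, filename, day):
    """Build a dated target path such as <root>/2024-01-02/jobs.jil."""
    return Path(backup_root) / day.isoformat() / filename


def run_exports(backup_root, day=None):
    """Run each export and write its output to a dated file.

    Requires the AutoSys environment to be sourced so autorep is on PATH.
    check=True raises CalledProcessError if an export fails.
    """
    day = day or date.today()
    written = []
    for filename, command in EXPORTS.items():
        target = export_path(backup_root, filename, day)
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "w") as handle:
            subprocess.run(command, stdout=handle, check=True)
        written.append(target)
    return written
```

Calling `run_exports` with a directory inside a Git working tree, followed by a commit, gives the version history described above.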

 

8. Additional Crucial Items

  • Log Rotation & Pruning:
    • Best Practice: Implement aggressive log rotation for AE logs (scheduler and Application Server) and WCC Tomcat logs (catalina.out, wcc.log). Unchecked log growth can exhaust disk space, causing hard crashes. Use OS-level logrotate or application settings to keep 7-14 days of logs and archive/compress the rest.
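
Where logrotate is not available, the "keep recent logs, compress the rest" policy can be approximated with a small script. The sketch below is one possible approach and not a product utility; the `*.log` pattern and the 14-day default are assumptions, and it should only be pointed at rotated files that no process is still writing to.

```python
import gzip
import shutil
import time
from pathlib import Path


def compress_old_logs(log_dir, keep_days=14, now=None):
    """Gzip *.log files last modified more than keep_days ago.

    The uncompressed original is removed once its .gz copy is written.
    Returns the list of compressed file paths.
    """
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400
    compressed = []
    for path in sorted(Path(log_dir).glob("*.log")):
        if path.stat().st_mtime >= cutoff:
            continue  # still inside the retention window
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()
        compressed.append(target)
    return compressed
```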

9. Agent Performance Tuning

Agents with high job volumes must be tuned so they do not become bottlenecks in the scheduling lifecycle.

  • JVM Memory Allocation (oscomponent.jvm.x.options):
    • Best Practice: The AutoSys System Agent runs on a Java Virtual Machine. If the agent handles massive workloads or utilizes heavy Java plugins (like FTP, Web Services, or databases), it may run out of memory. Tune the heap limits by configuring oscomponent.jvm.x.options=-Xmx1024m;-Xms256m in agentparm.txt to prevent out-of-memory errors and excessive Garbage Collection pauses.
  • Spool File Management (Cleanup):
    • Best Practice: Every job execution creates spool files in the agent's spool directory. Over time, millions of old spool files will cause severe file system I/O contention, slowing down the agent's ability to scan for job outputs. Configure the automated spool cleanup in agentparm.txt by setting:
      • oscomponent.spool.clean.enable=true
      • oscomponent.spool.clean.expire=7D (to clean files older than 7 days)
      • oscomponent.spool.clean.sleep=1D (to run the cleanup cycle once a day)
    • Alternatively, you can schedule the provided ClearSpool utility as a recurring job.
  • Receiver/Transmitter Tuning:
    • Best Practice: The agent's receiver and transmitter threads are responsible for receiving data and sending status events back to the AutoSys manager. On heavily loaded agents, the default queue sizes and buffer limits might become a bottleneck, leading to delayed job status updates. Ensure the network between the agent and manager is stable, and if throughput issues persist, consult the agent documentation to tune the agent properties in agentparm.txt. In addition, a couple of the key properties are documented here.
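
To audit many agents for the spool cleanup settings listed above, a short script can compare each agentparm.txt against the recommended values. The sketch below uses the three parameters and values from this section; the sample file content and function names are hypothetical.

```python
# Spool cleanup settings recommended in this section.
RECOMMENDED = {
    "oscomponent.spool.clean.enable": "true",
    "oscomponent.spool.clean.expire": "7D",
    "oscomponent.spool.clean.sleep": "1D",
}

# Illustrative agentparm.txt content.
AGENTPARM = """
# sample agent parameters
oscomponent.jvm.x.options=-Xmx1024m;-Xms256m
oscomponent.spool.clean.enable=true
oscomponent.spool.clean.expire=30D
"""


def parse_agentparm(text):
    """Parse key=value lines, skipping blanks and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


def spool_cleanup_gaps(params):
    """Map each differing setting to (actual value, recommended value)."""
    return {
        key: (params.get(key), wanted)
        for key, wanted in RECOMMENDED.items()
        if params.get(key) != wanted
    }


gaps = spool_cleanup_gaps(parse_agentparm(AGENTPARM))
print(gaps)
```

A missing parameter is reported with an actual value of `None`. A longer expiry such as 30D may be intentional, so treat the output as a review list, not an error list.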

10. Network Latency & Architecture

Network lag or latency between the core components is the silent killer of AutoSys performance. Even minor delays can severely impact job throughput when multiplied across millions of transactions.

  • AE to Database Latency:
    • Best Practice: Ensure the network latency between the AutoSys Application Server/Scheduler and the Database server is low. High latency here directly causes event processing lag and delays in job state changes.
  • WCC to AE / WCC to Database Latency:
    • Best Practice: WCC components make numerous calls to both the AE Web Services and the backend databases. Network proximity between WCC, AE, and the databases is crucial to avoid slow UI loading times and command timeouts.
  • Agent Latency:
    • Best Practice: While agents can tolerate higher latency than core servers, excessive lag between the Scheduler and remote Agents can cause dispatch timeouts and delayed job start times. Monitor network stability to remote data centers.
  • EEM-LDAP Latency:
    • Best Practice: Excessive latency between the EEM Server and the LDAP server used for authentication and user group resolution can slow down UI operations. Place the EEM Server and its underlying LDAP servers where the network latency between them is low.
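
When ICMP ping is blocked between tiers, the time taken to open a TCP connection gives a rough latency figure for any of the component pairs above. The sketch below is a generic measurement, not an AutoSys tool; the function name is hypothetical, and the host and port you pass in (for example the database listener) depend on your environment.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, samples=5, timeout=3.0):
    """Return the median time in milliseconds to open a TCP connection.

    The figure includes the TCP handshake, so it approximates one network
    round trip plus connection setup overhead on both ends.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection is closed immediately after it opens
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

Running it from the Scheduler host against the database, and from the EEM host against the LDAP server, gives comparable numbers that can be tracked over time.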

Additional Information

Documentation References