What causes the CAUAJM_W_00043 warning message regarding getaddrinfo?

Article ID: 38224

Products

CA Workload Automation AE - Business Agents (AutoSys)
CA Workload Automation AE - Scheduler (AutoSys)
CA Workload Automation Agent

Issue/Introduction

Question:

What causes the CAUAJM_W_00043 warning message regarding getaddrinfo?

The Scheduler log contains many messages that read "CAUAJM_W_00043 The UNIX function getaddrinfo did not respond in a timely fashion for hostname [XXXXXXXXX]. Retrying...". What is the root cause of this message and what can be done to address it?

Environment:

AutoSys: All Releases

Operating Systems: ALL

Answer:

When the Scheduler or Application Server needs to resolve a hostname, it calls the getaddrinfo() UNIX function. The call has a default timeout of 15 seconds in which to return a result. If no result is returned within that period, the CAUAJM_W_00043 message is written to the log of the component that made the call, and getaddrinfo() is called a second time. If the second attempt also times out, the machine is placed offline. The machine is automatically placed back online once communication with it is restored.
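The resolve-and-retry sequence described above can be sketched in Python as follows. This is a hypothetical model, not AutoSys source code: the function name resolve_with_retry, the thread-pool approach to enforcing the timeout, and the injectable resolver and log parameters are all assumptions made for illustration. Only the 15-second default, the single retry, and the warning text come from this article.

```python
import concurrent.futures
import socket
from typing import Any, Callable

# Values taken from the article: 15 s default timeout and one retry.
DEFAULT_TIMEOUT_S = 15.0
WARNING = ("CAUAJM_W_00043 The UNIX function getaddrinfo did not respond "
           "in a timely fashion for hostname [{host}]. Retrying...")


def resolve_with_retry(hostname: str,
                       resolver: Callable[[str, Any], Any] = socket.getaddrinfo,
                       timeout_s: float = DEFAULT_TIMEOUT_S,
                       log: Callable[[str], None] = print) -> bool:
    """Return True if the hostname resolved, False if both attempts timed out
    (the point at which the article says the machine is placed offline).
    Resolver errors other than a timeout propagate to the caller."""
    # Two workers so the retry is not queued behind the first, still-running
    # call: a blocking getaddrinfo() cannot be cancelled from Python.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    try:
        for attempt in (1, 2):
            future = pool.submit(resolver, hostname, None)
            try:
                future.result(timeout=timeout_s)
                return True
            except concurrent.futures.TimeoutError:
                if attempt == 1:
                    log(WARNING.format(host=hostname))
        return False
    finally:
        # Do not wait for a hung lookup; its thread finishes in the background.
        pool.shutdown(wait=False)
```

A False return corresponds to the point where, per the article, the machine is placed offline.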

Many occurrences of this warning message in the Scheduler log are a good indication of intermittent communication issues between the Scheduler machine and a DNS server, which should be investigated by your system administrator and/or network team.
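When investigating the DNS side, it can help to time getaddrinfo() directly on the Scheduler machine. The Python sketch below is a hypothetical diagnostic, not an AutoSys utility: the helpers time_lookups and summarize are names invented for illustration. It resolves a hostname several times and reports the minimum, median, and maximum latency, counting any samples that reach the 15-second default timeout. Results may be skewed by any OS-level name cache on the host.

```python
import socket
import statistics
import time
from typing import Dict, List


def time_lookups(hostname: str, samples: int = 5) -> List[float]:
    """Time repeated getaddrinfo() calls for hostname, in seconds."""
    durations = []
    for _ in range(samples):
        start = time.monotonic()
        try:
            socket.getaddrinfo(hostname, None)
        except socket.gaierror:
            pass  # a failed lookup is still timed; slow failures matter too
        durations.append(time.monotonic() - start)
    return durations


def summarize(durations: List[float], slow_threshold_s: float = 15.0) -> Dict[str, float]:
    """Report min/median/max latency and how many samples reached the threshold."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
        "slow": sum(1 for d in durations if d >= slow_threshold_s),
    }


if __name__ == "__main__":
    import sys
    # Pass the hostname from the CAUAJM_W_00043 message as the first argument.
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    print(summarize(time_lookups(host)))
```

Occasional maximums far above the median, rather than uniformly slow samples, would be consistent with the intermittent DNS issues described above.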

Environments that have IP caching disabled are more likely to see these warnings, because getaddrinfo() is called every time a hostname needs to be resolved instead of reusing a previously resolved address.
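To illustrate why caching reduces exposure to slow lookups, the sketch below puts a simple time-to-live (TTL) cache in front of a resolver. It illustrates the general technique under stated assumptions and does not describe how AutoSys implements IP caching; the CachingResolver class, its ttl_s parameter, and the 300-second default are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CachingResolver:
    """Hypothetical TTL cache in front of a resolver such as socket.getaddrinfo."""

    def __init__(self, resolver: Callable[[str, Any], Any], ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolver = resolver
        self._ttl_s = ttl_s
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}
        self.lookups = 0  # calls that actually reached the underlying resolver

    def resolve(self, hostname: str) -> Any:
        now = self._clock()
        entry = self._cache.get(hostname)
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]  # cache hit: no getaddrinfo() call is made
        self.lookups += 1
        result = self._resolver(hostname, None)
        self._cache[hostname] = (now, result)
        return result
```

Wrapping socket.getaddrinfo this way means 100 resolutions of the same hostname within the TTL reach the resolver once; without the cache, each resolution is a separate call that can time out.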

The timeout interval is configurable through an environment variable that the Scheduler and Application Server read at startup. Set the variable in the $AUTOUSER/autosys.sh.<hostname> file, which is read by the startup scripts for both components.

For 11.0 SP5, the variable is GAI_TIMEOUT.

For 11.3.5/11.3.6, the variable is AS_RESOLVEHOST_TIMEOUT.

The value of this variable can range from 1 to 120 and is added to the default 15-second timeout. For example, if the value is set to 20, the timeout period increases from 15 to 35 seconds.
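The timeout arithmetic and the release-specific variable names can be captured in a small Python helper. The function names effective_timeout and export_line are hypothetical, and the generated line assumes that $AUTOUSER/autosys.sh.<hostname> uses Bourne-shell syntax, as its .sh suffix suggests; check your own file before editing it.

```python
# Default getaddrinfo() timeout and allowed range, as stated in the article.
DEFAULT_TIMEOUT_S = 15
MIN_VALUE, MAX_VALUE = 1, 120

# Timeout variable name per release, as listed in the article.
TIMEOUT_VARIABLE = {
    "11.0 SP5": "GAI_TIMEOUT",
    "11.3.5": "AS_RESOLVEHOST_TIMEOUT",
    "11.3.6": "AS_RESOLVEHOST_TIMEOUT",
}


def effective_timeout(value: int) -> int:
    """Return the total timeout in seconds: the configured value plus the 15 s default."""
    if not MIN_VALUE <= value <= MAX_VALUE:
        raise ValueError(f"value must be between {MIN_VALUE} and {MAX_VALUE}, got {value}")
    return DEFAULT_TIMEOUT_S + value


def export_line(release: str, value: int) -> str:
    """Build a Bourne-shell assignment for autosys.sh.<hostname> (assumed syntax)."""
    effective_timeout(value)  # reuse the range check
    name = TIMEOUT_VARIABLE[release]  # KeyError for releases the article does not cover
    return f"{name}={value}; export {name}"


print(effective_timeout(20))       # 35
print(export_line("11.3.6", 20))   # AS_RESOLVEHOST_TIMEOUT=20; export AS_RESOLVEHOST_TIMEOUT
```

Under these assumptions the smallest and largest settings give total timeouts of 16 and 135 seconds.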

Increasing the timeout reduces the frequency of the warning messages, but it does not address the underlying getaddrinfo() performance problem, which should still be investigated by your system and network administrators. In an environment that does not have IP caching enabled, slow getaddrinfo() responses will significantly degrade the performance of the instance.

Environment

Release: ATSYHA99000-11.3.6-Workload Automation AE-High Availability Option
Component: