Uprocs randomly abort stating that IO server is not reachable / address already in use

Article ID: 87073

Updated On:

Products

CA Automic Dollar Universe

Issue/Introduction

Error Message:
In the job log of the Uproc, messages like these may appear:
#################################
Cannot get id for Uproc UPROC_NAME
Invalid value for RESEXE variable
#################################

Or:
#################################
Error connecting to the IO server: o_connect_auth returns -1: (WinSock): Address already in use 

Error getting conf from IO server: o_get_conf_from_io returns -1 

Cannot load environment 

The syntax of the command is incorrect. 
#################################

In the history trace of the failed job, the return code is 541340465:
#################################
Message code : (541340465) 
20141118134700 *** TASK ENDED ABNORMALLY ***
#################################

At the same time, universe.log shows these errors:
#################################
|ERROR|X|cmd|pid=290048.289704| owls_connect_auth | k_connect_auth_timeout(lonv199004/SIO) returns error [200]
|ERROR|X|cmd|pid=290048.289704| o_callsrv_connect_r | Connection error 10048 [(WinSock): Address already in use]
|ERROR|X|cmd|pid=290048.289704| owls_cmd_return | Can not connect to server. Error!
#################################

Or:
#################################
|ERROR|X|LAN|pid=283432.278308| o_connect_auth | k_connect_auth_timeout returns error [20
|INFO |X|LAN|pid=283432.278308| u_get_intern_uproc_script | error connecting to [NODENAME]/[X]/[local]
################################# 
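
To spot these failures quickly in a large universe.log, the connection errors can be pulled out with a small script. The following is a minimal sketch, not a vendor tool: the function name find_connection_errors is hypothetical, and the pipe-separated field layout is assumed from the excerpts above only. WinSock error 10048 is WSAEADDRINUSE, which matches the "Address already in use" text.

```python
import re

# Matches the "Connection error <code> [<text>]" fragment seen in the excerpt.
# Assumption: the message format is inferred from the sample lines only.
CONN_ERR = re.compile(r"Connection error (\d+) \[(.*)\]")


def find_connection_errors(log_text):
    """Return (function, winsock_code, text) for each connection error line."""
    results = []
    for line in log_text.splitlines():
        # Assumed layout: |LEVEL|X|component|pid=...| function | message
        parts = [p.strip() for p in line.split("|", 6)]
        if len(parts) < 7 or parts[1] != "ERROR":
            continue
        match = CONN_ERR.search(parts[6])
        if match:
            results.append((parts[5], int(match.group(1)), match.group(2)))
    return results


# Sample lines copied from the universe.log excerpt in this article.
SAMPLE_LOG = """\
|ERROR|X|cmd|pid=290048.289704| owls_connect_auth | k_connect_auth_timeout(lonv199004/SIO) returns error [200]
|ERROR|X|cmd|pid=290048.289704| o_callsrv_connect_r | Connection error 10048 [(WinSock): Address already in use]
|ERROR|X|cmd|pid=290048.289704| owls_cmd_return | Can not connect to server. Error!
"""

for function, code, text in find_connection_errors(SAMPLE_LOG):
    print(function, code, text)
```

Only the o_callsrv_connect_r line carries a WinSock code, so it is the one reported for the sample.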

Patch level detected: Dollar Universe 6.2.00
Product Version: Dollar.Universe 6.2.0

Description: Some CMD Uprocs abort randomly, for example once every 15 minutes.

Restarting the DUAS node does not work around the issue.

The output of netstat -na shows many sockets in TIME_WAIT state on the IO server port:
#################################
TCP 127.0.0.1:3555 127.0.0.1:10600 TIME_WAIT
TCP 127.0.0.1:3702 127.0.0.1:10600 TIME_WAIT
TCP 127.0.0.1:3725 127.0.0.1:10600 TIME_WAIT
#################################
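
To quantify the build-up, the TIME_WAIT entries for the IO server port can be counted from the netstat output. The following is a minimal sketch under stated assumptions: the function name count_time_wait is hypothetical, and port 10600 is taken from the sample above, so substitute the IO server port of your own node.

```python
def count_time_wait(netstat_output, port):
    """Count TCP entries in TIME_WAIT whose local or remote port is `port`."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected shape: TCP <local addr:port> <remote addr:port> <state>
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        if state != "TIME_WAIT":
            continue
        # Take the text after the last ':' so IPv6-style addresses also work.
        ports = {local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]}
        if str(port) in ports:
            count += 1
    return count


# Sample lines copied from the netstat excerpt in this article.
SAMPLE_NETSTAT = """\
TCP 127.0.0.1:3555 127.0.0.1:10600 TIME_WAIT
TCP 127.0.0.1:3702 127.0.0.1:10600 TIME_WAIT
TCP 127.0.0.1:3725 127.0.0.1:10600 TIME_WAIT
"""

# 10600 is the IO server port in the sample; yours may differ.
print(count_time_wait(SAMPLE_NETSTAT, 10600))
```

On a live server, the same function can be fed the captured text of netstat -na, for example via subprocess.run(["netstat", "-na"], capture_output=True, text=True).stdout.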

Environment

OS: Windows

Cause

Cause type:
Other
Root Cause: System / network issue on the Windows server: sockets could not bind to the IO server.

Resolution

The problem was fixed by rebooting the Windows server. The reboot cleared all sockets in TIME_WAIT and allowed the correct IP configuration to be retrieved on the network cards.

Fix Status: Released

Fix Version(s):
Component: Application.Server
Version: Dollar.Universe 6.2.41

Additional Information

Workaround:
N/A