Unable to switch PWP to the second node

Article ID: 237033

Products

CA Automic Workload Automation - Automation Engine
CA Automic One Automation

Issue/Introduction

In the Automic Web Interface (AWI), we try to switch the primary work process (PWP) from the first node to a work process (WP) on the second node.
No error message is displayed. After about 20-30 seconds the message "Server mode successfully changed" appears, but the PWP is not actually switched and remains on the first node.

If the PWP on the first node is switched to another WP on the first node, this completes without a problem.

Environment

Release: 12.3.4

Cause

Network problems were disrupting communication between the two Automation Engine (AE) servers.

The logs show hundreds of messages such as the following:

U00003413 Socket call 'recv(47)' returned error code '104'
U00003413 Socket call 'recv(115)' returned error code '110'
U00003413 Socket call 'send(1)' returned error code '32'
U00003413 Socket call 'bind' returned error code '98'
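The numeric codes in these messages are errno values that the operating system returned for each socket call. As a minimal sketch, assuming the AE servers run Linux (errno numbers differ on other platforms, and Windows Sockets reports its own codes, for example 10054 for WSAECONNRESET), the following Python snippet decodes a U00003413 line into its symbolic error name. The `decode_u00003413` helper and its regular expression are hypothetical, written only for this illustration.

```python
import re

# Linux errno values for the codes seen in the log excerpt above.
# Assumption: the AE servers run Linux; these numbers differ on other OSes.
LINUX_ERRNO = {
    32: ("EPIPE", "Broken pipe"),
    98: ("EADDRINUSE", "Address already in use"),
    104: ("ECONNRESET", "Connection reset by peer"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Hypothetical pattern matching the U00003413 message format shown above.
MSG_RE = re.compile(r"U00003413 Socket call '([^']+)' returned error code '(\d+)'")


def decode_u00003413(line):
    """Return (socket_call, code, name, description), or None if the line does not match."""
    match = MSG_RE.search(line)
    if match is None:
        return None
    call, code = match.group(1), int(match.group(2))
    name, description = LINUX_ERRNO.get(code, ("UNKNOWN", "Not in this sketch's table"))
    return call, code, name, description


if __name__ == "__main__":
    sample = "U00003413 Socket call 'recv(47)' returned error code '104'"
    print(decode_u00003413(sample))
```

Read this way, the excerpt shows connections being reset (104) and timing out (110), writes to already-closed connections (32), and a port that could not be bound because it was still taken (98), all of which point away from Automic itself and toward the network or host.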

When the switch to the second node is attempted, the trace logs show the following:

 mqsrv_get_primary(2267): rslt = 0,msqh = 340078
 mqsrv_get_primary <-- (no primary)
 try2be_pwp(7559): an older PWP found, let's bind PWP port(s)
 bind_primary_ports() -->
 bind_primary_ports(2407): retry(cnt=3,wait=30)
 U00003413 Socket call 'bind' returned error code '98'.
          Address already in use
 U00003487 ListenSocket with port number '49502' could not be created.
 U00003413 Socket call 'bind' returned error code '98'.
          Address already in use
 U00003487 ListenSocket with port number '49502' could not be created.
 U00003413 Socket call 'bind' returned error code '98'.
          Address already in use
 U00003487 ListenSocket with port number '49502' could not be created.
 bind_primary_ports <-- (PWP port couldn't be binded)
 try2be_pwp <-- (not able to bind PWP port)
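Error 98 (EADDRINUSE on Linux) means the port that `bind_primary_ports` tried to open, 49502 in this trace, was still held by another socket, so every attempt failed. Before restarting, one way to confirm that the port has been released on a given host is to try binding it yourself. The Python sketch below does that; the `port_is_free` helper is hypothetical, and it only checks the host it runs on.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Try to bind host:port; return False if the OS reports EADDRINUSE.

    Hypothetical helper for illustration. The probe socket does not set
    SO_REUSEADDR, so any socket already bound to the port causes a conflict.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise  # other errors (e.g. permission denied) are not a port conflict
    finally:
        probe.close()  # release the port again immediately
    return True


if __name__ == "__main__":
    pwp_port = 49502  # port number taken from the trace above
    state = "free" if port_is_free(pwp_port) else "still in use (error 98)"
    print(f"Port {pwp_port} is {state}")
```

Because the probe does not set SO_REUSEADDR, on Linux it may also report a port as in use while old connections linger in TIME_WAIT, which makes it a conservative check.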

Resolution

Perform a full shutdown of the AE on both nodes; on Windows systems, reboot the servers.
Then clean out the logs in the /temp folder and restart the AE one node at a time.

Watch the logs for U00003413 messages and engage your network team if these persist.
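To make persistent U00003413 messages easier to spot, a small script can tally them per socket call and error code. This is a sketch under stated assumptions: the `summarize_socket_errors` helper, its regular expression, and the alert threshold of 10 are hypothetical and should be adapted to your environment. Log file paths are passed on the command line rather than assumed.

```python
import re
import sys
from collections import Counter

# Hypothetical pattern for the message format shown in the Cause section.
# The optional "(47)"-style socket handle is dropped so calls group together.
MSG_RE = re.compile(r"U00003413 Socket call '([^'(]+)(?:\(\d+\))?' returned error code '(\d+)'")

ALERT_THRESHOLD = 10  # hypothetical; tune to what counts as "persistent" for you


def summarize_socket_errors(lines):
    """Count U00003413 messages by (socket call, error code)."""
    counts = Counter()
    for line in lines:
        match = MSG_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def main(paths):
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            counts.update(summarize_socket_errors(handle))
    for (call, code), n in counts.most_common():
        flag = "  <-- persistent, involve the network team" if n >= ALERT_THRESHOLD else ""
        print(f"{call:<6} error {code:>4}: {n}{flag}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it with one or more AE log files as arguments; repeated runs after the restart show whether the counts keep growing.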

Additional Information

The socket errors do not originate in Automic; they are OS-level or network error codes that the operating system reports to Automic.