Agent Upgrade via Agent Inventory is failing to upgrade agents

Article ID: 223984

Products

AutoSys Workload Automation

Issue/Introduction

The customer is trying to upgrade AutoSys agents from 11.4 to 12.0 via Agent Inventory. The AutoSys server is already on 12.0.

Several agents were upgraded successfully through Agent Inventory, but some agents only go into a quiesced state and, after a long time, report an "upgrade cancelled" status message.

The customer verified that no jobs are executing on the affected agent(s).

The behavior appears erratic: some agents upgrade and others do not.

Environment

AutoSys 12.x

 

Cause

The agent's job database (<agent_install_folder>/database) was out of sync with the status of the jobs in AutoSys.

The following steps were used to diagnose the problem:

1) Verify that no jobs are running on the agent:
 chase
 autorep -m agent_name -d 


2) Check whether any agent-related processes are still running:
 ps -few|grep cybAgent | grep -v grep |awk '{print $2}'
 ps -few|grep cybspawn | grep -v grep |awk '{print $2}'
 
 # In this case, this showed a number of leftover cybspawn processes (from file watcher jobs).

3) Check the status of the corresponding jobs in AutoSys. If all of those jobs have already completed or no longer exist in AE, continue with the next step.

4) Kill the leftover cybspawn processes:
 kill -9 `ps -few|grep cybspawn | grep -v grep |awk '{print $2}'`

5) Restart the agent and retest whether the Agent Inventory-based upgrade finishes.

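6) Check the spool log for the upgrade attempt:
 cd <agent_install_folder>/spool/wcc_agent_cmd/<latest temporary directory>
 cat cmd.out

 In this case, cmd.out showed that the upgrade timed out waiting for running jobs and was cancelled: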

 Wait for job completion timed out
 Timed out waiting for the running jobs to complete. Cancelling the action as per configuration preference
 Inside invokeUnquiesceAction()
 About to invoke unQuiesceAgent
 uri.getPath():/wcc/asi/rest/agents/71
 pathParam:/wcc/asi/rest/agents/71/unquiesce
 url:https://WCC-HOSTNAME:8080/wcc/asi/rest/agents/71/unquiesce
 conn-sun.net.www.protocol.https.DelegateHttpsURLConnection:https://vWCC-HOSTNAME:8080/wcc/asi/rest/agents/71/unquiesce
 responseCode: 200
 Headers: {Keep-Alive=[timeout=60], null=[HTTP/1.1 200], Server=[WCC], Access-Control-Allow-Origin=[https://WCC-HOSTNAME:8080], Access-Control-Allow-Methods=[GET, POST, PUT, DELETE, OPTIONS, HEAD], X-Content-Type-Options=[nosniff], Connection=[keep-alive], X-Content-Security-Policy=[default-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; object-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; form-action 'self'; font-src 'self'; connect-src 'self'; plugin-types application/pdf application/x-shockwave-flash; reflected-xss block;script-nonce 19fb88ebc86e60fde66130a8fc613e75f1fbd167], P3P=[CP="CAO PSA OUR"], unquiesceSuccess=[true], Date=[Wed, 15 Sep 2021 18:24:12 GMT], Access-Control-Allow-Headers=[origin, content-type, accept, authorization], X-Frame-Options=[SAMEORIGIN], Strict-Transport-Security=[max-age=31536000], Cache-Control=[max-age=0 no-cache, no-store, must-revalidate], Access-Control-Allow-Credentials=[true], proceedToUpgrade=[false], quiesceSuccess=[false], Content-Security-Policy=[default-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; object-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; form-action 'self'; font-src 'self'; connect-src 'self'; plugin-types application/pdf application/x-shockwave-flash; reflected-xss block;script-nonce 19fb88ebc86e60fde66130a8fc613e75f1fbd167], Expires=[Tue, 14 Sep 2021 14:24:12 EDT], Content-Length=[0], Access-Control-Max-Age=[1209600], X-XSS-Protection=[1; mode=block]}
 unQuiesceSuccessStatus:true
 Exiting invokeUnquiesceAction with statusCode:UPGRADE_CANCELLED actionDetails:Upgrade cancelled. Timed out waiting for the running jobs to complete after Quiesce. Cancelled the action due to configuration setting.
 About to update status with 
  statusCode:UPGRADE_CANCELLED
  actionDetails:Upgrade cancelled. Timed out waiting for the running jobs to complete after Quiesce. Cancelled the action due to configuration setting.
 
7) WCC apparently still considered the agent to have running jobs, so it timed out waiting for them to complete and cancelled the upgrade.
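Steps 2 through 4 can be scripted a little more defensively than a bare kill -9. The sketch below is illustrative only: the helper names list_pids, is_alive and stop_pids are hypothetical, and it assumes a POSIX shell plus a ps that supports -o pid=,args= and -o stat= (as procps on Linux does). It sends SIGTERM first and only falls back to SIGKILL for processes still alive after a grace period. Run it only after step 3 has confirmed in AutoSys that the owning jobs are finished.

```sh
#!/bin/sh
# Hypothetical helpers, not part of AutoSys or the agent.

# list_pids PATTERN: print PIDs whose full command line contains PATTERN
# (a substring match, like the grep in step 2), skipping this shell and awk.
list_pids() {
    ps -eo pid=,args= |
        awk -v pat="$1" -v self="$$" 'index($0, pat) && $1 != self && $2 != "awk" { print $1 }'
}

# is_alive PID: succeed if ps still lists PID and it is not a zombie (state Z).
is_alive() {
    _state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    case $_state in
        ''|Z*) return 1 ;;
        *) return 0 ;;
    esac
}

# stop_pids GRACE_SECONDS PID...: SIGTERM all PIDs, wait up to GRACE_SECONDS,
# then SIGKILL whatever is still running.
stop_pids() {
    _grace=$1; shift
    [ "$#" -gt 0 ] || return 0            # nothing to stop
    kill -s TERM "$@" 2>/dev/null || true
    _i=0
    while [ "$_i" -lt "$_grace" ]; do
        _alive=0
        for _p in "$@"; do
            if is_alive "$_p"; then _alive=1; fi
        done
        [ "$_alive" -eq 1 ] || return 0   # everything exited after SIGTERM
        sleep 1
        _i=$((_i + 1))
    done
    for _p in "$@"; do
        if is_alive "$_p"; then
            echo "PID $_p survived SIGTERM; sending SIGKILL" >&2
            kill -s KILL "$_p" 2>/dev/null || true
        fi
    done
}
```

For example, `list_pids cybspawn` reproduces the PID list from step 2, and `stop_pids 10 $(list_pids cybspawn)` (deliberately unquoted so the PIDs split into separate arguments) would replace the kill -9 in step 4.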

 

Resolution

- Perform a cold start of the agent: stop the agent, cd <agent_install_folder>, rename the database folder to database.old, and start the agent.
- Retry the upgrade. In this case, it then completed successfully.
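The cold start above can be wrapped in a small function. This is a minimal sketch under stated assumptions: the function name cold_start_agent is hypothetical, and the agent's stop and start commands are passed in as strings because they depend on the installation (they are run from <agent_install_folder>). One deliberate deviation: it appends a timestamp to database.old, because if a database.old directory already exists from an earlier cold start, mv database database.old would move the folder inside it instead of renaming it.

```sh
#!/bin/sh
# cold_start_agent AGENT_DIR STOP_CMD START_CMD  (hypothetical helper)
cold_start_agent() {
    _dir=$1 _stop=$2 _start=$3
    # Stop the agent; leave the database untouched if stopping fails.
    (cd "$_dir" && sh -c "$_stop") || { echo "agent stop failed" >&2; return 1; }
    if [ -d "$_dir/database" ]; then
        # Timestamped name so an earlier backup is never overwritten or nested.
        _backup="$_dir/database.old.$(date +%Y%m%d%H%M%S)"
        mv "$_dir/database" "$_backup" || return 1
        echo "job database moved to $_backup"
    fi
    # Start the agent; presumably it builds a fresh job database on startup.
    (cd "$_dir" && sh -c "$_start")
}
```

Make sure the stop command has fully stopped the agent before the folder is renamed; the sketch relies on the stop command's exit status for that.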


This most likely happened because the agent had not had a clean start for some time, so its job database no longer matched the job status in AutoSys.
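Whether an Agent Inventory upgrade attempt was cancelled (as in step 6) or went through can be checked from the most recent cmd.out in the agent's spool directory. The sketch below assumes the layout from step 6 (<agent_install_folder>/spool/wcc_agent_cmd/<subdirectory>/cmd.out) and that the most recently modified subdirectory belongs to the latest attempt; the function name latest_upgrade_status is hypothetical. It prints the last statusCode value found in that file, for example UPGRADE_CANCELLED.

```sh
#!/bin/sh
# latest_upgrade_status AGENT_DIR  (hypothetical helper)
latest_upgrade_status() {
    _spool="$1/spool/wcc_agent_cmd"
    # Newest subdirectory first (ls -t sorts by modification time);
    # the trailing slash in the glob restricts matches to directories.
    _latest=$(ls -td "$_spool"/*/ 2>/dev/null | head -n 1)
    if [ -z "$_latest" ] || [ ! -f "${_latest}cmd.out" ]; then
        echo "no cmd.out found under $_spool" >&2
        return 1
    fi
    # Print the value of the last "statusCode:" entry in cmd.out.
    grep -o 'statusCode:[A-Z_]*' "${_latest}cmd.out" | tail -n 1 | cut -d: -f2
}
```

Because it relies on modification times, run it soon after the retry; an unrelated newer command directory in the spool would otherwise be picked up instead.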