The customer is trying to upgrade Autosys Agents from 11.4 to 12.0 via Agent Inventory.
Autosys server is on 12.0.
The customer was able to upgrade several agents using Agent Inventory, but some agents only go into quiesced status.
After a long period of time, those agents come back with an upgrade canceled status message.
The customer verified that no jobs are executing on the affected agent(s).
The behavior is erratic: some agents upgrade and others do not.
Autosys 12.X
The agent's job database (<agent_install_folder>/database) was out of sync with the status of the jobs in Autosys.
1) Verify that there are no jobs running:
chase
autorep -m agent_name -d
2) Check whether any processes related to the agent are still present:
ps -few|grep cybAgent | grep -v grep |awk '{print $2}'
ps -few|grep cybspawn | grep -v grep |awk '{print $2}'
# this showed a bunch of cybspawn processes (file watcher jobs)
3) Verify the status in Autosys of the jobs behind the processes listed above.
4) If those jobs are all already completed, done, or gone from AE, kill the leftover processes:
kill -9 `ps -few|grep cybspawn | grep -v grep |awk '{print $2}'`
5) Restart the agent and retest whether the Agent Inventory based upgrade finishes.
6) Check the spool log:
cd <agent_folder>/spool/wcc_agent_cmd/<< latest temporary name..>>
cat cmd.out
Wait for job completion timed out
Timed out waiting for the running jobs to complete. Cancelling the action as per configuration preference
Inside invokeUnquiesceAction()
About to invoke unQuiesceAgent
uri.getPath():/wcc/asi/rest/agents/71
pathParam:/wcc/asi/rest/agents/71/unquiesce
url:https://WCC-HOSTNAME:8080/wcc/asi/rest/agents/71/unquiesce
conn-sun.net.www.protocol.https.DelegateHttpsURLConnection:https://vWCC-HOSTNAME:8080/wcc/asi/rest/agents/71/unquiesce
responseCode: 200
Headers: {Keep-Alive=[timeout=60], null=[HTTP/1.1 200], Server=[WCC], Access-Control-Allow-Origin=[https://WCC-HOSTNAME:8080], Access-Control-Allow-Methods=[GET, POST, PUT, DELETE, OPTIONS, HEAD], X-Content-Type-Options=[nosniff], Connection=[keep-alive], X-Content-Security-Policy=[default-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; object-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; form-action 'self'; font-src 'self'; connect-src 'self'; plugin-types application/pdf application/x-shockwave-flash; reflected-xss block;script-nonce 19fb88ebc86e60fde66130a8fc613e75f1fbd167], P3P=[CP="CAO PSA OUR"], unquiesceSuccess=[true], Date=[Wed, 15 Sep 2021 18:24:12 GMT], Access-Control-Allow-Headers=[origin, content-type, accept, authorization], X-Frame-Options=[SAMEORIGIN], Strict-Transport-Security=[max-age=31536000], Cache-Control=[max-age=0 no-cache, no-store, must-revalidate], Access-Control-Allow-Credentials=[true], proceedToUpgrade=[false], quiesceSuccess=[false], Content-Security-Policy=[default-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; object-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; form-action 'self'; font-src 'self'; connect-src 'self'; plugin-types application/pdf application/x-shockwave-flash; reflected-xss block;script-nonce 19fb88ebc86e60fde66130a8fc613e75f1fbd167], Expires=[Tue, 14 Sep 2021 14:24:12 EDT], Content-Length=[0], Access-Control-Max-Age=[1209600], X-XSS-Protection=[1; mode=block]}
unQuiesceSuccessStatus:true
Exiting invokeUnquiesceAction with statusCode:UPGRADE_CANCELLED actionDetails:Upgrade cancelled. Timed out waiting for the running jobs to complete after Quiesce.
Cancelled the action due to configuration setting.
About to update status with
statusCode:UPGRADE_CANCELLED
actionDetails:Upgrade cancelled. Timed out waiting for the running jobs to complete after Quiesce. Cancelled the action due to configuration setting.
7) It seems that WCC still considered the agent to have running jobs.
- Do a cold start of the agent:
- Stop the agent, cd <agent_folder>, rename the database folder to database.old, then start the agent
- Retry the upgrade; in this case it then went through fine
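The rename part of the cold start can be sketched as a small shell helper. This is only an illustration: the function name rotate_agent_database is hypothetical, the agent is assumed to be stopped already, and the stop and start commands are left out on purpose because they depend on how the agent was installed.

```shell
#!/bin/sh
# Sketch only: move the agent's job database aside so that the agent
# builds a fresh one on its next start.
# Assumption: the agent has already been stopped by the operator.

# rotate_agent_database AGENT_DIR   (hypothetical helper name)
# Renames AGENT_DIR/database to AGENT_DIR/database.old. If database.old
# already exists, a timestamp suffix is added so nothing is overwritten.
# Prints the path of the renamed folder.
rotate_agent_database() {
    agent_dir=$1
    db="$agent_dir/database"
    backup="$db.old"

    if [ ! -d "$db" ]; then
        echo "no database folder found under $agent_dir" >&2
        return 1
    fi

    # Keep an earlier database.old intact by choosing a unique name.
    if [ -e "$backup" ]; then
        backup="$backup.$(date +%Y%m%d%H%M%S)"
    fi

    mv "$db" "$backup" || return 1
    echo "$backup"
}

# Example (after stopping the agent, before starting it again):
#   rotate_agent_database "<agent_folder>"
```

Keeping the old folder instead of deleting it leaves the previous job state available if it needs to be inspected later.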
This potentially happened because the agent did not have a clean start at some point in the past.
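To check whether another agent's cancelled upgrade hit the same timeout as in step 6, the spool check can be scripted. The sketch below is a rough aid, not a supported tool: the function names are hypothetical, it assumes that spool/wcc_agent_cmd contains only the per-action temporary folders, and it matches the two message strings exactly as they appear in the cmd.out excerpt above.

```shell
#!/bin/sh
# Sketch only: locate the newest Agent Inventory cmd.out and test whether
# it records an upgrade cancelled by the job-completion timeout.

# latest_cmd_out AGENT_DIR   (hypothetical helper name)
# Prints the path of cmd.out in the most recently modified folder under
# AGENT_DIR/spool/wcc_agent_cmd.
latest_cmd_out() {
    spool="$1/spool/wcc_agent_cmd"
    newest=$(ls -t "$spool" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    echo "$spool/$newest/cmd.out"
}

# upgrade_timed_out FILE   (hypothetical helper name)
# Succeeds only if FILE contains both the UPGRADE_CANCELLED status and
# the running-jobs timeout message.
upgrade_timed_out() {
    grep -q 'statusCode:UPGRADE_CANCELLED' "$1" &&
        grep -q 'Timed out waiting for the running jobs to complete' "$1"
}

# Example:
#   f=$(latest_cmd_out "<agent_folder>") && upgrade_timed_out "$f" \
#       && echo "upgrade was cancelled by the running-jobs timeout"
```

A match only says that the agent reported running jobs at upgrade time; the job status still has to be verified in Autosys as in steps 1 to 3.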