Why do agents get stuck in "Locked by Roam", "Locked by RAC" or "Locked by Move"?
Client Automation - All Versions
Let's discuss each of the lock types:
Lock by RAC: RAC is short for 'Recover After Crash'. This lock is meant to happen whenever a machine is reimaged, so that software delivered by automation can be redeployed to the agent machine and restore it to its previous install state.
Reasons you might see Lock by RAC
This can technically happen any time the 'HostUUID' value in the registry changes. Normally the HostUUID should not change on its own. It is recorded in the following registry location:
The value itself is also named HostUUID.
You'll note there are also DiskSerialNumber, MACAddress and SystemID values. These are the items the HostUUID is partially based on. If two of these three items change at any point, a new HostUUID is generated at the next CAF start.
Also, if one of the items is missing and another one changes, a new HostUUID is generated at CAF start. If two are missing, a new HostUUID is generated every time CAF starts.
This can be prevented completely by adding a String value named 'LockHostUUID' and setting it to '1'. That keeps the HostUUID from changing at all, and it is commonly used in environments where unexpected HostUUID changes cause problems.
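The regeneration rule above can be modeled in a few lines. This is a hypothetical Python sketch of the behavior as described in this article, not CAF's actual code; the function name and the choice to represent a missing identifier as None are assumptions. Written this way, the three cases collapse into one condition: a new HostUUID is generated once at least two of the three identifiers are missing or changed, unless LockHostUUID is set to '1'.

```python
from typing import Mapping, Optional

# The identifiers the HostUUID is partially based on, as named in the registry.
IDENTIFIERS = ("DiskSerialNumber", "MACAddress", "SystemID")


def host_uuid_would_regenerate(
    stored: Mapping[str, str],
    current: Mapping[str, Optional[str]],
    lock_host_uuid: bool = False,
) -> bool:
    """Hypothetical model of when CAF generates a new HostUUID at startup.

    stored:  identifier values recorded when the HostUUID was generated
    current: identifier values found now; None means the item is missing
    lock_host_uuid: True when the LockHostUUID string value is set to '1'
    """
    if lock_host_uuid:
        # LockHostUUID = '1' keeps the HostUUID from changing at all.
        return False
    missing = sum(1 for key in IDENTIFIERS if current.get(key) is None)
    changed = sum(
        1
        for key in IDENTIFIERS
        if current.get(key) is not None and current[key] != stored.get(key)
    )
    # "2 changed", "1 missing and 1 changed" and "2 missing" all mean that
    # at least two of the three identifiers no longer match.
    return missing + changed >= 2


if __name__ == "__main__":
    stored = {"DiskSerialNumber": "D1", "MACAddress": "M1", "SystemID": "S1"}
    # Disk serial number and MAC address both differ: two changes.
    print(host_uuid_would_regenerate(
        stored, {"DiskSerialNumber": "D2", "MACAddress": "M2", "SystemID": "S1"}))  # True
    # Only the MAC address differs: one change is not enough.
    print(host_uuid_would_regenerate(
        stored, {"DiskSerialNumber": "D1", "MACAddress": "M2", "SystemID": "S1"}))  # False
```

Reading the rule as "two of three must still match" makes it easier to predict which hardware or imaging changes will produce a new HostUUID, and therefore a possible Lock by RAC.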
What can you do about Lock by RAC?
If unsealed job containers were generated, you can delete them and the machines will be unlocked from RAC. If that is not the case, the machines need to be unlocked via SQL. You can unlock all machines locked by RAC with the following queries:
use mdb
update ca_agent set derived_status_sd=0
where derived_status_sd =1 and agent_type=1
use mdb
Update usd_target set locks=locks-25
where locks >24 and (locks % 2 =1)
Lock by Move is a lock state in which an agent reports that it was previously reporting to a different Domain Manager (DM). The current DM then locks the agent while it queries the previous DM for any information about this agent, including its installation records, before allowing new installs to be pushed.
Reasons you might see Lock by Move
This can happen if a machine had reported to a DNS alias and is later pointed to the real name, or the reverse. It can also happen when a machine switches from pointing to an IP address to pointing to an FQDN, if the IP address did not reverse-resolve to the expected name, which makes the DM think the agent previously pointed elsewhere.
This can also happen if a DNS issue causes a name to resolve to an IP address that does not reverse-resolve back to the same name. For example, if your server is named MyServer.Mycorp.Com and this is the name an agent points to, under the covers ITCM pings that name to get the IP address and then performs a reverse DNS lookup to validate the name resolution. If resolving the IP back to a name returns a different value, an unintended MOVE can be triggered.
And of course, actually redirecting an agent to a Scalability Server (SS) that belongs to a different Domain Manager, such as a test domain, has the same effect.
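To check a server name for the forward/reverse mismatch described above, you can repeat the round trip yourself. This Python sketch resolves the name to an IPv4 address with socket.gethostbyname, reverse-resolves that address with socket.gethostbyaddr, and compares the two names case-insensitively. It only approximates the check: ITCM's exact comparison rules are not documented in this article, and the function name reverse_lookup_matches is hypothetical.

```python
import socket
from typing import Callable


def _reverse_name(ip: str) -> str:
    # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
    return socket.gethostbyaddr(ip)[0]


def reverse_lookup_matches(
    name: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = _reverse_name,
) -> bool:
    """Resolve name -> IP -> name and report whether the round trip agrees.

    The resolvers are parameters so the check can be exercised without DNS.
    A failed lookup in either direction counts as a mismatch.
    """
    try:
        ip = forward(name)
        back = reverse(ip)
    except OSError:  # socket.gaierror and socket.herror subclass OSError
        return False
    # DNS names are case-insensitive; also ignore a trailing root dot.
    return back.rstrip(".").lower() == name.rstrip(".").lower()
```

On an agent machine you might call reverse_lookup_matches("MyServer.Mycorp.Com") with the name the agent is configured to use; a False result points at the kind of mismatch described above. Note that a short name such as MyServer will not match an FQDN returned by the reverse lookup, and it is an assumption that ITCM compares names this strictly.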
What can you do about Lock by Move?
In this case, you can simply select all machines locked by MOVE under All Computers in DSM Explorer, right-click them and select 'Abort Pending Move Operation'.
Lock by Roam occurs when an agent changes which Scalability Server it reports to while at least one active job container still uses the previous Scalability Server. The lock remains until the job data moves to the current SS so the active job can resume. No new jobs are allowed until this is done.
Reasons you might see Lock by Roam
This can happen for exactly the same reasons as MOVE, except that the change is interpreted as a move to another SS on the SAME DM rather than to a different DM.
What can you do about Lock by Roam?
This also needs to be addressed with SQL queries:
use mdb
update ca_agent set derived_status_sd=0
where derived_status_sd =4 and agent_type=1
update usd_target set locks=0 where locks=4
These queries can be run as often as needed. In fact, if you don't consider these locks valid in your environment, you can create an Engine SQL Script task containing just the following to unlock RAC and ROAM whenever they occur:
use mdb
update ca_agent set derived_status_sd=0 where derived_status_sd =1 and agent_type=1
Update usd_target set locks=locks-25 where locks >24 and (locks % 2 =1)
update ca_agent set derived_status_sd=0 where derived_status_sd =4 and agent_type=1
update usd_target set locks=0 where locks=4
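Before scheduling these statements, you may want to preview their effect on sample values. The Python sketch below replays the four UPDATE statements, copied verbatim except for the SQL Server-specific "use mdb" line, against a throwaway in-memory SQLite database. The two tables contain only the columns the queries reference (they are stand-ins, not the real mdb schema), the function name dry_run is hypothetical, all rows are made up, and nothing here connects to an MDB. Statement order matters: a sample target at 29 (25 plus ROAM's 4) is first reduced to 4 by the RAC statement and then set to 0 by the ROAM statement, while a target at 27 (25 plus MOVE's 2) ends at 2, because these queries do not clear Lock by Move.

```python
import sqlite3

# The unlock statements from the Engine SQL Script task, minus "use mdb".
UNLOCK_STATEMENTS = [
    "update ca_agent set derived_status_sd=0 where derived_status_sd =1 and agent_type=1",
    "Update usd_target set locks=locks-25 where locks >24 and (locks % 2 =1)",
    "update ca_agent set derived_status_sd=0 where derived_status_sd =4 and agent_type=1",
    "update usd_target set locks=0 where locks=4",
]


def dry_run(agents, targets):
    """Apply the unlock statements to in-memory copies of sample rows.

    agents:  list of (name, derived_status_sd, agent_type) tuples
    targets: list of (name, locks) tuples
    Returns two dicts mapping name -> value after the updates.
    """
    con = sqlite3.connect(":memory:")
    con.execute("create table ca_agent (name text, derived_status_sd int, agent_type int)")
    con.execute("create table usd_target (name text, locks int)")
    con.executemany("insert into ca_agent values (?, ?, ?)", agents)
    con.executemany("insert into usd_target values (?, ?)", targets)
    for stmt in UNLOCK_STATEMENTS:
        con.execute(stmt)  # run in the same order as the task
    result = (
        dict(con.execute("select name, derived_status_sd from ca_agent order by name")),
        dict(con.execute("select name, locks from usd_target order by name")),
    )
    con.close()
    return result


if __name__ == "__main__":
    agents, targets = dry_run(
        [("pc1", 1, 1), ("pc2", 4, 1)],
        [("pc1", 25), ("pc2", 4), ("pc3", 29), ("pc4", 27)],
    )
    print(agents)   # {'pc1': 0, 'pc2': 0}
    print(targets)  # {'pc1': 0, 'pc2': 0, 'pc3': 0, 'pc4': 2}
```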
Here is a list of the base values for the various lock statuses in this area. Each value is a single bit, so several statuses can be combined in one LOCKS value:
LOCKED_BY_RAC 0x00000001
LOCKED_BY_MOVE 0x00000002
LOCKED_BY_ROAM 0x00000004
LOCKED_BY_WAIT_FOR_RAC 0x00000008
LOCKED_BY_PENDING_RAC 0x00000010
LOCKED_BY_EVAL_CREATE 0x00000020
LOCKED_BY_EVAL_UPDATE 0x00000040
LOCKED_BY_MIGRATION 0x00000080
LOCKED_BY_DELAYED_CREATE 0x00000100
As seen here, even though Lock by RAC is commonly seen as a LOCKS value of 25, that value is really made up of several sub-statuses whose bits add up to 25 (0x19): LOCKED_BY_PENDING_RAC (0x10 = 16), LOCKED_BY_WAIT_FOR_RAC (0x08 = 8) and LOCKED_BY_RAC (0x01 = 1). This is simply how the inner workings operate, and it is why you'll never actually see LOCKS set to '1'.
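Because each status is a single bit, any LOCKS value can be decoded by testing its bits. This Python sketch copies the names and values from the list above into an IntFlag and confirms that 25 (0x19) is LOCKED_BY_RAC + LOCKED_BY_WAIT_FOR_RAC + LOCKED_BY_PENDING_RAC. It also mirrors the arithmetic of the RAC unlock query (subtract 25 from any odd value above 24) next to a bit-by-bit clear of those three flags. The two agree for values such as 25, 27 and 29 but differ for a hypothetical 33 (EVAL_CREATE + RAC); whether such combinations occur in practice is not covered here. The helper names are hypothetical, and bitwise_rac_unlock is an illustration, not a supported replacement for the query.

```python
from enum import IntFlag
from typing import List


class AgentLock(IntFlag):
    # Names and values copied from the lock status list.
    LOCKED_BY_RAC = 0x00000001
    LOCKED_BY_MOVE = 0x00000002
    LOCKED_BY_ROAM = 0x00000004
    LOCKED_BY_WAIT_FOR_RAC = 0x00000008
    LOCKED_BY_PENDING_RAC = 0x00000010
    LOCKED_BY_EVAL_CREATE = 0x00000020
    LOCKED_BY_EVAL_UPDATE = 0x00000040
    LOCKED_BY_MIGRATION = 0x00000080
    LOCKED_BY_DELAYED_CREATE = 0x00000100


# The three RAC-related bits that together make up the usual value of 25.
RAC_MASK = int(
    AgentLock.LOCKED_BY_RAC
    | AgentLock.LOCKED_BY_WAIT_FOR_RAC
    | AgentLock.LOCKED_BY_PENDING_RAC
)


def decode_locks(locks: int) -> List[str]:
    """Names of the flags set in a LOCKS value, lowest bit first."""
    return [flag.name for flag in AgentLock if locks & flag]


def sql_rac_unlock(locks: int) -> int:
    """Same arithmetic as: set locks=locks-25 where locks >24 and (locks % 2 =1)."""
    return locks - 25 if locks > 24 and locks % 2 == 1 else locks


def bitwise_rac_unlock(locks: int) -> int:
    """Clear only the three RAC-related bits and keep every other flag."""
    return locks & ~RAC_MASK


if __name__ == "__main__":
    print(hex(RAC_MASK), decode_locks(RAC_MASK))
    # Hypothetical value: EVAL_CREATE (0x20) plus RAC (0x01) = 33.
    print(sql_rac_unlock(33), bitwise_rac_unlock(33))  # 8 32
```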