When a job that uses a hostgroup fails and is restarted, the hostgroup seems to be ignored on the restart: the job always chooses the same agent.
Investigation
A. Add a job that has an agent group to a workflow
B. Make sure the job is configured to fail
C. Execute the workflow and allow the task to go into blocked status
D. Stop the agent that the job used while the job is in blocked status
E. Right-click the job in the workflow monitor and restart it
The job is then stuck waiting for a host even though there is another active host in the hostgroup.
This is by design: a restarted job is executed with the same agent assignment it had on the initial run. See the documentation for Agent Groups (HOSTG), which states "Restarted tasks run on the Agent that was originally selected."
As a workaround, assign an agent from the agent group with pre-process scripting instead of setting the hostgroup in the job's attributes. You can use scripting like the following:
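! HOSTGROUP_NAME is a placeholder for the name of the agent group.
:set &hnd# = prep_process_agentgroup(HOSTGROUP_NAME, , 'ALL', )
:process &hnd#
! Column 1 is read as the agent name, column 2 as its active flag.
:  set &ag# = get_process_line(&hnd#, 1)
:  set &alive# = get_process_line(&hnd#, 2)
:  p ag: &ag# - active: &alive#
! Assign the job to the agent only if it is currently active.
:  if &alive# = 'Y'
:    put_att HOST = &ag#
:  endif
:endprocess
! Release the data sequence once the loop has finished.
:close_process &hnd#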