Most of the time, the IIS Web Agent fails to start and returns a 500 error when a client tries to connect.
Nothing is reported in the Agent log or trace; the only evidence is an Sm_AgentApi_Init Failed error message in the Windows Event Viewer.
When the problem occurs, the LLAWP process attempts to start but fails.
Why is this happening, and how can we solve it?
If the agent machine is restarted very frequently, many connections can be left in the 'ESTABLISHED' state on the Policy Server machine, even though there is no longer a corresponding LLAWP process on the other side.
During these frequent reboots, the Web Agent may at some point try to create a connection and be assigned a port number that corresponds to one of these abandoned connections. The Policy Server machine treats the incoming SYN as irregular and does not answer it with a SYN/ACK. The resulting TCP sequence mismatch causes the Web Agent machine to discard the reply as out of sequence, because it does not recognise it as a response to its SYN.
This is not really an error or defect, but the expected behaviour in the scenario described (very frequent reboots of the IIS agent machine). Thus it is unlikely to happen in a production environment, which should be very stable by design.
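The port-reuse scenario described above can be sketched with a toy model. This is only an illustration of the reasoning, not how the Policy Server or the operating system is implemented: the class and function names, IP addresses and port numbers are all hypothetical, and real TCP stacks track far more state than a set of 4-tuples.

```python
from typing import List, NamedTuple, Set


class Conn(NamedTuple):
    """TCP 4-tuple identifying one connection (all values here are made up)."""
    client_ip: str
    client_port: int
    server_ip: str
    server_port: int


class PolicyServerModel:
    """Hypothetical model of the connection table on the Policy Server machine."""

    def __init__(self) -> None:
        self.established: Set[Conn] = set()

    def on_syn(self, conn: Conn) -> str:
        # A SYN for a 4-tuple that is still marked ESTABLISHED is not treated
        # as a new connection, so the usual SYN/ACK is not sent.
        if conn in self.established:
            return "no SYN/ACK (stale ESTABLISHED entry)"
        self.established.add(conn)
        return "SYN/ACK"


def demo() -> List[str]:
    server = PolicyServerModel()
    first = Conn("10.0.0.5", 50001, "10.0.0.9", 44443)

    results = [server.on_syn(first)]  # normal connection before the reboot

    # The agent machine reboots without closing the connection: the agent
    # forgets it, but the server-side entry is left in place.

    # After the reboot the agent happens to be assigned the same local port.
    results.append(server.on_syn(first))

    # A different local port does not collide with the stale entry.
    results.append(server.on_syn(first._replace(client_port=50002)))
    return results


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Only the second attempt, which reuses the 4-tuple of an abandoned connection, fails in this model; the more stale entries accumulate, the more likely such a collision becomes.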
However, if there is a concern that such a situation may happen, there are several possible corrective actions available to mitigate it or eliminate it completely:
Set the SM_ENABLE_TCP_KEEPALIVE environment variable to 1.
For Unix, set SM_ENABLE_TCP_KEEPALIVE=1 and export the variable.
For Windows, create the following system environment variable with a value of 1:
SM_ENABLE_TCP_KEEPALIVE
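To illustrate why keepalive helps: with TCP keepalive enabled on a socket, the operating system probes idle connections and closes those whose peer no longer answers, so abandoned 'ESTABLISHED' entries do not linger. The sketch below shows the generic socket option using Python's standard library. It is not the product's code; the helper names are hypothetical, and the assumption that the variable simply switches on SO_KEEPALIVE is made for illustration only.

```python
import os
import socket
from typing import Mapping


def keepalive_requested(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when the variable is set to "1" (assumed semantics)."""
    return env.get("SM_ENABLE_TCP_KEEPALIVE") == "1"


def open_socket(env: Mapping[str, str] = os.environ) -> socket.socket:
    """Create a TCP socket, enabling OS-level keepalive when requested."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if keepalive_requested(env):
        # Ask the OS to send keepalive probes on idle connections; a peer
        # that has vanished stops answering and the connection is dropped.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


if __name__ == "__main__":
    s = open_socket({"SM_ENABLE_TCP_KEEPALIVE": "1"})
    enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    print("keepalive enabled:", enabled)
    s.close()
```

How soon a dead peer is detected depends on the operating system's keepalive timers, which are configured separately from this variable.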