The Edge SWG (formerly ProxySG) is performing poorly, with high CPU utilization and "TCP in Livelock" messages in the Event Log.
A livelock happens when an interface becomes so saturated with packets that the Edge SWG is unable to keep up. Saturating an interface usually takes more than a high volume of legitimate connections; a network loop is typically involved. For example, if a policy forwards traffic from proxy 'A' to proxy 'B', and proxy 'B' is configured to forward to proxy 'A', the result is a loop that will most likely put one of the interfaces into livelock mode until traffic quiets down.
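The proxy 'A' and proxy 'B' example can be checked for on paper before it reaches production. The following is a minimal sketch, not an Edge SWG feature: it assumes you have written down, by hand, which upstream host each proxy forwards to. The `forward_to` mapping, the proxy names, and the `find_forwarding_loop` function are all hypothetical and exist only for illustration.

```python
# Hypothetical model of a forwarding configuration: each proxy name maps to
# the upstream host it forwards to. None means the proxy goes direct.
def find_forwarding_loop(forward_to, start):
    """Follow forwarding hops from `start`.

    Returns the loop as a list of names (first name repeated at the end),
    or None if the chain ends without revisiting a proxy.
    """
    path = []   # proxies visited so far, in order
    seen = {}   # proxy name -> its position in `path`
    node = start
    while node is not None:
        if node in seen:
            # We have been here before: everything from the first visit on is the loop.
            return path[seen[node]:] + [node]
        seen[node] = len(path)
        path.append(node)
        node = forward_to.get(node)
    return None


# Proxy A forwards to proxy B, and proxy B forwards back to proxy A.
looping = {"proxy_a": "proxy_b", "proxy_b": "proxy_a"}
# Proxy A forwards to proxy B, and proxy B goes direct.
healthy = {"proxy_a": "proxy_b", "proxy_b": None}

print(find_forwarding_loop(looping, "proxy_a"))
print(find_forwarding_loop(healthy, "proxy_a"))
```

A check like this only covers loops created by forwarding configuration. It says nothing about loops at the switching or routing layer.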
A routing loop can also cause a livelock. This is far more likely to happen when the proxy is deployed transparently inline on the network. If the proxy is installed between two redundant switches and spanning tree is disabled, it could create a network loop. Other possible causes include Denial-of-Service attacks, such as ping floods.
The best way to troubleshoot a livelock issue is to take a packet capture and look for symptoms. Here are a few common symptoms:
If you have a deployment where a child Edge SWG forwards all requests to a parent Edge SWG, you may be encountering a forwarding loop.
The child ProxySG may have policy similar to the following:
forward(parent_proxy) forward.fail_open(no)
This means all requests will be forwarded to the parent ProxySG.
You may encounter a forwarding loop if a parent ProxySG (or upstream client) sends a request to the child ProxySG: the child immediately forwards it to the parent, which sends it back to the child, which forwards it to the parent again, and the cycle repeats. The symptoms are extreme slowness, high CPU utilization, and possibly "TCP in Livelock" messages in the Event Log.
Policy suggestions to prevent this: