100% CPU Utilization and about 100% Memory Pressure; web access is completely down.
The CAS appliance was reported to have reached about 100% CPU utilization and memory pressure. As a result, web requests were being blocked, since the CAS appliance could no longer process them. When we checked, we saw the below.
It was also confirmed that the deployed model was CAS-VA-C4, and that it was upgraded to CAS-VA-C6 after the challenge started. For this model, note that the maximum number of concurrent ICAP connections sent to the CAS should be 100, not the current 250. This confirms that the appliance was clearly overloaded.
Ref.: https://knowledge.broadcom.com/external/article/168737/recommended-icap-connections-on-proxysg.html
Also, as shown in the earlier snippets, the connections are stuck in the "Reading" state and never get scanned. Examining the connections, it can be seen that they are largely LinkedIn connections and streams. Per the ICAP Best Practice guidance, traffic from social media sites, streams, and the like should not be scanned. Please see the excerpt from the ICAP Best Practice template below, which clearly recommends NO ICAP for these kinds of connections.
; Web Application by Name
define condition Web_Apps_No_ICAP_Level_Basic
request.application.name="Microsoft Update"
request.application.name="Symantec Live Update"
request.application.name="Apple Update"
; Further Examples
request.application.name="YouTube"
request.application.name="Vimeo"
request.application.name="Facebook"
end condition Web_Apps_No_ICAP_Level_Basic
Ref.: https://techdocs.broadcom.com/content/dam/broadcom/techdocs/symantec-security-software/web-and-network-security/proxysg/common/SG_CA_ICAP_Best_Practice_CPL_v1.5_IS_Advanced.txt
By design, CAS must first read the entire data input before it scans the data. So, with connections stuck in the Reading state and without deferred scanning implemented, every connection queued behind them has to wait. As a result, the web page never loads, and ICAP errors may be returned.
To fix this in the short term, it's recommended to exempt these social media and stream connections from scanning, using the ICAP Scanning exemption policy. Please refer to the Tech. Doc. with the URL below for the implementation guidance. Then, restart the CAS appliance to clear the existing queue.
As an interim measure, you may also disable the Web Content rule to stop ICAP scanning altogether and relieve the immediate impact.
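As a rough illustration of the short-term exemption described above, the CPL below is a minimal sketch modeled on the Best Practice excerpt, not the exact policy from the Tech. Doc. The condition name Short_Term_No_ICAP is hypothetical, "LinkedIn" as an application name is an assumption that should be checked against the application names available on your ProxySG, and the choice of a <Cache> layer is also an assumption. Please verify it against the referenced guidance before installing.

```
; Hypothetical condition name; the "LinkedIn" application name is an assumption
define condition Short_Term_No_ICAP
    request.application.name="LinkedIn"
    request.application.name="YouTube"
    request.application.name="Vimeo"
    request.application.name="Facebook"
end condition Short_Term_No_ICAP

; Assumed layer type; skips ICAP response scanning for matching traffic
<Cache>
    condition=Short_Term_No_ICAP response.icap_service(no)
```

Since later policy layers take precedence over earlier ones, a layer like this would typically need to be evaluated after the layer that enables ICAP scanning, otherwise the exemption may be overridden.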
In the longer term, it's strongly recommended to implement the ICAP Best Practice in its entirety. For the requisite guidance, please refer to the Tech. Docs. with the URLs below.
Please note that the ICAP Best Practice isn't cast in stone. A successful implementation of the ICAP Best Practice is one that takes into account all the HTTP connections that reach the CAS at any given time, and one that is regularly fine-tuned to reach an optimized state. It requires an academic approach to really succeed.