The customer reported that the appliance became completely unresponsive on a particular day, starting at 12:54 PM UTC. The script below took 7 minutes to execute because the access log was reporting "Log manager: stopped due to log full".
access-log
edit log ArcSight
commands delete-logs
exit
edit log XXXXXXX
commands delete-logs
exit
exit
The appliance is not expected to become completely unresponsive, with CPU rising to 80% and memory rising to 60%, when an access log is deleted.
Release: 7.3.11.3
The sysinfo log for the affected appliance does not show the reported high CPU utilization.
Further checks show the events below, all associated with the "ArcSight" access log.
SGARS: ProxySG Diagnostic
Time Count/s Message
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxxxxxxxxxxxxxxxxxx size: 0 KB.
Trigger: Issues with access log uploads
If the ProxySG is configured to upload access logs to a server and the upload fails, the result can be high CPU in Misc. The logs indicate that the failed access log uploads caused the sudden rise in CPU utilization.
The CPU spike is not expected to recur unless the access log upload failures recur. If it does happen again, check the event logs on the Edge SWG (ProxySG) for access log upload issues, and use the test upload button in the access log configuration to confirm whether the upload works. If a problem is found, verify the access log upload configuration on the ProxySG.
Ref. doc.: https://knowledge.broadcom.com/external/article?legacyId=TECH242540
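When reviewing an exported event log for upload problems, a short script can tally the failures per access log. The following is a minimal sketch, not a supported tool: the function name `count_upload_failures` is hypothetical, and the pattern is modeled only on the event lines shown in this article, so the actual event log layout on a given release may differ.

```python
import re
from collections import Counter

# Pattern modeled on the event lines shown in this article (assumption:
# other releases may format the message differently).
UPLOAD_FAILED = re.compile(r"Access Log \((?P<log>[^)]+)\): Log uploading failed")


def count_upload_failures(event_log_text: str) -> Counter:
    """Count 'Log uploading failed' events per access log name."""
    counts = Counter()
    for line in event_log_text.splitlines():
        match = UPLOAD_FAILED.search(line)
        if match:
            counts[match.group("log")] += 1
    return counts


if __name__ == "__main__":
    # Redacted sample lines in the same shape as the excerpt above.
    sample = "\n".join([
        "xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxx size: 0 KB.",
        "xxxx xxx xx xxxx xx:xx:xx 1 Access Log (ArcSight): Log uploading failed. Remote filename: xxx size: 0 KB.",
        "xxxx xxx xx xxxx xx:xx:xx 1 Some unrelated event",
    ])
    for log_name, failures in count_upload_failures(sample).items():
        print(f"{log_name}: {failures} failed upload(s)")
```

A high count for a single log name, as with "ArcSight" here, points to the upload configuration or the upload server for that log as the place to start checking.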
For the recommended access log upload client settings, please refer to the guidance in the docs below:
Note: This was not a bug. The high CPU utilization was caused by the numerous failed access log uploads, which consumed enough CPU to leave insufficient resources for processing other requests.
ArcSight community resources also indicate that ESM resources, high event throughput in ESM, or poorly configured content in ArcSight can impact CPU usage. You may want to explore this as well.
Ensuring that ArcSight is set up to avoid possible triggers for high CPU utilization, and keeping to the recommended access log upload client settings on the ProxySG, should help prevent this kind of high CPU utilization.
Deleting the logs on the proxy whenever they fill up while the upload failures shown above are occurring is a recommended workaround.
If the issue recurs consistently, an R&D engagement may be required, supported by the relevant log data.
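For the delete-logs workaround, the command sequence from the start of this article can be generated for any list of access log names. This is only a sketch: `build_delete_logs_commands` is a hypothetical helper that produces text for review and does not connect to the appliance. It reproduces exactly the lines shown in the customer's script and assumes the CLI is already in the mode where `access-log` is accepted.

```python
def build_delete_logs_commands(log_names):
    """Return CLI lines that delete the given access logs.

    Mirrors the script shown in this article: enter access-log, then for
    each log run 'edit log <name>', 'commands delete-logs', 'exit', and
    finally exit access-log.
    """
    commands = ["access-log"]
    for name in log_names:
        commands.extend([
            f"edit log {name}",
            "commands delete-logs",
            "exit",
        ])
    commands.append("exit")
    return commands


if __name__ == "__main__":
    # "ArcSight" is the log name from this case; substitute your own names.
    print("\n".join(build_delete_logs_commands(["ArcSight"])))
```

Review the generated lines before pasting them into a CLI session, since deleting a log discards entries that have not been uploaded.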