Splunk not ingesting WSS HTTP logs in a timely manner

Article ID: 253412

Products

Web Security Service - WSS

Issue/Introduction

Splunk is no longer showing regular updates of WSS HTTP request logs downloaded via the SyncAPI.

The issue can be traced to a specific time frame, which does not match any maintenance activity on the WSS or customer side.

Splunk is configured to retrieve logs from WSS every 15 minutes, but the logs in Splunk are up to 10 hours behind.

Resetting the Splunk TA token does not help: the download restarts but still fails to catch up to near-real-time logs.

The WSS Splunk TA logs show many requests initiated (not at 15-minute intervals), but many are never completed:

2022-10-26 21:38:06,225 INFO 47407993161600 - SWSS: Starting data collection...
2022-10-26 21:43:50,838 INFO 47672280385408 - SWSS: Starting data collection...
2022-10-26 21:49:42,830 INFO 47672280385408 - SWSS: Received partial data, indexed it and continuing to request remaining data
2022-10-26 21:50:12,797 INFO 47754211592064 - SWSS: Starting data collection...
2022-10-26 22:03:12,586 INFO 47531214952320 - SWSS: Starting data collection...
2022-10-26 22:11:39,264 INFO 47531214952320 - SWSS: Received partial data, indexed it and continuing to request remaining data
2022-10-26 22:19:21,313 INFO 47531214952320 - SWSS: Received partial data, indexed it and continuing to request remaining data
2022-10-26 22:44:22,892 INFO 47531214952320 - SWSS: Completed data collection
2022-10-26 22:44:23,368 INFO 47261121804160 - SWSS: Starting data collection...
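
As a rough aid for reading excerpts like the one above, the following sketch pairs each "Starting data collection" line with its "Completed data collection" line and prints the elapsed seconds. It assumes the numeric field after INFO identifies the worker that logged the line (the article does not say what that field is), and `collection_durations` is a hypothetical helper name, not part of the TA.

```shell
# Hypothetical helper: reads WSS TA log lines on stdin and prints
# "<id> <seconds>" for each finished collection, or "<id> incomplete"
# for a collection that started but never logged a completion.
# Assumption: field 4 (the number after INFO) identifies the worker.
collection_durations() {
  awk '
    # Convert "YYYY-MM-DD" and "HH:MM:SS,mmm" to seconds on a common scale
    # (Julian day number * 86400 + time of day); only differences are used.
    function secs(d, t,    p, q, a, y, m, jdn) {
      split(d, p, "-"); split(t, q, "[:,]")
      a = int((14 - p[2]) / 12); y = p[1] + 4800 - a; m = p[2] + 12 * a - 3
      jdn = p[3] + int((153 * m + 2) / 5) + 365 * y \
            + int(y / 4) - int(y / 100) + int(y / 400) - 32045
      return jdn * 86400 + q[1] * 3600 + q[2] * 60 + q[3]
    }
    /SWSS: Starting data collection/ { start[$4] = secs($1, $2) }
    /SWSS: Completed data collection/ && ($4 in start) {
      printf "%s %d\n", $4, secs($1, $2) - start[$4]
      delete start[$4]
    }
    END { for (id in start) printf "%s incomplete\n", id }
  '
}
```

Run it as `collection_durations < "$TA_LOG"`, where `TA_LOG` is a placeholder for wherever the TA writes its log on your installation. Any duration above 900 seconds means a collection overran the 15-minute interval.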

Environment

Splunk WSS TA version 1.3 initially. An upgrade to WSS TA 2.1 was needed to get improved memory management when processing downloads.

Cause

Network issues between Splunk host and WSS.

Resolution

Addressed the local network performance issue. As a quick workaround, we changed the network path from Splunk to the SyncAPI endpoint, which restored normal speeds.

Identified slow file downloads (manually via curl as well as with the TA), which triggered a backlog of events on the Splunk TA and eventual ingestion delays.

The Splunk WSS TA struggled when downloads did not complete within the 15-minute interval. Increasing the interval to 30 minutes was an option to try to mitigate the issue, but the downloads were so slow that a download (which grows in size as the interval increases) was not guaranteed to finish within 30 minutes either.
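
The interval reasoning can be made concrete with a little arithmetic. The sketch below is only illustrative: it assumes the 30kbps figure from the manual curl test means kilobits per second, and both function names are hypothetical. The actual log volume per interval is not stated in this article.

```shell
# Bytes that fit into one collection interval at a given rate.
# $1 = throughput in kilobits per second (assumed unit), $2 = interval in minutes
interval_capacity_bytes() {
  echo $(( $1 * 1000 / 8 * $2 * 60 ))
}

# Whole seconds needed to download a payload at a given rate.
# $1 = payload size in bytes, $2 = throughput in kilobits per second
download_seconds() {
  echo $(( ($1 * 8 + $2 * 1000 - 1) / ($2 * 1000) ))
}

interval_capacity_bytes 30 15   # 3375000 bytes (about 3.4 MB) per 15 minutes
interval_capacity_bytes 30 30   # 6750000 bytes per 30 minutes
```

Under these assumptions, doubling the interval only doubles the capacity, while the amount of log data to fetch grows with the interval too, so the download falls behind whenever the log volume per interval exceeds that capacity.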

Additional Information

- Make sure you are running the latest available Splunk TA as a best practice (this was not the case initially).

- The WSS support team has access to the SyncAPI GET requests, and hence to the size of the payload returned and the time taken for each request. We clearly saw high delays starting the day the issue was initially reported.

- Using the following curl command on the Splunk host, we manually generated a request with the API username/password to check whether the issue was specific to the Splunk TA. This showed very poor performance (30kbps):

# curl "https://portal.threatpulse.com/reportpod/logs/sync?startDate=1666488115000&endDate=0&token=none" -H "X-APIUsername:aaaaaa" -H "X-APIPassword:bbbbb" -o /dev/null

- Downloading the exact same file from another host on a different network showed 150 times the download speed, confirming the issue was specific to the Splunk subnet.

- PCAPs from the Splunk side did not show large volumes of packet drops, but did show poor TCP windowing and possibly some throttling.

- A failover Splunk host existed on another network. The WSS collector was enabled there to download the logs, and all worked fine while the local network troubleshooting continued.
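
To repeat the curl comparison between two hosts with numbers rather than by watching the progress meter, a sketch along the following lines may help. It uses curl's `--write-out` variables `size_download`, `time_total` and `speed_download` (the last is reported in bytes per second). The function names and the `WSS_API_USER` / `WSS_API_PASS` environment variables are hypothetical and introduced here only for illustration.

```shell
# Hypothetical helper: fetch a URL, discard the body, and print
# "<bytes> <seconds> <bytes_per_second>" as measured by curl.
# Assumption: credentials are supplied via WSS_API_USER / WSS_API_PASS.
measure_download() {
  curl -sS -o /dev/null \
    -H "X-APIUsername:${WSS_API_USER}" \
    -H "X-APIPassword:${WSS_API_PASS}" \
    -w '%{size_download} %{time_total} %{speed_download}\n' \
    "$1"
}

# Convert bytes per second (as curl reports it) to kilobits per second.
bytes_per_sec_to_kbps() {
  awk -v bps="$1" 'BEGIN { printf "%.1f\n", bps * 8 / 1000 }'
}

# How many times faster one host is than another (both in bytes per second).
speed_ratio() {
  awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.0f\n", fast / slow }'
}
```

Run `measure_download` with the same SyncAPI URL on the Splunk host and on a host in another network, then pass the two `speed_download` values to `speed_ratio`. A large ratio for the same file points at the network path rather than at the TA.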
