In environments with a high number of workloads and a high flow rate, streaming NTA detectors need more resources to perform detection effectively
SSP 5.0
In environments where:
1. SSP is monitoring a high number of workloads (observed at ~36,000 workloads)
2. The flow rate is high (observed at ~1M flows per 5 minutes)
3. All the NTA streaming detectors are enabled, namely:
(a) Destination IP Profiler
(b) Domain Generation Algorithm
(c) DNS Tunneling
(d) Netflow Beaconing
(e) Port Profiler
(f) Server Port Profiler
(g) Unusual Network Traffic Pattern
An Alarm might be raised as follows:
Feature: Security Services Platform
Event Type: Service Down
Entity: llanta-detectors
Description: Platform Service llanta-detectors is degraded
The issue is caused by the high number of flows processed by the NTA streaming component, combined with a high number of workloads generating those flows. It has only been observed in scaled-out environments, typically those with 10 Worker Nodes.
The workaround for this issue is to increase the memory allocated to the llanta-detectors-0 pod by running the following command on the SSPi instance after logging in as root:
LLANTA_SERVICE_LIMIT=7Gi LLANTA_JOB_LIMIT=5Gi LLANTA_WORKER_LIMIT=3Gi && k patch statefulset llanta-detectors -p="{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"llanta-service\", \"resources\":{\"limits\":{\"memory\": \"$LLANTA_SERVICE_LIMIT\"},\"requests\":{\"memory\": \"$LLANTA_SERVICE_LIMIT\"}}}, {\"name\":\"llanta-job-netflow-beaconing\", \"resources\":{\"limits\":{\"memory\": \"$LLANTA_JOB_LIMIT\"},\"requests\":{\"memory\": \"$LLANTA_JOB_LIMIT\"}}}, {\"name\":\"llanta-job-time-series\", \"resources\":{\"limits\":{\"memory\": \"$LLANTA_JOB_LIMIT\"},\"requests\":{\"memory\": \"$LLANTA_JOB_LIMIT\"}}}, {\"name\":\"llanta-worker\", \"resources\":{\"limits\":{\"memory\": \"$LLANTA_WORKER_LIMIT\"},\"requests\":{\"memory\": \"$LLANTA_WORKER_LIMIT\"}}}]}}}}" && k delete pod llanta-detectors-0
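The one-line command above is hard to review because the JSON patch is escaped inline. As a minimal sketch, the same patch body can be generated by a small shell function and inspected before it is applied. The function name build_llanta_patch is hypothetical and not part of SSP; the container names and memory values are taken from the command above, and k is assumed to be the kubectl alias available on the SSPi instance.

```shell
#!/bin/sh
# Sketch only: prints the JSON patch used in the workaround so it can be reviewed.
# build_llanta_patch is a hypothetical helper name, not an SSP tool.
# Arguments: <llanta-service memory> <llanta-job-* memory> <llanta-worker memory>
build_llanta_patch() {
  service_limit="$1"
  job_limit="$2"
  worker_limit="$3"
  entries=""
  # Each item is "<container name>:<memory value>", as in the original command.
  for spec in \
    "llanta-service:${service_limit}" \
    "llanta-job-netflow-beaconing:${job_limit}" \
    "llanta-job-time-series:${job_limit}" \
    "llanta-worker:${worker_limit}"; do
    name="${spec%%:*}"
    limit="${spec##*:}"
    # Limits and requests are set to the same value, matching the workaround.
    entries="${entries}${entries:+,}{\"name\":\"${name}\",\"resources\":{\"limits\":{\"memory\":\"${limit}\"},\"requests\":{\"memory\":\"${limit}\"}}}"
  done
  printf '{"spec":{"template":{"spec":{"containers":[%s]}}}}\n' "$entries"
}

# Print the patch for review.
build_llanta_patch 7Gi 5Gi 3Gi

# To apply it (assumes the k alias from the original command):
#   k patch statefulset llanta-detectors -p "$(build_llanta_patch 7Gi 5Gi 3Gi)" \
#     && k delete pod llanta-detectors-0
```

Printing the patch first makes it easier to confirm that each of the four containers receives the intended memory value before the pod is deleted and recreated.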