Streaming NTA detectors need increased resources to provide detection at scale

Article ID: 388023

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

In environments with a large number of workloads and a high flow rate, the streaming NTA detectors need more resources to perform detection effectively.

The health status of the NSX Intelligence feature is reported as Down or Degraded in the UI.
The Down or Degraded status may be caused by the llanta-detectors-0 pod running out of memory.
Running the commands below as root in an SSH session on the SSPI produces output similar to the following snippets.
The llanta-detectors-0 pod is reported in the CrashLoopBackOff status:

k get pods -A | grep -v "Run\|Com"
NAME READY STATUS RESTARTS AGE
llanta-detectors-0 3/4 CrashLoopBackOff 16 (3m19s ago) 28h

k describe pod llanta-detectors-0
...
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled <<<<<<<<<<<<================
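
The check above can also be scripted. The following is a minimal sketch rather than an official procedure. It assumes that `k` is an alias for `kubectl`; the function name `oomkilled_containers` and its namespace argument are hypothetical, because this article does not state the namespace of the pod. It reads the `lastState.terminated.reason` field of each container status and prints the containers that were last terminated with OOMKilled.

```shell
#!/usr/bin/env bash
# Print the name of every container in a pod whose last termination
# reason was OOMKilled.
# Assumptions (not from the article): "kubectl" is the binary behind the
# "k" alias, and the caller supplies the namespace.
oomkilled_containers() {
  local pod="$1" namespace="$2"
  # Emit one "name<TAB>reason" line per container; the reason is empty
  # for containers that have no terminated last state.
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}' \
    | awk -F'\t' '$2 == "OOMKilled" { print $1 }'
}

# Example invocation (replace <namespace> with the pod's actual namespace):
#   oomkilled_containers llanta-detectors-0 <namespace>
```

An empty result means no container in the pod was last terminated for exceeding its memory limit.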

Environment

SSP 5.0, SSP 5.1

Cause

The containers inside the pod process data and maintain in-memory state. In some cases, memory usage can grow beyond the limits set for the container. The main contributing factors are listed below, but the list is not exhaustive.

The issue has been observed in environments where:

  1. SSP is monitoring a high number of workloads (observed at ~36,000 workloads).
  2. The flow rate is high (observed at ~1M flows per 5 minutes).
  3. All the NTA streaming detectors are enabled. These are:
    a. Destination IP Profiler
    b. Domain Generation Algorithm
    c. DNS Tunneling
    d. Netflow Beaconing
    e. Port Profiler
    f. Server Port Profiler
    g. Unusual Network Traffic Pattern
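
To compare the observed flow rate with figures reported in other units, the 5-minute count can be converted to flows per second. This is a small arithmetic sketch; the helper name `flows_per_second` is hypothetical, and the result is only an approximation of the rate quoted above.

```shell
#!/usr/bin/env bash
# Convert a flow count observed over an interval into flows per second.
# Uses integer arithmetic, so the result is rounded down.
flows_per_second() {
  local flows="$1" interval_seconds="$2"
  echo $(( flows / interval_seconds ))
}

# ~1M flows per 5 minutes (300 seconds), the figure quoted above.
flows_per_second 1000000 300   # prints 3333
```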

An alarm similar to the following might be raised:

  • Feature: Security Services Platform
  • Event Type: Service Down
  • Entity: llanta-detectors
  • Description: Platform Service llanta-detectors is degraded.

Resolution

The issue is caused by the high number of flows processed by the NTA streaming component, combined with the high number of workloads generating those flows. It has only been observed in scaled-out environments, typically those with 10 Worker Nodes.

The workaround for this issue is to increase the memory allocated to a couple of the containers in the llanta-detectors-0 pod. The change is made from the SSPI instance.

Increasing the memory requires internal configuration changes. Contact Broadcom Support for further assistance.
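
Before contacting support, it may help to record the memory limit currently set on each container in the pod. The following read-only sketch does not change any configuration and is not the workaround itself. It assumes that `k` is an alias for `kubectl`; the function name `container_memory_limits` and its namespace argument are hypothetical, because this article does not state the namespace of the pod.

```shell
#!/usr/bin/env bash
# List each container in a pod with its configured memory limit.
# Read-only: this only queries the pod spec.
# Assumptions (not from the article): "kubectl" is the binary behind the
# "k" alias, and the caller supplies the namespace.
container_memory_limits() {
  local pod="$1" namespace="$2"
  # One "name<TAB>limit" line per container; the limit is empty when
  # no memory limit is set on that container.
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.limits.memory}{"\n"}{end}'
}

# Example invocation (replace <namespace> with the pod's actual namespace):
#   container_memory_limits llanta-detectors-0 <namespace>
```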