In environments with a large number of workloads and high flow rates, the streaming NTA detectors need more resources to perform detection effectively.
The health status of the NSX Intelligence feature is reported as Down or Degraded in the UI.
The Down/Degraded status may be caused by the llanta-detectors-0 pod running out of memory.
Running the following commands on the SSPI, in an SSH session as root, produces output similar to these snippets:
The llanta-detectors-0 pod is reported in the CrashLoopBackOff status:
k get pods -A | grep -v "Run\|Com"
NAME READY STATUS RESTARTS AGE
llanta-detectors-0 3/4 CrashLoopBackOff 16 (3m19s ago) 28h
k describe pod llanta-detectors-0
...
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled <<<<<<<<<<<<================
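To see which containers in the pod were OOM-killed without reading the full describe output, the output can be piped through a small filter. The sketch below is a hypothetical helper (oom_containers is not part of SSP or kubectl). It assumes the standard layout of "kubectl describe pod" output, where each container name appears two spaces in under "Containers:" and the termination reason follows a "Last State: Terminated" line.

```shell
# Hypothetical helper (not part of SSP or kubectl): reads `kubectl describe pod`
# output on stdin and prints the name of each container whose last
# termination reason was OOMKilled. Assumes the standard describe layout.
oom_containers() {
  awk '
    # Start of the regular containers section.
    /^Containers:/ { in_containers = 1; next }
    # Any other top-level section (Conditions:, Volumes:, ...) ends it.
    /^[A-Za-z]/ { in_containers = 0 }
    # Container names are indented by exactly two spaces and end with ":".
    in_containers && /^  [A-Za-z0-9._-]+:$/ {
      name = $1; sub(/:$/, "", name); last = 0; next
    }
    # Only a terminated last state carries a termination reason.
    in_containers && /Last State:[ \t]+Terminated/ { last = 1; next }
    # The first Reason: line after it is the termination reason.
    in_containers && last && /Reason:/ {
      if ($2 == "OOMKilled") print name
      last = 0
    }
  '
}
```

For example, k describe pod llanta-detectors-0 | oom_containers prints one container name per line for each container whose most recent termination was OOMKilled.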
SSP 5.0, SSP 5.1
The containers inside the pod process data and maintain in-memory state; in some cases, their memory usage can grow beyond the limits set for the container. The main contributing factors include, but are not limited to, the following:
In environments where:
An alarm might be raised as follows:
The issue is caused by the high number of flows processed by the NTA streaming component, combined with a high number of workloads generating those flows. It has been observed only in scaled-out environments, typically those with 10 worker nodes.
The workaround is to increase the memory allocated to the affected containers of the llanta-detectors-0 pod by running the following command from the SSPI instance:
Increasing the memory requires internal configuration changes, so contact Broadcom Support for assistance with this procedure.