New flows not showing up in Security Explorer/Visibility & Planning

Article ID: 413222

Updated On:

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

On setups with resource constraints or high latency, the Latestflow pod can take additional time to process flows and write them to Kafka.

This causes too many messages to queue up in the Latestflow pod, and the pod eventually goes OOM (out of memory).

Symptom: 

If this issue is occurring, new flows may not show up on the Security Explorer/Visibility & Planning canvas in the SSP UI.

Environment

SSP Version >= 5.0

Cause

Some setups we've observed in testing have higher than expected latency when producing messages to Kafka. This can be due to network slowness, resource contention, or other factors.

For example, we can use the kafka-producer-perf-test.sh tool present in the cluster-api pod to benchmark the performance of Kafka producers.

To run this test:

  1. SSH into SSPI as root or sysadmin (depending on the 5.0 / 5.1 version).
  2. Exec into the cluster-api pod:
    1. k -n nsxi-platform get pods | grep cluster-api
    2. k -n nsxi-platform exec -it <name from previous command> -c cluster-api -- bash
  3. Run the command:
/opt/kafka/bin/kafka-producer-perf-test.sh --topic correlated_flow_viz --num-records 1000 --record-size 1024 --throughput -1 --producer.config /root/adminclient.props
  • Healthy Setup:

    • 1000 records sent

    • 1428.6 records/sec (1.40 MB/sec)

    • 168.49 ms avg latency, 612.00 ms max latency 

  • Slow Setup:

    • 1000 records sent

    • 691.1 records/sec (0.67 MB/sec)

    • 317.61 ms avg latency, 1221.00 ms max latency

In the slow setup, the throughput is less than half that of the healthy setup, and the average latency is roughly double.
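
For reference, the final summary line printed by kafka-producer-perf-test.sh looks roughly like the following. The throughput and latency values below reuse the healthy example above, while the percentile figures are purely illustrative, and the exact format can vary by Kafka version:

1000 records sent, 1428.6 records/sec (1.40 MB/sec), 168.49 ms avg latency, 612.00 ms max latency, 150 ms 50th, 430 ms 95th, 580 ms 99th, 610 ms 99.9th.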

 

When producer slowness occurs, messages back up in the Latestflow pod, eventually causing it to be OOM killed. To check the Latestflow pods:

  1. SSH into SSPI as root or sysadmin (depending on the 5.0 / 5.1 version).
  2. Run the command:
k -n nsxi-platform get pods | grep latestflow

The affected pods will show one or more restarts. Investigating the pod events or pod logs will reveal a memory-related error message.
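
For example, a Latestflow pod that has already been OOM killed and restarted may show output similar to the following (the pod name, restart count, and age are illustrative):

latestflow-758bc5dfd5-6vkgx   1/1   Running   3   2d4h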

Output similar to the following can be found when describing the pod:

Labels
alertname = PodOOMKilled
container = latestflow
namespace = nsxi-platform
pod = latestflow-758bc5dfd5-6vkgx
reason = OOMKilled
severity = critical
uid = 5cc1ecd3-4ef5-4e78-80dd-9d9cc7fdcb9d
Annotations
description = Pod nsxi-platform/latestflow-758bc5dfd5-6vkgx container latestflow was terminated due to out-of-memory.
summary = Pod nsxi-platform/latestflow-758bc5dfd5-6vkgx was OOMKilled
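
As an additional check, the container's last termination reason can be read directly with kubectl (the pod name below is illustrative; an OOM-killed container reports OOMKilled):

k -n nsxi-platform get pod latestflow-758bc5dfd5-6vkgx -o jsonpath='{.status.containerStatuses[?(@.name=="latestflow")].lastState.terminated.reason}'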

Resolution

Please contact Broadcom Support for further assistance.