New data does not appear in NSX Intelligence after an upgrade, cluster restart, or certificate update.


Article ID: 319826


Updated On:

Products

VMware NSX VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

Data ingestion fails for the druid tables after the druid pods restart. No new data arrives in the corresponding druid tables after the restart.

Symptoms:

  • Ingestion failures can affect user interactions and system functionality.
  • Failures in flow data ingestion affect visualization, recommendation, and network traffic analysis (NTA). Users may see no new flow insights, or may be limited to outdated information when running recommendations.
  • Failures in ingesting context-related metadata hinder the visualization of user and process information.
  • Failures in configuration data ingestion prevent new configuration changes, such as groups or DFW policies, from being streamed, which affects the system's intelligence and the accuracy of analyses.

 

  1. To determine if this issue occurred:

    For NSX Version 4.1, get the pod restart order.
    SSH into the NSX Manager as the root user and run the command:

    napp-k get pods

    You should find the druid overlord pod with the prefix ‘druid-overlord-’ and the druid middle manager pods with the prefix ‘druid-middle-manager-’, as in this example:


    napp-k get pods
    NAMESPACE NAME READY STATUS RESTARTS AGE
    nsxi-platform pod/druid-middle-manager-0 1/1 Running 0 25m
    nsxi-platform pod/druid-middle-manager-1 1/1 Running 0 27m
    nsxi-platform pod/druid-middle-manager-2 1/1 Running 0 29m
    nsxi-platform pod/druid-overlord-58d45f9b 1/1 Running 0 29m

    The AGE column shows that all druid pods (with the prefix ‘druid-’) restarted recently and that the druid middle manager pods started after the druid overlord pod.

    For NSX Version 4.2, confirm that no task is created after the pod restart.
    Search the druid overlord pod logs for the corresponding indexing task keywords:

    index_kafka_correlated_flow_viz
    index_kafka_correlated_flow_rec
    index_kafka_correlated_flow
    index_kafka_pace2druid_policy_intent_config
    index_kafka_pace2druid_manager_realization_config
    index_kafka_context_user_metadata
    index_kafka_context_process_metadata
    index_kafka_context_event

    When the issue is present, no new task is created for the topic and no new data is ingested for it.
  2. On the NSX Manager, run the command:

    napp-k get pods --selector='app.kubernetes.io/component=druid.overlord'
     
    Find a pod with the prefix "druid-overlord-", similar to the example:

    napp-k get pods --selector='app.kubernetes.io/component=druid.overlord'
    NAME READY STATUS RESTARTS AGE
    druid-overlord-7b6849f98b-n97xm 1/1 Running 1 11h

  3. Then run the following command to check the logs:

    napp-k logs <name of the druid overlord pod> | grep <indexing keyword above>

  4. In the logs, search for the following string: “Inserting task [<indexing keyword above>”. When the issue is present, no matching log entries are found.
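
The log check in steps 2 through 4 can be repeated for every indexing keyword with a small shell function. This is a minimal sketch, not part of the documented procedure: the function name `keywords_without_tasks` and the log file path are hypothetical, and it assumes the overlord log was first saved to a file with the `napp-k logs` command shown above.

```shell
# Hypothetical helper: prints each indexing keyword for which the saved
# overlord log contains no "Inserting task [<keyword>" entry.
keywords_without_tasks() {
  logfile=$1
  for kw in \
    index_kafka_correlated_flow_viz \
    index_kafka_correlated_flow_rec \
    index_kafka_correlated_flow \
    index_kafka_pace2druid_policy_intent_config \
    index_kafka_pace2druid_manager_realization_config \
    index_kafka_context_user_metadata \
    index_kafka_context_process_metadata \
    index_kafka_context_event
  do
    # -F treats the pattern as a fixed string, so "[" is not a regex bracket.
    if ! grep -F -q -- "Inserting task [$kw" "$logfile"; then
      printf '%s\n' "$kw"
    fi
  done
}

# Example usage (the file path is an assumption):
#   napp-k logs <name of the druid overlord pod> > /tmp/overlord.log
#   keywords_without_tasks /tmp/overlord.log
```

Every keyword printed has no task insertion in the saved log. Note that `index_kafka_correlated_flow` is a prefix of the `_viz` and `_rec` keywords, so a match for it may come from one of those topics.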

Environment

VMware NSX-T Data Center
VMware NSX-T Data Center 4.x

Cause

When all druid pods restart (for example, after an upgrade or a certificate change), some tasks that were running and had not finished before the restart may enter erroneous states. If the new druid overlord pod becomes ready before the druid middle manager pods, the tasks may remain stuck in the pending state and no new data is ingested.
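
The ordering condition described above can be checked from pod start times instead of reading the AGE column by eye. The following is a rough sketch under stated assumptions: the function name `overlord_started_first` is hypothetical, the `--no-headers` and `-o custom-columns` options are standard kubectl options that `napp-k` is assumed to pass through, and pod start time is used as an approximation of when a pod became ready.

```shell
# Hypothetical helper: reads "NAME START_TIME" lines on stdin, where
# START_TIME is an ISO 8601 UTC timestamp (these sort lexicographically),
# and reports whether the overlord pod started before the most recently
# started middle manager pod.
overlord_started_first() {
  awk '
    $1 ~ /^druid-overlord-/       { overlord = $2 }
    $1 ~ /^druid-middle-manager-/ { if ($2 > latest_mm) latest_mm = $2 }
    END {
      if (overlord == "" || latest_mm == "") { print "unknown"; exit }
      if (overlord < latest_mm) print "yes"; else print "no"
    }
  '
}

# Example usage (assumes napp-k forwards these kubectl options):
#   napp-k get pods --no-headers \
#     -o custom-columns=NAME:.metadata.name,START:.status.startTime \
#     | overlord_started_first
```

A result of `yes` matches the restart order described in this section. It does not by itself prove that tasks are stuck, so the overlord log check is still needed.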

Resolution

This is a known issue affecting VMware NSX. There is currently no resolution. This issue will be addressed in a future version of VMware NSX.

Workaround:

Restart the druid-overlord pod with the following command:

napp-k delete pod <name of the druid overlord pod>


Example:
napp-k delete pod druid-overlord-7b6849f98b-n97xm
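
To avoid copying the pod name by hand, the name can be extracted from the `napp-k get pods` output before running the delete command. This is a minimal sketch: the function name `overlord_pod_names` is hypothetical, and it assumes the table layouts shown in the examples in this article.

```shell
# Hypothetical helper: reads `napp-k get pods` table output on stdin and
# prints the name of each druid overlord pod, stripping a leading "pod/".
overlord_pod_names() {
  awk '{
    for (i = 1; i <= NF; i++) {
      name = $i
      sub(/^pod\//, "", name)
      if (name ~ /^druid-overlord-/) { print name; break }
    }
  }'
}

# Example usage, built from the commands shown in this article:
#   pod=$(napp-k get pods --selector='app.kubernetes.io/component=druid.overlord' \
#     | overlord_pod_names)
#   napp-k delete pod "$pod"
```

Check that `$pod` holds exactly one name before running the delete command.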