NSX configuration changes are not syncing to NSX Intelligence

Products

VMware NSX

Issue/Introduction

Configuration changes on NSX Manager (creation, update, or deletion of Groups, VMs, inventory objects, and other configuration objects) are not reflected on the Discover & Take Action tab of the NSX Intelligence UI.
The NOW view on Discover & Take Action does not show recent configuration updates, and flows for those objects are missing. To verify, check the lag on the nsx2pace-config Kafka topic using the steps below:
- Get the cluster api pod name
  - kubectl get pods -n nsxi-platform | grep cluster-api
    output : cluster-api-xxxx-xxxx
- Exec into cluster-api pod
  - kubectl exec -it cluster-api-xxxx-xxxx -n nsxi-platform -c cluster-api -- /bin/bash
- Go to the /opt/kafka/bin directory
  - cd /opt/kafka/bin
- Run the following command to see the LAG for the nsx2pace-config Kafka topic
  - ./kafka-consumer-groups.sh --bootstrap-server kafka:9092 --group intelligence-nsx-config-update --describe --command-config /root/adminclient.props
  - If the LAG is more than 3000, it may be a symptom of this issue.
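
The LAG check above can be scripted rather than read by eye. The following is a minimal sketch, assuming the tool prints a whitespace-separated table with a header row containing a LAG column; the helper name max_lag is hypothetical and not part of NSX or Kafka. It reports the largest per-partition lag, which is one possible reading of the 3000 threshold.

```shell
# Hypothetical helper: reads `kafka-consumer-groups.sh --describe` output on
# stdin and prints the largest value found in the LAG column.
max_lag() {
  awk '
    # Until the header row is seen, look for the column titled LAG.
    lag_col == 0 {
      for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i
      next
    }
    # Data rows: keep the largest numeric lag ("-" entries are skipped).
    $lag_col ~ /^[0-9]+$/ && $lag_col + 0 > max { max = $lag_col + 0 }
    END { print max + 0 }
  '
}

# Example, run inside the cluster-api pod from /opt/kafka/bin:
#   ./kafka-consumer-groups.sh --bootstrap-server kafka:9092 \
#     --group intelligence-nsx-config-update --describe \
#     --command-config /root/adminclient.props | max_lag
```

If the printed number is above 3000, continue with the Cause and Resolution sections.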

Environment

VMware NSX

Cause

NSX Intelligence processes configuration updates at a predetermined rate and at a fixed time interval. When configuration data churns heavily (continuous updates and deletes), or when Groups have a large number of members (for example, roughly 5000 to 8000), broad tagging criteria, or multiple nested groups, Intelligence is unable to process new updates.

Resolution

To handle high churn, reduce the batch size in which updates are processed and increase the time allowed per batch, using the steps below:

Decrease the maxPollRecords property (default: 300) and increase the maxPollIntervalMs property (default: 300000 ms, i.e. 5 minutes) on Intelligence.
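
To see the effect of changing both values, it can help to compute the average time available per record in one poll cycle. This is an illustrative sketch only: it assumes the two properties behave like the Kafka consumer settings max.poll.records and max.poll.interval.ms, and the helper name per_record_budget_ms is hypothetical.

```shell
# Hypothetical helper: average time (ms) available per record in one poll cycle.
# Usage: per_record_budget_ms <maxPollRecords> <maxPollIntervalMs>
per_record_budget_ms() {
  echo $(( $2 / $1 ))
}

per_record_budget_ms 300 300000   # default values: prints 1000
per_record_budget_ms 100 600000   # suggested values: prints 6000
```

With the suggested values, each record gets roughly six times the processing budget it has under the defaults.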

Open the nsx-config ConfigMap for editing:
kubectl edit configmap nsx-config -n nsxi-platform
Locate the maxPollRecords and maxPollIntervalMs properties:
kafka:
  configUpdateConsumer:
    maxPollRecords: 300 <---------- old value
    maxPollIntervalMs: 300000 <----- old value
Set maxPollRecords to 100 (suggested value) and maxPollIntervalMs to 600000, i.e. 10 minutes (suggested value).

This would look like:
kafka:
  configUpdateConsumer:
    maxPollRecords: 100 <---------- new value
    maxPollIntervalMs: 600000 <----- new value
Save the file and exit the editor (in vi, press Esc and type :wq).
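
After saving, the new values can be confirmed without reopening the editor. The sketch below assumes the two keys appear on their own lines in the ConfigMap YAML, as in the snippet above; the helper name show_poll_settings is hypothetical. If other consumers in the same ConfigMap define keys with the same names, those lines are printed too.

```shell
# Hypothetical helper: reads ConfigMap YAML on stdin and prints every
# maxPollRecords / maxPollIntervalMs line with leading indentation removed.
show_poll_settings() {
  grep -E '^[[:space:]]*maxPoll(Records|IntervalMs):' | sed 's/^[[:space:]]*//'
}

# Example, run where kubectl can reach the cluster:
#   kubectl get configmap nsx-config -n nsxi-platform -o yaml | show_poll_settings
```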
Restart the nsx-config pod so that the new values take effect:
Get the nsx-config pod name
kubectl get pods -n nsxi-platform | grep nsx-config
output : nsx-config-XXXXX
Delete the nsx-config pod to restart it

Note: Use the pod name from the above command.
kubectl delete pod nsx-config-XXXXX -n nsxi-platform
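
The lookup and delete steps can be combined so the pod name does not have to be copied by hand. This is a minimal sketch, assuming the pod name starts with nsx-config- as in the sample output above; the helper name nsx_config_pods is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints the
# names of pods whose name starts with "nsx-config-".
nsx_config_pods() {
  awk '$1 ~ /^nsx-config-/ { print $1 }'
}

# Example, run where kubectl can reach the cluster:
#   pod=$(kubectl get pods -n nsxi-platform | nsx_config_pods | head -n 1)
#   kubectl delete pod "$pod" -n nsxi-platform
#   kubectl get pods -n nsxi-platform | grep nsx-config   # confirm the new pod is Running
```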

Workaround:

As an immediate workaround, restart the nsx-config pod. This refreshes the pod's memory and cache so that it can process the next set of polled messages within the stipulated time interval, max.poll.interval.ms (default: 5 minutes). This workaround is temporary and is not guaranteed to resolve the issue. The steps are below:

# get nsx-config pod name

kubectl get pods -n nsxi-platform | grep nsx-config

output : nsx-config-XXXXX

# delete the nsx-config pod to restart

Note: Use pod name from above command.

kubectl delete pod nsx-config-XXXXX -n nsxi-platform