The Policy Pace Agent, within the NSX Manager, is responsible for streaming NSX configuration state to the Security Services Platform (SSP). This process encompasses two phases: an initial full sync (bulk replication of all current config objects) followed by delta streaming (incremental change propagation).
NSX, SSP
When the NSX configuration corpus is large or the infrastructure is slow, the full sync phase can take unusually long (more than 20 minutes). When the full sync is delayed this way, the Corfu streaming session may time out and raise a StreamingException. This exception is unrecoverable within the current session, so the Pace Agent transitions to an unhealthy state and aborts the in-progress full sync. The full sync restarts after some time but fails for the same reason, leaving the agent in a loop of repeated full syncs.
Prerequisites
Before applying any of the remediations below, identify the leader node for the Intelligence Agent service. All configuration API calls must be invoked on the leader node.
Identify the Intelligence Agent leader:
Run the following command from the NSX Manager CLI:
su admin -c "get cluster status verbose" | grep 'INTELLIGENCE_AGENT_SERVICE'
Example output:
The UUID field (b3870142-bc15-eb87-549c-XXXXXXXXXXXX) is the node ID of the leader. Resolve this to an IP address via your cluster node mapping and use that IP for all subsequent API calls.
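To avoid copying the leader's node ID by hand, the UUID can be pulled out of the matching line with a small filter. This is a minimal sketch, not part of the documented procedure: `extract_uuid` is a hypothetical helper name, the sample line is invented for illustration, and the sketch assumes GNU grep and that the leader's UUID is the first UUID-shaped token on the line. Check this against the real command output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the first UUID-shaped token (8-4-4-4-12 hex digits)
# read from standard input. Assumes GNU grep (-o and -E options).
extract_uuid() {
  grep -oiE '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' | head -n 1
}

# Invented sample line for illustration only; the real output layout may differ.
sample='INTELLIGENCE_AGENT_SERVICE leader: b3870142-bc15-eb87-549c-0123456789ab'
printf '%s\n' "$sample" | extract_uuid
```

On the NSX Manager, the same filter could be appended to the command above, for example `su admin -c "get cluster status verbose" | grep 'INTELLIGENCE_AGENT_SERVICE' | extract_uuid`. If the matching output contains more than one UUID, confirm manually which one belongs to the leader.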
NSX 4.2.3 only (NSX 4.2.4+ and 9.x have advanced streaming enabled by default; this step is a no-op on those versions.)
This setting is persistent and does not need to be reverted.
NSX 4.2.3+, 9.0.2, 9.1+
(For earlier versions, see 2b below)
By default, the Kafka producer in the Pace Agent communication service is configured with acks=all, requiring acknowledgement from all in-sync replicas (ISR) before a produce request is considered committed. Under high-throughput replication, this introduces per-message round-trip latency that accumulates over the duration of a full sync and contributes to the Corfu session timeout.
Setting relax.kafka.producer.acks=true downgrades the producer acknowledgement to acks=1 (leader-only ack), substantially reducing per-message write latency and overall sync duration.
Warning: This is a temporary mitigation. Relaxing producer acks reduces durability guarantees. If the Kafka leader fails mid-sync, acknowledged messages may be lost, requiring a re-sync. Revert this setting immediately once the Pace Agent returns to a healthy state and config sync is confirmed complete.
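The relationship between the flag and the producer acknowledgement level described above can be summarized in a few lines. This is only an illustrative sketch of the mapping as stated in this article: `effective_acks` is a hypothetical helper, not an NSX or Kafka command, and it does not change any configuration.

```shell
#!/bin/sh
# Hypothetical helper: given the value of relax.kafka.producer.acks,
# print the Kafka producer "acks" level that the article says it implies.
effective_acks() {
  case "$1" in
    true) echo "1" ;;    # relaxed: only the partition leader acknowledges
    *)    echo "all" ;;  # default: all in-sync replicas must acknowledge
  esac
}

effective_acks true
effective_acks false
```

The trade-off is the one stated in the warning: with acks=1 a write is confirmed sooner, but a message acknowledged only by the leader can be lost if that leader fails before followers replicate it.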
NSX 4.2.0, 4.2.1, 4.2.2, 9.0.0, 9.0.1
For versions that do not support the producer acks config via the system-config API, equivalent durability relaxation can be achieved by reducing min.insync.replicas on the relevant Kafka topics.
Please contact Broadcom Technical Support to assist you in safely executing this mitigation step.
Note: Step 2b involves Kafka CLI commands that should be executed only under the supervision of the support team.
After applying the applicable mitigations, monitor Pace Agent health and config sync progress in the SSP UI.
Note: The status can take 30 minutes to 2 hours to turn Ready, depending on factors such as NSX configuration size and cluster load.