Problem:
After the Platform CA certificate and the Ingress certificate were refreshed, a backup of the SSP was taken.
Before the SSP was removed, the Platform CA was refreshed again, generating a new internal certificate chain.
The SSP was then force-deleted without NSX cleanup and redeployed as a new instance.
When the backup (taken before the second CA refresh) was restored, the restore process failed during the NSX reconnect phase.
A manual reconnect of NSX Manager did not help: the Infrastructure Sync status remained DOWN, and the site status reported the COMMON_FULLSYNC error shown under Symptom.
Symptom: Infrastructure Sync is DOWN in the SSP UI.
From the SSP-I CLI, run:
k -n nsxi-platform get sites -oyaml
and look for the following error in the output:
COMMON_FULLSYNC failed due to: java.lang.Exception: produceCertMsgs
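The site YAML can be long, so a small filter helps spot the error. This is a minimal bash sketch; find_fullsync_error is a hypothetical helper name, not an SSP command. It matches the COMMON_FULLSYNC and produceCertMsgs tokens separately, on the assumption that the YAML serializer may wrap the long message across lines.

```shell
#!/usr/bin/env bash
# find_fullsync_error (hypothetical helper): reads the output of
# `k -n nsxi-platform get sites -oyaml` on stdin, prints every line that
# contains COMMON_FULLSYNC or produceCertMsgs, and exits 0 if any line
# matched (1 otherwise).
find_fullsync_error() {
  # -F: fixed-string match; the two -e patterns are searched independently
  # so a message wrapped across YAML lines is still reported.
  grep -F -e 'COMMON_FULLSYNC' -e 'produceCertMsgs'
}

# Example (on the SSP-I CLI):
#   k -n nsxi-platform get sites -oyaml | find_fullsync_error
```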
Environment:
Security Services Platform (SSP) 5.0 and 5.1 with onboarded NSX Manager versions 4.2.0, 4.2.1, 4.2.2, 4.2.3, and 9.0
Cause:
The Common Agent status API reports that Full Sync has failed, even though the synchronization actually completed successfully.
No functional impact is observed in data exchange; only the status reporting is incorrect.
Stale background threads remained active after the certificate refresh operations.
These stale threads continued to report outdated Common Agent status information to the status API, which produced the incorrect “COMMON_FULLSYNC failed due to: java.lang.Exception: produceCertMsgs” message.
Resolution:
Because this issue occurs after a Platform CA certificate and Ingress certificate refresh, verify the Kafka server and client certificates before proceeding with the workaround.
Run the following command on the SSP-I CLI to retrieve the messaging configuration:
formFactor: Advanced
helmRepo: oci://<repo-path>
ingressFQDN: <ingress-fqdn.example.com>
messagingFQDN: <messaging-fqdn.example.com>
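The messagingFQDN value is the host used in the checks that follow. As a hedged sketch, assuming the configuration output has the simple "key: value" layout shown above, a bash/awk filter can pull it out; get_messaging_fqdn and the file name in the example are hypothetical.

```shell
#!/usr/bin/env bash
# get_messaging_fqdn (hypothetical helper): reads the messaging
# configuration on stdin and prints the value of the first messagingFQDN
# key, with indentation, double quotes and a trailing CR/whitespace removed.
get_messaging_fqdn() {
  awk '/^[ \t]*messagingFQDN:/ {
         sub(/^[ \t]*messagingFQDN:[ \t]*/, "")   # drop the key
         gsub(/"/, "")                            # drop optional quotes
         sub(/[ \t\r]+$/, "")                     # drop trailing CR/space
         print
         exit
       }'
}

# Example, with the configuration output saved to a file (name is illustrative):
#   fqdn=$(get_messaging_fqdn < messaging-config.yaml)
```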
Verify that the messagingFQDN (<messaging-fqdn.example.com>) is reachable from the NSX Manager. On the NSX Manager CLI, you can run:
nc -vz <messaging-fqdn.example.com> 9092
Expected output:
Connection to <messaging-fqdn.example.com> (<resolved-ip-address>) 9092 port [tcp/*] succeeded!
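If nc is not available on the node, roughly the same TCP reachability test can be approximated with bash's built-in /dev/tcp redirection. This is a sketch assuming bash and the coreutils timeout command are present; check_port is a hypothetical helper name, not an NSX CLI command, and it only tests that a TCP connection opens (no TLS handshake).

```shell
#!/usr/bin/env bash
# check_port (hypothetical helper): succeeds if a TCP connection to
# host $1, port $2 opens within 5 seconds. Relies on bash's /dev/tcp
# pseudo-device; timeout prevents hanging on filtered ports.
check_port() {
  local host=$1 port=$2
  # Host and port are passed as positional arguments to avoid quoting issues.
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "Connection to ${host} port ${port} succeeded"
  else
    echo "Connection to ${host} port ${port} failed" >&2
    return 1
  fi
}

# Example (on the NSX Manager, replacing the placeholder with the real FQDN):
#   check_port messaging-fqdn.example.com 9092
```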
To retrieve the SHA256 fingerprint of the Kafka server certificate, run the following command on the NSX Manager:
openssl s_client -showcerts -connect <messaging-fqdn.example.com>:9092 < /dev/null 2>/dev/null | openssl x509 -fingerprint -sha256 -noout
Example (masked) output:
SHA256 Fingerprint=XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX
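openssl prints the fingerprint with a "SHA256 Fingerprint=" prefix, while keytool uses a different label. A minimal bash sketch for normalizing it, assuming a SHA-256 fingerprint is always 32 colon-separated hex bytes; extract_sha256_fp is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# extract_sha256_fp (hypothetical helper): prints every SHA-256
# fingerprint (32 colon-separated hex bytes) found on stdin, one per
# line, in upper case, regardless of the label in front of it.
extract_sha256_fp() {
  grep -oE '([0-9A-Fa-f]{2}:){31}[0-9A-Fa-f]{2}' | tr '[:lower:]' '[:upper:]'
}

# Example (on the NSX Manager):
#   server_fp=$(openssl s_client -showcerts -connect <messaging-fqdn.example.com>:9092 \
#     < /dev/null 2>/dev/null | openssl x509 -fingerprint -sha256 -noout | extract_sha256_fp)
```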
List the certificates in the client truststore and verify that the Kafka certificate fingerprint matches the SHA256 fingerprint returned by the openssl command above:
keytool -list -keystore /home/secureall/secureall/.store/.client_truststore -storepass "$(cat /config/http/.http_cert_pw)"
Example (masked) entries to look for:
On NSX Manager: napp-common-agent and napp-pace-agent.
On SSP: napp-common-agent and napp-pace-agent, named in the format NSX_UA_KAFKA_CLIENT_<UUID_FROM_NSX_MANAGER>.
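The fingerprint comparison above can be automated. This bash sketch assumes keytool prints SHA-256 fingerprints in its listing (the default on recent JDKs; with -v, older JDKs also print a SHA256 line), and that a SHA-256 fingerprint is 32 colon-separated hex bytes. kafka_fp_trusted is a hypothetical helper name; it accepts either a bare fingerprint or the raw openssl line.

```shell
#!/usr/bin/env bash
# kafka_fp_trusted (hypothetical helper): reads `keytool -list` output on
# stdin and exits 0 if the SHA-256 fingerprint in $1 appears in it.
# $1 may be a bare fingerprint or openssl's "SHA256 Fingerprint=..." line.
kafka_fp_trusted() {
  local want
  # Extract and upper-case the expected fingerprint so hex case is ignored.
  want=$(printf '%s\n' "$1" \
    | grep -oE '([0-9A-Fa-f]{2}:){31}[0-9A-Fa-f]{2}' \
    | head -n 1 | tr '[:lower:]' '[:upper:]')
  [ -n "$want" ] || return 1
  # Extract every fingerprint from the listing and look for an exact match.
  grep -oE '([0-9A-Fa-f]{2}:){31}[0-9A-Fa-f]{2}' \
    | tr '[:lower:]' '[:upper:]' \
    | grep -qxF -- "$want"
}

# Example (on the NSX Manager; $server_fp holds the openssl fingerprint):
#   keytool -list -keystore /home/secureall/secureall/.store/.client_truststore \
#     -storepass "$(cat /config/http/.http_cert_pw)" | kafka_fp_trusted "$server_fp" \
#     && echo "Kafka server certificate fingerprint found in truststore"
```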
Once Kafka certificates are confirmed to match, proceed with the workaround below.
Identify the NSX Manager leader for the Common Agent Service:
Get the Manager IP address using the UUID from the previous step:
SSH into the identified NSX Manager node:
Restart the NSX Manager service:
This issue typically occurs after a certificate replacement (Platform CA or Ingress certificate), when stale threads persist in the agent service.
Restarting the NSX Manager service on the leader node for the Common Agent Service refreshes the internal thread state and corrects the status reporting.
Ensure Kafka certificate integrity before restarting to avoid messaging handshake errors.