Troubleshooting NSX Manager Onboarding and Site Connectivity Issues in SSP

Article ID: 414826

Updated On:

Products

VMware vDefend Firewall, VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

When onboarding NSX Manager into the Security Services Platform (SSP), you may see site readiness problems (Readiness: Not Ready), connectivity issues (Inventory Sync: Down / Unknown), or a top-of-screen banner “No valid license available.”

These symptoms are usually caused by one or more of the following categories:

(1) Certificate / trust mismatches (expired or unsynced certs).

(2) Network/firewall blocking of messaging ports.

(3) Stale NSX platform registry or kubeconfig entries (leftover from undeploys).

(4) Agent concurrency or sync-thread issues inside NSX.

(5) Inability to reach external Helm repositories during feature activation.

The most effective troubleshooting approach is to identify the symptom, verify certificates and ports, check site-service pod health, inspect the relevant NSX logs, and then apply the targeted resolution from the matching entry below.
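
The port-verification step can be scripted. The following is a minimal sketch, not a vendor tool: the function names check_port and check_ssp_ports are hypothetical, the host arguments are placeholders for your SSP ingress and messaging addresses, and it assumes an nc build that supports the -z and -w options. Ports 443 and 9092 are the ones this article checks.

```shell
# Hypothetical helpers, not part of NSX or SSP.
# Assumes an nc build that supports -z (probe only) and -w (timeout in seconds).

# check_port HOST PORT [TIMEOUT]: print OPEN or BLOCKED for one TCP port.
# Returns 0 when the port accepts a connection, 1 otherwise.
check_port() {
  local host="$1" port="$2" wait_s="${3:-5}"
  if nc -z -w "$wait_s" "$host" "$port" >/dev/null 2>&1; then
    echo "OPEN $host:$port"
  else
    echo "BLOCKED $host:$port"
    return 1
  fi
}

# check_ssp_ports INGRESS_HOST MESSAGING_HOST: probe 443 and 9092.
# Returns 1 if either port is unreachable.
check_ssp_ports() {
  local ingress="$1" messaging="$2" rc=0
  check_port "$ingress" 443 || rc=1
  check_port "$messaging" 9092 || rc=1
  return "$rc"
}
```

Run from the NSX Manager side, a call such as check_ssp_ports <ssp-ingress> <ssp-messaging> prints one line per port. A BLOCKED result also appears when nc itself is missing, so confirm the tool is present before concluding that a firewall is dropping traffic.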

Environment

vDefend SSP 5.0, vDefend SSP 5.1

Resolution

Each numbered entry lists the symptom (as seen in the SSP UI or logs), the verification steps, the likely root cause, and the reference KB to follow for the resolution.

1.

Symptom: SSP UI -> System -> NSX Manager tab shows

Readiness: Not Ready

Inventory Sync: Down

Infrastructure Sync: Up

Verification steps:

From the NSX Manager CLI, test the messaging port: nc -vz <ssp-messaging-url> 9092

Check the site-service logs for an x509 or "certificate signed by unknown authority" error:

k get pods | grep site-service

k logs <site-service-pod> -n nsxi-platform

Likely root cause: Messaging or certificate issue between NSX and SSP. Often port 9092 is blocked or the certificates have expired.

Reference KB: KB 407753 – Inventory sync down / port or cert issue.

2.

Symptom: Onboarding NSX Manager to the SSP fails with the error “failed to connect to site – x509: certificate signed by unknown authority”.

Verification steps:

On the SSP-Installer CLI, check the site-service logs:

k get pods | grep site-service

k logs <site-service-pod> -n nsxi-platform

The logs show a certificate expiry error.

Likely root cause: Expired or untrusted certificate between NSX and SSP.

Reference KB: KB 405890 – Certificate expired during site onboarding.

3.

Symptom: The SSP UI banner shows “No valid license available.” The System -> NSX Manager tab shows Infra/Inventory Sync = Unknown.

Verification steps:

On the NSX Manager CLI, /var/log/proton/nsxapi.log shows "Error while validating kubeconf" or timeouts.

Likely root cause: SSP is unable to validate the license because of a stale platform registry or leftover NAPP kubeconfig entries after an undeploy.

Reference KB: KB 413368 – Stale registry or kubeconfig blocking license validation.

4.

Symptom: “No Valid License Error” after an NSX onboarding attempt.

Verification steps:

In the NSX UI, check System > Fabric > Hosts and ensure the cluster state is healthy.

On the NSX Manager CLI, review /var/log/proton/nsxapi.log for NullPointerException.

Likely root cause: TN / Platform certificates are not synced, so license validation fails.

Reference KB: KB 396403 – TN certificate sync failure causes license error.

5.

Symptom: Feature activation fails and the SSP UI shows the banner “No Valid License Available” with Infra Sync showing DOWN.

Verification steps:

/var/log/proton/nsxapi.log shows ConcurrentUpdateException or "Caught exception in ScheduledExecutorService".

Likely root cause: Common-Agent threading issue that leaves stale sync threads; Infrastructure Sync shows DOWN.

Reference KB: KB 390413 – Common-Agent concurrency issue.

6.

Symptom: Feature activation fails with a Helm add repo error.

Verification steps:

SSP controller logs show "Helm add repo operation failed" or "server misbehaving".

Likely root cause: Network or DNS cannot reach the Helm repository projects.registry.vmware.com.

Reference KB: KB 393412 – Helm repo or internet access failure.

7.

Symptom: SSP UI -> System -> NSX Manager tab shows

Readiness: Not Ready

Infra/Inventory Sync: Unknown

Verification steps:

k describe site -n nsxi-platform

Output shows Reason: RequiredForInterop, Status: False.

k get sites

Output shows SiteConditionConfiguredPlatformDeploymentConfig=False

Likely root cause: NSX–SSP version interop mismatch; the platform deployment config is not applied.

Reference KB: KB 413561 – Platform deployment config interop mismatch.

8.

Symptom: SSP UI -> System -> NSX Manager tab shows

Readiness: Not Ready

Infra Sync: Down

Verification steps:

kubectl describe site -n nsxi-platform

Output shows COMMON_FULLSYNC not started.

/var/log/proton/nsxapi.log on the NSX Manager shows NoSuchFileException: .commonagent_keystore.

Likely root cause: Missing or corrupted NSX Common-Agent keystore after a Proton restart or NSX restore. Certificates are not synced from the trust-manager DB to disk.

Reference KB: KB 409676 – CommonAgent keystore missing post Proton restart.

9.

Symptom: The Proton service restarts repeatedly on NSX after onboarding SSP.

Verification steps:

Check /var/log/proton/proton-tomcat-wrapper.log on the NSX Manager for “The JVM has run out of memory”.

Use corfu_tool_runner.py -n nsx -t DirectoryUser -o showTable to count users.

Likely root cause: A large number of LDAP directory users (>100K) streamed to SSP during onboarding causes Proton memory exhaustion (OOM).

Reference KB: KB 403824 – Proton crash due to large LDAP sync.

10.

Symptom: Offboarding is not possible because of an active SSP reference, or re-onboarding fails (Error: “This site is already registered to an SSP or NAPP instance”).

Verification steps:

Onboarding fails with a duplicate registration error; NSX logs show an existing site reference.

Likely root cause: NSX still retains stale SSP binding entries after force deletion of SSP.

Reference KB: KB 382295 – Force cleanup using site-offboarding-cleanup-nsx-ssp5.0.sh.

11.

Symptom: SSP UI -> System -> NSX Manager tab shows

Readiness: Not Ready

Infra Sync: Down

Verification steps:

kubectl describe site shows PACE_UFOSTORE_SUBSCRIBE not started.

Verify port connectivity using:

nc -vz <ssp-ingress> 443

nc -vz <ssp-messaging> 9092

Likely root cause: Network or infrastructure connectivity issue between NSX and SSP that causes PACE agent startup failure.

Reference KB: KB 402380 – Connectivity issue preventing PACE agent sync.

12.

Symptom: SSP UI -> System -> NSX Manager tab shows

Infra Sync: Down

Flows are not visible in the Intelligence UI.

Verification steps:

/var/log/proton/nsxapi.log shows produceCertMsgs and NullPointerException.

nsxcli -c get intelligence flow stats ack shows no acknowledgements.

Likely root cause: NSX Common-Agent encounters a NullPointerException while building certificate messages; TNs or flow clients have invalid or missing certificates.

Reference KB: KB 401182 – CommonAgent certificate handling error.

13.

Symptom: SSP UI -> System -> NSX Manager tab shows

Infra Sync: Down

The site remains Not Ready after an NSX backup and restore.

Verification steps:

/var/log/proton/nsxapi.log shows NoSuchFileException: .commonagent_keystore.

Likely root cause: Known issue in early NSX 4.2.x builds where the Common-Agent keystore is not rebuilt from the DB after a restore.

Reference KB: KB 390751 – NSX–SSP connection down post restore (cert sync issue).

14.

Symptom: SSP UI -> System -> NSX Manager tab shows

Infra Sync: Down

Verification steps:

/var/log/proton/proton_restart.log shows:

APPLICATION IS GOING RESTART (GMLE leadership safety violation handler triggered for groupType: mp)

/var/log/proton/nsxapi.log shows:

Lease loss acknowledgement has not been received in WaitingForLeadershipLostAckState for lease id xxxxx-xxxx-xxxx-xxxx-xxxxxxx service COMMON_AGENT_SERVICE on member xxxxxx-xxxxxx-xxxxxx-yyyyy-zzzzzz of group aaaa-bbbbbb-cccccc-dddd.

Likely root cause: Common Agent was not stopping existing threads on a restart, and it failed to acknowledge the lease loss within the GMLE timeout period.

Reference KB: KB 414655 – Infra Sync Down due to GMLE leadership safety-violation.
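
The log signatures listed above can be matched mechanically as a first triage pass. The following is a minimal sketch under stated assumptions: kb_for_log_line and scan_log are hypothetical helper names, not NSX or SSP commands, and the match strings are copied from the entries in this article rather than from an exhaustive list of log formats. Where two KBs share a signature, the helper prints both, and the surrounding symptom decides which one applies.

```shell
# Hypothetical triage helpers, not part of NSX or SSP.
# The signatures and KB numbers are the ones listed in this article.

# kb_for_log_line TEXT: print the KB(s) matching a known signature.
# Returns 1 (and prints nothing) when no signature matches.
kb_for_log_line() {
  case "$1" in
    # Checked before the generic NullPointerException match below.
    *produceCertMsgs*)
      echo "KB 401182" ;;
    *"GMLE leadership safety violation"*|*"Lease loss acknowledgement has not been received"*)
      echo "KB 414655" ;;
    *"NoSuchFileException: .commonagent_keystore"*)
      echo "KB 409676 / KB 390751" ;;
    *COMMON_FULLSYNC*"not started"*)
      echo "KB 409676" ;;
    *PACE_UFOSTORE_SUBSCRIBE*"not started"*)
      echo "KB 402380" ;;
    *ConcurrentUpdateException*|*"Caught exception in ScheduledExecutorService"*)
      echo "KB 390413" ;;
    *"Error while validating kubeconf"*)
      echo "KB 413368" ;;
    *"certificate signed by unknown authority"*|*x509*)
      echo "KB 407753 / KB 405890" ;;
    *"Helm add repo operation failed"*|*"server misbehaving"*)
      echo "KB 393412" ;;
    *"The JVM has run out of memory"*)
      echo "KB 403824" ;;
    *"already registered to an SSP or NAPP instance"*)
      echo "KB 382295" ;;
    *RequiredForInterop*|*"SiteConditionConfiguredPlatformDeploymentConfig=False"*)
      echo "KB 413561" ;;
    *NullPointerException*)
      echo "KB 396403" ;;
    *)
      return 1 ;;
  esac
}

# scan_log FILE: print the distinct KB matches found in a log file.
scan_log() {
  local line kb
  while IFS= read -r line || [ -n "$line" ]; do
    if kb="$(kb_for_log_line "$line")"; then
      echo "$kb"
    fi
  done < "$1" | sort -u
}
```

For example, scan_log /var/log/proton/nsxapi.log on the NSX Manager prints each matching KB once. Treat the result as a pointer to the relevant entry, not as a diagnosis: the helper matches one line at a time, so it cannot confirm that produceCertMsgs and NullPointerException refer to the same failure when they appear on separate lines.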

Additional Information

After the NSX Manager has been off-boarded, there is a known issue when re-onboarding it.

For more information, refer to "Modifications to private IP ranges are retained even after the NSX Manager has been off-boarded".