After upgrading from SSP 5.0/5.1 to 5.1.1 NSX status shows Not Ready

Article ID: 432675

Products

VMware vDefend Firewall

Issue/Introduction

After upgrading SSP 5.0/5.1 (with NSX onboarded) to SSP 5.1.1, infrastructure sync shows DOWN in the SSP UI.

Log in to the SSPI installer and describe the site. The CommonAgentReady condition reports "COMMON_FULLSYNC failed due to: java.lang.Exception: produceCertMsgs":

k describe site -n nsxi-platform

Status:
  Conditions:
    Last Transition Time: 2026-02-18T01:36:31Z
    
    Reason: NsxConfigTOIUpdated
    Status: True
    Type: NsxStreamingReady
    Last Transition Time: 2026-02-26T19:40:55Z
    Message: COMMON_FULLSYNC failed due to: java.lang.Exception: produceCertMsgs
    Reason: FullSyncNotReady
    Status: False
    Type: CommonAgentReady
    Last Transition Time: 2026-02-17T18:21:06Z
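
The check above can be scripted. The following is a minimal sketch, not an official tool: the function name has_fullsync_cert_error is hypothetical, and it only looks for the exact message text quoted in this article on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of SSP or NSX): reads "describe site" output
# on stdin and succeeds only if the failure message from this article appears.
has_fullsync_cert_error() {
    # -F: match the text literally, -q: no output, exit status only
    grep -q -F 'COMMON_FULLSYNC failed due to: java.lang.Exception: produceCertMsgs'
}

# Example usage on the SSPI installer (command as shown in this article):
#   k describe site -n nsxi-platform | has_fullsync_cert_error \
#       && echo "symptom present"
```

The same helper can be reused after the workaround: once the site is Ready, the function should return a non-zero status.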

Environment

SSP 5.1.1

NSX versions where this is a known issue: 4.2.0, 4.2.1.1, 4.2.1.2, 4.2.3, 9.0.0
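
As a convenience, the version list can be checked with a small script. This is only a sketch: the names AFFECTED_NSX_VERSIONS and is_listed_affected are hypothetical, the version string is assumed to be supplied by the operator, and the check is an exact match against the versions listed in this article. A version that is not listed is not thereby confirmed as unaffected.

```shell
#!/bin/sh
# Versions listed in this article as known to be affected.
AFFECTED_NSX_VERSIONS="4.2.0 4.2.1.1 4.2.1.2 4.2.3 9.0.0"

# Hypothetical helper: returns 0 if $1 exactly matches a listed version,
# 1 otherwise. It does not compare version ranges.
is_listed_affected() {
    for v in $AFFECTED_NSX_VERSIONS; do
        [ "$v" = "$1" ] && return 0
    done
    return 1
}

# Example usage:
#   is_listed_affected 4.2.1.1 && echo "listed as affected"
```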

Cause

When the Service Common Agent stops, it fails to terminate all related threads. These stale threads remain active and incorrectly report the Full Sync status as DOWN, even though synchronization from NSX to SSP is succeeding.

The NSX Manager log /var/log/proton/nsxapi.log shows:

message: "COMMON_FULLSYNC failed due to: java.lang.Exception: produceCertMsgs"
  action_name: "COMMON_FULLSYNC"
2026-03-02T16:22:21.019Z INFO ForkJoinPool.commonPool-worker-7 StatusTrackingServiceImpl 1769139 INTELLIGENCE [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] actionStateMsg is: action_name: "COMMON_FULLSYNC"
message: "COMMON_FULLSYNC failed due to: java.lang.Exception: produceCertMsgs"
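
To confirm the same message on an NSX Manager, the log can be searched for it. The sketch below assumes the log path is /var/log/proton/nsxapi.log; the function name count_fullsync_failures is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints how many lines of the given log file contain
# the failure message quoted in this article.
count_fullsync_failures() {
    # $1: path to the log file, e.g. /var/log/proton/nsxapi.log (assumed path)
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's status 0 in that case.
    grep -c -F 'COMMON_FULLSYNC failed due to: java.lang.Exception: produceCertMsgs' "$1" || true
}

# Example usage on an NSX Manager:
#   count_fullsync_failures /var/log/proton/nsxapi.log
```

A non-zero count shows that the message has been logged. It does not by itself tell you whether the failure is still occurring, so check the timestamps of the matching entries as well.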

Resolution

To work around this issue, restart the proton service on the NSX Managers in a rolling fashion. Restarting one proton instance at a time should not cause functional impact.

First identify the NSX Manager that is the leader for the common agent. Restart it LAST.

Restart the other two NSX Managers FIRST, one at a time.

Step 1. Command to identify the NSX Manager node that is the leader for COMMON_AGENT_SERVICE:

su admin -c "get cluster status verbose" | grep COMMON_AGENT_SERVICE

Step 2. Get the IP address of the manager using the UUID from the above output:
 
su admin -c "get cluster status" | grep <uuid-of-manager-from-step-1>
 
Step 3. SSH into each NSX Manager node that is not the common agent leader, one at a time:
 
ssh root@<nsx-ip>

Step 4. Restart proton

systemctl restart proton

Step 5. Check the status

systemctl status proton

Once the other two NSX Managers are UP, repeat Steps 3 to 5 on the leader node identified in Step 1, so that it is restarted last.

Step 6. Check the site status on SSPI. It should show Ready.

k describe site -n nsxi-platform

Step 7. Check NSX status on SSP UI. It should show Ready and Infrastructure sync UP.
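
The restart order described in the steps above can be sketched as a script. This is an illustration under stated assumptions, not a supported tool: the function names restart_order, restart_proton_on and rolling_restart are hypothetical, as are the DRY_RUN variable, the 10-second poll interval and the 30-attempt limit. The leader IP is assumed to have been found manually in Steps 1 and 2. `systemctl is-active` only confirms that the proton unit is running, so still verify that each NSX Manager is UP before moving on.

```shell
#!/bin/sh
# Prints manager IPs in restart order: non-leaders first, leader last.
# $1: leader IP (from Steps 1 and 2); remaining args: all manager IPs.
restart_order() {
    leader=$1
    shift
    for ip in "$@"; do
        [ "$ip" = "$leader" ] || printf '%s\n' "$ip"
    done
    printf '%s\n' "$leader"
}

# Restarts proton on one node and polls until systemd reports it active.
# With DRY_RUN=1 (the default here) it only prints what it would run.
restart_proton_on() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: ssh root@$1 systemctl restart proton"
        return 0
    fi
    ssh "root@$1" 'systemctl restart proton' || return 1
    i=0
    while [ "$i" -lt 30 ]; do   # assumed limit: 30 polls, 10 seconds apart
        ssh "root@$1" 'systemctl is-active --quiet proton' && return 0
        sleep 10
        i=$((i + 1))
    done
    return 1
}

# Restarts nodes one at a time and stops at the first node that does not
# come back, so the leader is never restarted while another node is down.
rolling_restart() {
    for ip in $(restart_order "$@"); do
        restart_proton_on "$ip" || {
            echo "proton not active on $ip, stopping" >&2
            return 1
        }
    done
}
```

For example, `rolling_restart <leader-ip> <ip-1> <ip-2> <ip-3>` prints the planned sequence, and setting `DRY_RUN=0` performs the restarts. Stopping at the first failure is a deliberate choice, so that a node that has not recovered is investigated before the next one is touched.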

Additional Information

This issue is fixed in NSX 4.2.3.2 and 9.0.1.