Local manager shows sync status as "disconnected" on the Global Manager

Article ID: 403569

Products

VMware NSX

Issue/Introduction

  • On the Global Manager, the Local Manager sync status shows as "Disconnected"

  • Clicking the refresh button beside "Disconnected" has no effect.
  • The output of /api/v1/messaging/cluster-connection/status shows the APPLIANCE_PROXY_HUB connections as "Disconnected" (excerpt):
      [
        {
          "address": "ssl://##.##.##.##:1236",
          "conn_status": "Disconnected",
          "node_id": "9d####d3-####-433f-####-0####0####3c",
          "node_type": "APPLIANCE_PROXY_HUB"
        },
        {
          "address": "ssl://##.##.##.##:1236",
          "conn_status": "Disconnected",
          "node_id": "dc####0c-####-4e83-####-3####4####15",
          "node_type": "APPLIANCE_PROXY_HUB"
        },
        {
          "address": "ssl://##.##.##.##:1236",
          "conn_status": "Disconnected",
          "node_id": "b4####56-####-472a-####-f####1####a3",
          "node_type": "APPLIANCE_PROXY_HUB"
        }
      ]
  • Connections initiated by the Local Manager to the Global Manager fail because of certificate validation errors.
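
When several connections have to be checked, the status output can be filtered with a short script. The following Python sketch assumes the response has already been saved and parsed as a list of objects carrying the address, conn_status, node_id and node_type fields shown above. The list wrapper, the addresses, the node IDs and the "Connected" value in the sample are hypothetical; only the field names and the "Disconnected" value come from this article.

```python
import json

# Hypothetical sample: field names and "Disconnected" come from this article;
# the list wrapper, addresses, node IDs and "Connected" are assumptions.
SAMPLE_RESPONSE = """
[
  {"address": "ssl://192.0.2.10:1236", "conn_status": "Disconnected",
   "node_id": "node-a", "node_type": "APPLIANCE_PROXY_HUB"},
  {"address": "ssl://192.0.2.11:1236", "conn_status": "Disconnected",
   "node_id": "node-b", "node_type": "APPLIANCE_PROXY_HUB"},
  {"address": "ssl://192.0.2.12:1236", "conn_status": "Connected",
   "node_id": "node-c", "node_type": "APPLIANCE_PROXY_HUB"}
]
"""


def disconnected_hubs(entries):
    """Return (node_id, address) pairs for APPLIANCE_PROXY_HUB connections
    whose conn_status is "Disconnected"."""
    return [
        (entry["node_id"], entry["address"])
        for entry in entries
        if entry.get("node_type") == "APPLIANCE_PROXY_HUB"
        and entry.get("conn_status") == "Disconnected"
    ]


if __name__ == "__main__":
    for node_id, address in disconnected_hubs(json.loads(SAMPLE_RESPONSE)):
        print(f"{node_id}: {address}")
```

Filtering on the exact "Disconnected" string keeps the sketch tied to the value actually observed in this issue rather than guessing at the full set of status values.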

The /var/log/vmware/appl-proxy-rpc.log file contains entries similar to the following:

YYYY-MM-DDTHH:MM:SS.MSZ #####n1.corp.####.org NSX 2469837 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="2469841" level="WARNING"] StreamConnection[5513751 Connecting to ssl://10.##.##.22:1236 sid:5513751] Couldn't connect to 'ssl://10.##.##.22:1236' (error: 335544539-short read)

YYYY-MM-DDTHH:MM:SS.MSZ ######n1.corp.####.org NSX 2469837 - [nsx@6876 comp="nsx-manager" subcomp="appl-proxy" s2comp="nsx-net" tid="2469841" level="INFO"] StreamSocket[5513749 Open f:46 i:-237289773 ? -> ssl://10.##.##.23:1236] on_connect 335544539-short read
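
To see how often this failure occurs and against which addresses, a short Python sketch can count the "short read" connection failures per target. It assumes the WARNING line format shown above. The helper names are hypothetical; scan_log uses the log path from this article and must be run on the node that holds the log.

```python
import re

# Matches the WARNING line format shown above, for example:
# Couldn't connect to 'ssl://10.##.##.22:1236' (error: 335544539-short read)
FAILED_CONNECT = re.compile(
    r"Couldn't connect to '(?P<target>ssl://[^']+)' \(error: (?P<error>[^)]*)\)"
)


def short_read_failures(lines):
    """Count "short read" connection failures per target address."""
    counts = {}
    for line in lines:
        match = FAILED_CONNECT.search(line)
        if match and "short read" in match.group("error"):
            target = match.group("target")
            counts[target] = counts.get(target, 0) + 1
    return counts


def scan_log(path="/var/log/vmware/appl-proxy-rpc.log"):
    """Apply short_read_failures to a log file, reading it line by line."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return short_read_failures(handle)
```

Only the "Couldn't connect" WARNING lines are counted; the matching INFO on_connect lines report the same attempts and would double the totals.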


Environment

VMware NSX

Cause

The issue occurs when certificates are imported with extra characters between the end of one certificate and the beginning of the next. This typically produces an improperly formatted certificate chain, which leads to parsing errors. In the excerpt below, #012 is an escaped newline character.

BEGIN CERTIFICATE-----\n<redacted>-----END CERTIFICATE-----\n"#012 }#012}#012conn_cfg {#012 uuid {#012 left: ######17936196579#012 right: #######203377324339#012 }#012 node_type: APPLIANCE_PROXY_HUB#012 address {#012 addr {#012 ip_addresses {#012 ipv4: #####8705#012 }#012 }#012 port: 1236#012 }#012 certificate {#012 certificate: "-----BEGIN CERTIFICATE-----

This formatting issue prevents the system from correctly recognizing and validating the certificates, which causes the connection failures and errors shown above.
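
A certificate chain can be checked for this pattern before it is imported. The following Python sketch is a plain string search under that assumption: it reports any non-whitespace text between one certificate's END line and the next certificate's BEGIN line, such as the literal \n" seen in the excerpt above. It does not parse or validate the certificates themselves, and the function name is hypothetical.

```python
BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def stray_text_between_certs(pem_chain: str) -> list[str]:
    """Return any non-whitespace text found between the END line of one
    certificate and the BEGIN line of the next one in a PEM chain."""
    findings = []
    pos = 0
    while True:
        end = pem_chain.find(END, pos)
        if end == -1:
            break
        gap_start = end + len(END)
        next_begin = pem_chain.find(BEGIN, gap_start)
        if next_begin == -1:
            break  # no further certificate, so nothing lies "between"
        gap = pem_chain[gap_start:next_begin]
        if gap.strip():  # newlines and spaces are fine; anything else is not
            findings.append(gap.strip())
        pos = next_begin
    return findings
```

An empty result means only whitespace separates the certificates; text before the first certificate or after the last one is deliberately not examined.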

Resolution

This issue is fixed in NSX 4.2.0, which removes such extra characters from imported certificates.

Workaround:

1. Run the following command on any one of the Local Manager nodes. It generates a self-signed certificate and private key to replace the APH-AR certificates:

  openssl req -new -newkey rsa:2048 -days 3650 -nodes -x509 -keyout /tmp/test-key1.pem -out /tmp/test-cert1.pem -config /etc/vmware/nsx-appl-proxy/openssl-appl-proxy.cnf


2. Log in to the Local Manager UI and go to System > Certificates > Import > Certificate.

    a. Enter a name for the certificate.
    b. Turn off the Service Certificate toggle.
    c. Paste the contents of /tmp/test-cert1.pem into "Certificate Contents".
    d. Paste the contents of /tmp/test-key1.pem into "Private key".
    e. Click Save.

3. In the UI, copy the ID of the newly imported certificate. The ID field contains the certificate's UUID.

4. Run the following API call with that certificate ID and the node ID of one of the LM nodes:

  POST https://<nsx-mgr>/api/v1/trust-management/certificates/<cert-id>?action=apply_certificate&service_type=APH&node_id=<node-id>

5. After the replacement, verify in the UI that the "Where Used" field of the newly imported certificate shows "1".

6. Repeat steps 1 to 5 for each of the remaining two LM nodes.

7. Re-onboard the site using the following API call:

    POST https://<Active GM node IP>/api/v1/sites?action=onboard_site

    {
         "address": "<LM node IP>",
         "username": "admin",
         "password": "<password>",
         "thumbprint": "<LM node thumbprint>",
         "site_name": "<site name>"
    }

8. Verify that the LM sync status now shows "Connected".
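
The API calls in steps 4, 6 and 7 can be prepared with a small script. The following Python sketch only builds the request URLs and the JSON body; it does not send anything, so issue the POST requests with your usual API client and admin credentials. All helper names are hypothetical. Two assumptions are labeled in the code: the certificate ID is a UUID (as step 3 states), and the step 7 thumbprint is the lowercase hex SHA-256 digest of the LM node's DER-encoded certificate. Confirm the expected thumbprint format for your NSX version before relying on it. Building the body with json.dumps also guarantees straight quotes, because curly quotes make the payload invalid JSON.

```python
import hashlib
import json
import ssl
import uuid
from urllib.parse import urlencode


def apply_certificate_url(nsx_mgr, cert_id, node_id):
    """Step 4: apply_certificate URL for one LM node (service_type=APH)."""
    uuid.UUID(cert_id)  # assumption from step 3: raises ValueError if not a UUID
    query = urlencode(
        {"action": "apply_certificate", "service_type": "APH", "node_id": node_id}
    )
    return f"https://{nsx_mgr}/api/v1/trust-management/certificates/{cert_id}?{query}"


def urls_for_all_nodes(nsx_mgr, cert_id_by_node):
    """Step 6: each LM node gets its own imported certificate, so map
    every node ID to the certificate ID imported for it."""
    return [
        apply_certificate_url(nsx_mgr, cert_id, node_id)
        for node_id, cert_id in cert_id_by_node.items()
    ]


def sha256_thumbprint(pem_cert):
    """Lowercase hex SHA-256 of a PEM certificate's DER bytes.
    Assumption: this is the thumbprint form that step 7 expects."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)  # strips the PEM armor, base64-decodes
    return hashlib.sha256(der).hexdigest()


def onboard_site_body(lm_address, password, thumbprint, site_name, username="admin"):
    """Step 7: request body; json.dumps always emits straight double quotes."""
    return json.dumps(
        {
            "address": lm_address,
            "username": username,
            "password": password,
            "thumbprint": thumbprint,
            "site_name": site_name,
        },
        indent=4,
    )
```

Passing urls_for_all_nodes a dictionary that maps each of the three LM node IDs to the certificate ID imported for it returns one apply_certificate URL per node, matching the one-certificate-per-node pattern of steps 1 to 6.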