NSX 4.1.x Transport Nodes disconnected after running replace_certs.py to replace expired certs

Article ID: 369349

Products

VMware NSX

Issue/Introduction

  • After upgrading to NSX 4.1.x, you may see that the self-signed certificates on the NSX Managers have expired or are about to expire. This is related to a known issue: NSX alarms indicating certificates have expired or are expiring
  • You use the Python script from that KB to replace the expired certificates.
  • In some cases, after running the replace_certs.py script, the NSX Transport Nodes are disconnected from the NSX Managers.
  • You may see entries similar to the following in the logs:

    /var/log/syslog*

    2024-05-22T08:10:52.101Z nsx-proxy[4388757]: NSX 4388757 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="4388787" level="WARNING"] Certificate validation: couldn't find SHA256 digest 'redacted' in local trust store
    2024-05-22T08:10:57.115Z nsx-proxy[4388757]: NSX 4388757 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="4388787" level="WARNING"] Certificate validation: couldn't find SHA256 digest 'redacted' in local trust store
    2024-05-22T08:11:02.135Z nsx-proxy[4388757]: NSX 4388757 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="4388787" level="WARNING"] Certificate validation: couldn't find SHA256 digest 'redacted' in local trust store
    2024-05-22T08:11:07.150Z nsx-proxy[4388757]: NSX 4388757 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="4388787" level="WARNING"] Certificate validation: couldn't find SHA256 digest 'redacted' in local trust store
    2024-05-22T08:11:12.166Z nsx-proxy[4388757]: NSX 4388757 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="4388787" level="WARNING"] Certificate validation: couldn't find SHA256 digest 'redacted' in local trust store
    2024-05-22T08:11:17.183Z nsx-proxy[4388757]: NSX 4388757 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="4388787" level="WARNING"] Certificate validation: couldn't find SHA256 digest 'redacted' in local trust store
    2024-05-22T08:11:27.517Z nsx-proxy[4395471]: NSX 4395471 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="4395504" level="INFO"] StreamConnection[9 Connected to ssl://NSX-Manager:1234 sid:9] Connected from ssl-tcp://NSX-TN:13663 to server with certificate with sha256 fingerprint 'redacted'
    
    
    2024-05-22T07:35:50.765Z NSX 3987044 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="reqId" subcomp="manager" username="admin"] Heartbeating for host host-uuid is down.
    2024-05-22T07:35:50.945Z NSX 3987044 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="reqId" subcomp="manager" username="admin"] Heartbeating for host host-uuid is down.
    2024-05-22T07:35:51.033Z NSX 3987044 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="reqId" subcomp="manager" username="admin"] Heartbeating for host host-uuid is down.
    2024-05-22T07:35:51.141Z NSX 3987044 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="reqId" subcomp="manager" username="admin"] Heartbeating for host host-uuid is down.
    2024-05-22T07:35:51.218Z NSX 3987044 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="reqId" subcomp="manager" username="admin"] Heartbeating for host host-uuid is down.
    2024-05-22T07:35:51.323Z NSX 3987044 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="reqId" subcomp="manager" username="admin"] Heartbeating for host host-uuid is down.
    2024-05-22T07:35:51.443Z NSX 3987044 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" reqId="reqId" subcomp="manager" username="admin"] Heartbeating for host host-uuid is down.
    
    
    2024-05-22T07:29:37.357Z nsx-proxy[2101556]: NSX 2101556 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2101593" level="INFO"] StreamSocket[4321 Open f:47 i:0 ? -> ssl://NSX-TN:1235] on_connect 336134278-certificate verify failed
    2024-05-22T07:29:37.357Z nsx-proxy[2101556]: NSX 2101556 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2101593" level="WARNING"] StreamConnection[4321 Connecting to ssl://NSX-TN:1235 sid:4321] Couldn't connect to 'ssl://NSX-TN:1235' (error: 336134278-certificate verify failed)
    2024-05-22T07:29:37.357Z nsx-proxy[2101556]: NSX 2101556 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-net" tid="2101593" level="WARNING"] StreamConnection[4321 Error to ssl://NSX-TN:1235 sid:-1] Error 336134278-certificate verify failed
    2024-05-22T07:29:37.357Z nsx-proxy[2101556]: NSX 2101556 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2101593" level="WARNING"] RpcConnection[4321 Connecting to ssl://NSX-TN:1235 0] Couldn't connect to ssl://NSX-TN:1235 (error: 336134278-certificate verify failed)
    2024-05-22T07:29:37.357Z nsx-proxy[2101556]: NSX 2101556 - [nsx@6876 comp="nsx-esx" subcomp="nsx-proxy" s2comp="nsx-rpc" tid="2101593" level="WARNING"] RpcTransport[0] Unable to connect to ssl://NSX-TN:1235: 336134278-certificate verify failed


    nsx-api.log

    2024-05-22T08:00:35.619Z ERROR WrapperStartStopAppMain TrustStoreServiceImpl 4101771 SYSTEM [nsx@6876 comp="nsx-manager" errorCode="MP100" level="ERROR" subcomp="manager"] Failed to sync certificate between DB and disk for profile: profileName: APH-TN, serviceType: APH_TN, preProcessor: null, postProcessor: null, uniqueUse: false, clusterCertificate: false, requiresPrivateKey: true, nodeTypes: [global-manager, nsx-manager, nsx-shared], certificatePath: /etc/vmware/nsx-appl-proxy/appl-proxy-cert.pem, keyPath: /etc/vmware/nsx-appl-proxy/appl-proxy-privkey.pem
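
The certificate failures shown above can be counted with a quick search of the log files. The following is a minimal sketch and is not part of the original procedure: the function name count_cert_failures is hypothetical, the two search strings are copied from the sample nsx-proxy entries above, and a POSIX shell with grep is assumed.

```shell
#!/bin/sh
# count_cert_failures FILE...
# Prints the number of lines in the given log files that contain either of
# the certificate-failure messages quoted in this article.
# The function name is illustrative; it is not an NSX command.
count_cert_failures() {
    cat "$@" 2>/dev/null |
        grep -c -e "couldn't find SHA256 digest" -e "certificate verify failed"
}

# Example usage on a Transport Node (path taken from the article):
#   count_cert_failures /var/log/syslog*
```

A non-zero count only indicates that the messages are present; it does not by itself confirm the cause described below.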

Environment

NSX 4.1.x

This can occur in both federated and non-federated environments.

Cause

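This is a known issue that occurs when an environment is upgraded to 4.1.x and the APH_TN certificate is then replaced.

Proton cannot update the certificate on disk because the user uproton lacks write permission on the certificate and key files, which are still owned by appl-proxy: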

-rw-r--r--  1 appl-proxy appl-proxy 1.7K Mar 11  2020 appl-proxy-cert.pem
-rw-r--r--  1 appl-proxy appl-proxy 1.7K Mar 11  2020 appl-proxy-privkey.pem
-rw-r--r--  1 appl-proxy appl-proxy  766 Mar 11  2020 openssl-appl-proxy.cnf
-rw-r--r--  1 appl-proxy appl-proxy   52 Mar 11  2020 appl-proxy-public-cfg.json
-rw-r--r--  1 appl-proxy appl-proxy   90 Mar 11  2020 appl-proxy-public-cfg.xml
-rw-r--r--  1 appl-proxy appl-proxy 2.2K Dec 15  2019 appl-proxy.xml
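
A listing like the one above can be filtered to show only the certificate and key files that uproton does not own. This is a minimal sketch under stated assumptions: the helper name find_unwritable_certs is hypothetical, it expects `ls -l` output on standard input with the file name as the last field, and the four file names it matches are the ones named in the Resolution section of this article.

```shell
#!/bin/sh
# find_unwritable_certs: read `ls -l` output on stdin and print the names of
# appl-proxy certificate/key .pem files whose owner (field 3) is not uproton.
# Illustrative helper only; it is not shipped with NSX.
find_unwritable_certs() {
    awk '$NF ~ /^appl-proxy(-ar)?-(cert|privkey)\.pem$/ && $3 != "uproton" { print $NF }'
}

# Example usage on an NSX Manager:
#   ls -l /etc/vmware/nsx-appl-proxy | find_unwritable_certs
```

Empty output would suggest that the ownership is already in the expected state; the sketch checks the owner only, not the file mode.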

Resolution

This issue is resolved in VMware NSX 4.2.0.

Workaround:

Use version 1.1 or later of the replace_certs.py script to prevent this from happening. If the Transport Nodes are already disconnected, follow these steps:

  1. On an NSX Manager, change to the nsx-appl-proxy directory:

    cd /etc/vmware/nsx-appl-proxy

  2. Remove the temporary key files. The ".*" after pem matches only the temporary key files, not appl-proxy-privkey.pem itself.

    rm appl-proxy-privkey.pem.*

  3. Change the ownership and permissions of the appl-proxy certificates and keys. Post 4.1.0, these files must be owned by uproton.

    chown uproton:appl-proxy appl-proxy-cert.pem
    chmod 660 appl-proxy-cert.pem

    chown uproton:appl-proxy appl-proxy-privkey.pem
    chmod 660 appl-proxy-privkey.pem

    chown uproton:appl-proxy appl-proxy-ar-cert.pem
    chmod 660 appl-proxy-ar-cert.pem

    chown uproton:appl-proxy appl-proxy-ar-privkey.pem
    chmod 660 appl-proxy-ar-privkey.pem

  4. Check the permissions of the files in this directory:

    ls -lart

    Example of how permissions should appear:

    total 40
    -rw-r--r-- 1 appl-proxy appl-proxy 3136 Jan 1 2000 appl-proxy.xml
    -rw-r--r-- 1 appl-proxy appl-proxy 90 Apr 3 00:34 appl-proxy-public-cfg.xml
    -rw-r--r-- 1 appl-proxy appl-proxy 52 Apr 3 00:34 appl-proxy-public-cfg.json
    -rw-r--r-- 1 appl-proxy appl-proxy 766 Apr 3 00:34 openssl-appl-proxy.cnf
    -rw-rw---- 1 uproton appl-proxy 1704 Apr 3 00:34 appl-proxy-privkey.pem
    -rw-rw---- 1 uproton appl-proxy 1639 Apr 3 00:34 appl-proxy-cert.pem
    -rw-rw---- 1 uproton appl-proxy 1704 Apr 3 00:34 appl-proxy-ar-privkey.pem
    -rw-rw---- 1 uproton appl-proxy 1639 Apr 3 00:34 appl-proxy-ar-cert.pem

  5. SSH into the NSX Transport Node and restart the nsx-proxy and nsx-opsagent services:

    /etc/init.d/nsx-proxy restart
    /etc/init.d/nsx-opsagent restart

  6. If the host is still shown as disconnected, run the following:

    On one of the NSX Managers:

    • get certificate api thumbprint

    On the hosts:

    • nsxcli -c sync-aph-certificates NSX-Manager-IP username admin thumbprint <thumbprint> password <password>
    • /etc/init.d/nsx-proxy restart
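
The ownership changes in step 3 repeat the same two commands for each file, so they can be generated with a loop. The following is a rough sketch rather than a supported tool: the function name print_aph_permission_fix is hypothetical, the owner, group, mode and file names are the ones given in step 3, and the function only prints the commands so that they can be reviewed before anything is changed.

```shell
#!/bin/sh
# print_aph_permission_fix [DIR]
# Prints the chown/chmod commands from step 3 for each certificate or key
# file that exists in DIR (default: the directory from step 1).
# Nothing is modified; the output is meant to be reviewed first.
# The function name is illustrative; it is not an NSX command.
print_aph_permission_fix() {
    dir=${1:-/etc/vmware/nsx-appl-proxy}
    for f in appl-proxy-cert.pem appl-proxy-privkey.pem \
             appl-proxy-ar-cert.pem appl-proxy-ar-privkey.pem; do
        if [ -f "$dir/$f" ]; then
            echo "chown uproton:appl-proxy '$dir/$f'"
            echo "chmod 660 '$dir/$f'"
        else
            echo "# skipped (not found): $dir/$f"
        fi
    done
}

# Review the output, then pipe it to sh to apply the changes:
#   print_aph_permission_fix
#   print_aph_permission_fix | sh
```

Printing instead of executing keeps the sketch safe to run on a system where some of the four files are absent; step 4 remains the way to confirm the result.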