Alarm For Transport Node Certificate Has Expired - Host/Edges MPA Disconnected

Article ID: 345825

Products

VMware NSX

Issue/Introduction

Title: Alarm for transport_node_certificate_expired
Event ID: transport_node_certificate_expired
Alarm Description

  • Purpose: Notify the user that a Transport Node certificate has expired.

  • Impact: Transport Nodes (Hosts and Edges) can disconnect from the NSX Managers and cannot reconnect; they remain in an MPA Disconnected state.

  • Cause: The Transport Node certificate has expired.

Warning: This alarm must be addressed as soon as possible. Once the TN certificate expires, there is a grace period of 24 hours after which all impacted Edges and Hosts will be disconnected from NSX.

Environment

VMware NSX 4.1.x, 4.2.x

Cause

  • In NSX 4.1.x and 4.2.0, Edge and Host Transport Nodes were instantiated with a certificate that is valid for 825 days.
  • Transport Nodes in NSX-T 3.x and NSX 4.2.1 (and later) are created with a certificate that is valid for 10 years.
  • Any Edge or Host deployed on NSX 4.1.x or 4.2.0, and any Host prepared or re-prepared on those versions, has the shorter 825-day certificate.
  • By design, the Transport Node certificate is not replaced on upgrade.
  • The issue can therefore be seen on both impacted and fixed versions, because it depends on the release on which the Transport Node was originally installed.
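
To tell which kind of certificate a given node carries, the total validity period can be computed from the certificate's notBefore and notAfter dates. The following is a minimal sketch and not part of the official procedure: the function name cert_validity_days is hypothetical, and it assumes GNU date (date -d), which may not be available in the ESXi shell, so the certificate may need to be copied to a Linux workstation first.

```shell
# Hypothetical helper: print the total validity period of a PEM certificate in days.
# Assumes GNU date for parsing the OpenSSL date strings.
cert_validity_days() {
    cert_path="$1"

    # OpenSSL prints e.g. "notBefore=Jan  1 00:00:00 2024 GMT"; keep the part after "=".
    not_before=$(openssl x509 -noout -startdate -in "$cert_path" 2>/dev/null | cut -d= -f2)
    not_after=$(openssl x509 -noout -enddate -in "$cert_path" 2>/dev/null | cut -d= -f2)

    # Empty values mean the file is missing, empty, or not a certificate.
    [ -n "$not_before" ] && [ -n "$not_after" ] || return 1

    start_s=$(date -u -d "$not_before" +%s) || return 1
    end_s=$(date -u -d "$not_after" +%s) || return 1

    echo $(( (end_s - start_s) / 86400 ))
}
```

For example, cert_validity_days /etc/vmware/nsx/host-cert.pem printing 825 would indicate the shorter-lived certificate described above, while a value of roughly 3650 would indicate a 10-year certificate.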

Resolution

Manual intervention is required for any node originally deployed on 4.1.x/4.2.0.

For NSX versions from 4.1.0 through to 4.2.0 inclusive:

  • Check the connection status of the Transport Node on the NSX UI, System -> Fabric -> Hosts/Nodes

Note:

  • A Host or Edge may still show as Connected and Success even though the Certificate is expired.
  • Validate the expiry date of the certificate as root on the Host or Edge, using the command:
    openssl x509 -enddate -noout -in /etc/vmware/nsx/host-cert.pem
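
Reading the expiry date by eye works for a single node; for repeated checks, the openssl x509 -checkend option can classify the certificate instead. The sketch below is illustrative only: the function name cert_status and the default 30-day warning window are assumptions, not values taken from NSX.

```shell
# Hypothetical helper: classify a PEM certificate as expired, expiring, or valid.
# $1 = certificate path, $2 = warning window in days (assumed default: 30).
cert_status() {
    cert_path="$1"
    warn_days="${2:-30}"

    # Fail early if the file is missing, empty, or not a parseable certificate.
    if ! openssl x509 -noout -in "$cert_path" >/dev/null 2>&1; then
        echo "unreadable"
        return 2
    fi

    # -checkend N exits non-zero if the certificate expires within N seconds.
    if ! openssl x509 -noout -checkend 0 -in "$cert_path" >/dev/null 2>&1; then
        echo "expired"
    elif ! openssl x509 -noout -checkend "$((warn_days * 86400))" -in "$cert_path" >/dev/null 2>&1; then
        echo "expiring"
    else
        echo "valid"
    fi
}
```

On a Transport Node this could be called as cert_status /etc/vmware/nsx/host-cert.pem 60 to flag a certificate that expires within the next 60 days.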

Transport Node has an expired or expiring certificate but is still connected to NSX:

Transport Node certificate has expired and TN is in a disconnected state in NSX:

    1. SSH to the Transport Node as the root user.
    2. Empty the Transport Node certificate and private key files:

      cat /dev/null > /etc/vmware/nsx/host-cert.pem
      cat /dev/null > /etc/vmware/nsx/host-privkey.pem
    3. Generate a new self-signed TN certificate and key:

      For NSX 4.1.2.5 and later, restarting the nsx-proxy service creates the new cert-key pair (then continue with step 4):
      /etc/init.d/nsx-proxy restart

      For NSX 4.1.x versions prior to 4.1.2.5:

      1. Create a temporary OpenSSL config file from the existing OpenSSL config:
        • cat /etc/vmware/nsx/openssl-proxy.cnf > /tmp/tmp-openssl-proxy.cnf 
      2. Extract the Transport Node UUID and append it to the temporary OpenSSL config.
        • echo "UID = $(grep -o '<uuid>[^<]*' /etc/vmware/nsx/host-cfg.xml | sed 's/<uuid>//')" >> /tmp/tmp-openssl-proxy.cnf
      3. Add extension in the temporary OpenSSL config.
        • echo -e "[ req_ext ]\nbasicConstraints     = CA:FALSE\nextendedKeyUsage     = clientAuth\nsubjectKeyIdentifier = hash\nauthorityKeyIdentifier = keyid,issuer" >> /tmp/tmp-openssl-proxy.cnf
      4. Generate the replacement certificate and key. The -days parameter below sets a validity period of 3650 days (10 years).
        • openssl req -new -newkey rsa:2048 -days 3650 -nodes -x509 -keyout /etc/vmware/nsx/host-privkey.pem -out /etc/vmware/nsx/host-cert.pem -config /tmp/tmp-openssl-proxy.cnf -extensions req_ext
    4. Identify the NSX Manager thumbprint. SSH to the NSX Manager as the admin user and run:

      get certificate api thumbprint

    5. To push the new cert-key pair to the Manager, run the following as the root user on the Host or Edge (any NSX Manager hostname or IP can be used):
      • For Edges (use the IP of the Manager that is shown as disconnected in the output of get controllers):
        • su admin -c push host-certificate <Manager hostname-or-IP> username admin thumbprint <thumbprint from step 4>
        • su admin -c sync-aph-certificates <Manager hostname-or-IP> username admin thumbprint <thumbprint from step 4>
      • For Hosts (use the IP of the Manager that is shown as disconnected in the output of get controllers):
        • nsxcli -c push host-certificate <Manager hostname-or-IP> username admin thumbprint <thumbprint from step 4>
        • nsxcli -c sync-aph-certificates <Manager hostname-or-IP> username admin thumbprint <thumbprint from step 4>
    6. In the NSX UI, navigate back to the host under System -> Fabric -> Hosts and resolve the alarm by selecting the host disconnected link.
    7. If the Transport Node status still shows an error, it may be necessary to restart the nsx-proxy and nsx-opsagent services on the Transport Node to restore the connection.
      • Edge:
        • /etc/init.d/nsx-proxy restart
        • /etc/init.d/nsx-opsagent-appliance restart
      • Host:
        • /etc/init.d/nsx-proxy restart
        • /etc/init.d/nsx-opsagent restart
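
Before pushing the regenerated certificate in step 5, it can help to confirm that the certificate and private key written in step 3 actually belong together. The following is a minimal sketch rather than part of the documented procedure: the function name cert_matches_key is hypothetical, and it assumes the openssl build on the node provides the pkey subcommand.

```shell
# Hypothetical helper: succeed only if the certificate and private key share a public key.
# $1 = certificate path, $2 = private key path.
cert_matches_key() {
    cert_path="$1"
    key_path="$2"

    # Extract the public key from each file in the same PEM format.
    cert_pub=$(openssl x509 -noout -pubkey -in "$cert_path" 2>/dev/null)
    key_pub=$(openssl pkey -pubout -in "$key_path" 2>/dev/null)

    # An empty result means a file was missing, still empty, or unparseable.
    [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}
```

A possible check on the node is cert_matches_key /etc/vmware/nsx/host-cert.pem /etc/vmware/nsx/host-privkey.pem && echo "pair matches". A non-zero exit status suggests that the files are still empty or were not generated together.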

 

Additional Information