Using Certificate Analyzer, Results and Recovery (CARR) Script to fix certificate related issues in NSX

Article ID: 369034

Updated On: 05-01-2025

Products

VMware NSX

Issue/Introduction

This script is intended to be used to resolve certificate management issues on NSX 3.2.x and 4.x. It performs integrity checks and recovery operations for NSX self-signed certificates, and can replace certificates that have expired or will be expiring soon.

Environment

VMware NSX 4.x
VMware NSX-T Data Center 3.2.x

Resolution

The script assesses what certificate remediation is needed, presents the proposed changes, and asks for approval before proceeding.


Client Requirements:

  • For NSX 4.1.x and 4.2.x, the script can be run directly on a Local or Global NSX Manager from the /root directory.
  • For NSX 3.2.x and 4.0.x, an external client machine is required. It must meet the following requirements:

    Python version requirements

    • Python 3.13 and later: the client machine must have internet connectivity; the script cannot be run offline.
    • Python 3.8 to 3.12: the script can run with or without client internet connectivity.

    OS: macOS and Linux

    Architecture (no restriction if the client has internet connectivity, because dependencies are downloaded):

    • macOS: "macosx_10_9_x86_64" "macosx_11_0_arm64"
    • Linux: "musllinux_1_1_x86_64" "musllinux_1_1_aarch64" "manylinux_2_17_x86_64" "manylinux_2_17_s390x" "manylinux_2_17_aarch64"

  • In all cases, the script requires the following ports to be open between the client machine and the three NSX Managers:
    • ssh port 22
    • https port 443
    • corfu port 9000
      Note: if running on the NSX Manager directly, ports 443 and 9000 will already be open between the 3 Managers.

  • SSH access for the admin and root users must be enabled on all NSX Managers. If needed, see Enable ssh root access for NSX appliances. In Federation environments, this requirement applies to all LMs and GMs.
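
The client requirements above can be checked before the bundle is copied over. The following is a minimal sketch, not part of the CARR bundle: the function names and the example hostnames are hypothetical, and the port probe assumes bash with /dev/tcp support and the coreutils timeout command (which may need to be installed separately on macOS).

```shell
#!/usr/bin/env bash
# Pre-flight sketch for a CARR client machine. Not part of the CARR bundle;
# function names and example hostnames are illustrative assumptions.

# Classify a Python version ("3.12" or "3.12.4") against the requirements above.
# Prints: offline-capable | online-only | unsupported
carr_python_mode() {
    local major="${1%%.*}"
    local rest="${1#*.}"
    local minor="${rest%%.*}"
    if [ "$major" -ne 3 ] || [ "$minor" -lt 8 ]; then
        echo "unsupported"
    elif [ "$minor" -ge 13 ]; then
        echo "online-only"
    else
        echo "offline-capable"
    fi
}

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
carr_port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Probe the three ports CARR needs (ssh, https, corfu) on each manager given.
carr_check_ports() {
    local host port status=0
    for host in "$@"; do
        for port in 22 443 9000; do
            if carr_port_open "$host" "$port"; then
                echo "OK     $host:$port"
            else
                echo "FAILED $host:$port"
                status=1
            fi
        done
    done
    return "$status"
}

# Example usage (hostnames are placeholders):
#   carr_python_mode "$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
#   carr_check_ports nsx-mgr-01.example.com nsx-mgr-02.example.com nsx-mgr-03.example.com
```

A successful TCP connection only shows that the port is reachable; it does not confirm that ssh logins for admin and root are enabled.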

Execution Notes:

  • This script can be run on any node; it reaches out to the respective NSX nodes in the correct order.
  • Ensure that you have a recent, valid backup of your NSX managers. Ensure that you know the passphrase for your backups.
  • On NSX 4.1.x and 4.2.x, run the script from the /root directory. It will not work from the /tmp directory.
  • The admin and root passwords of the NSX Manager are required as inputs.
  • The script can be run from vCenter Server 8.x. If copying the script to vCenter fails, it may be necessary to change the default shell of the root user on the vCenter Server to bash:
    chsh -s /bin/bash root
  • A log named carr.log is created in the folder where the start.sh script is located. For any issues requiring support, please collect this log separately as it will not be collected as part of the support bundle.
  • Expired/expiring certificates which are not in use will not be processed by the script. These can be manually deleted in the NSX UI.
  • The script processes all expired certificates. For expiring certificates, by default only those expiring within the next 31 days are processed. Version 1.11 and above lets you specify a lead time between 31 and 365 days (from v1.13 onwards, between 31 and 825 days) using the -t option, e.g. ./start.sh -t 100 checks for certificates expiring in the next 100 days.
  • The script processes self-signed certificates only. CA signed certificates are out of scope and must be managed by the organization owners.
  • By default the script runs in offline mode. If the appliance has an internet connection, you can use the -o option (version 1.11 onwards) to force the script to check online for dependencies.
  • Note: v1.11 had an issue when executed on Global Managers in a Federated environment and should no longer be used. The issue was fixed in v1.12.
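
Because carr.log is not included in the support bundle, it can help to copy it aside after each run. The following is a minimal sketch under the assumption that the log sits next to start.sh as described above; the function name and the destination folder are hypothetical.

```shell
# Copy carr.log from the CARR folder to a timestamped file for a support case.
# Usage: carr_collect_log <carr_dir> <dest_dir>   (function name is hypothetical)
carr_collect_log() {
    local src="$1/carr.log"
    local dest_dir="$2"
    if [ ! -f "$src" ]; then
        echo "carr.log not found in $1" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    local dest
    dest="$dest_dir/carr-$(date +%Y%m%d-%H%M%S).log"
    # Print the path of the copy so it can be attached to the case.
    cp "$src" "$dest" && echo "$dest"
}

# Example (paths are placeholders):
#   carr_collect_log /root/carr-1.15 /root/carr-logs
```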

Execution Steps:

  1. Copy carr-1.15.tar.gz to the client machine where it will be run. On an NSX Manager, use the /root folder.
  2. Extract the bundle
    tar -xvf carr-1.15.tar.gz
  3. Change to the extracted folder
    cd carr-1.15
  4. Launch the carr script
    ./start.sh
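
The steps above can be wrapped in a small helper that stops early if the bundle is missing or does not contain start.sh. This is a sketch only: the function name is hypothetical, and it assumes the archive extracts to a folder named after the bundle, as carr-1.15.tar.gz does in the steps above.

```shell
# Extract a CARR bundle and print the folder that contains start.sh.
# Usage: carr_extract <bundle.tar.gz> [target_dir]   (function name is hypothetical)
carr_extract() {
    local bundle="$1"
    local target="${2:-.}"
    if [ ! -f "$bundle" ]; then
        echo "Bundle not found: $bundle" >&2
        return 1
    fi
    # carr-1.15.tar.gz -> carr-1.15
    local name
    name="$(basename "$bundle" .tar.gz)"
    tar -xf "$bundle" -C "$target" || return 1
    if [ ! -f "$target/$name/start.sh" ]; then
        echo "start.sh not found under $target/$name" >&2
        return 1
    fi
    echo "$target/$name"
}

# Example, on an NSX 4.1.x or 4.2.x Manager (run from /root, not /tmp):
#   cd /root && cd "$(carr_extract carr-1.15.tar.gz)" && ./start.sh
```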

Script options since version 1.11:
-o = force online mode
-t = specify the lead time for expiring certificates, between 31 and 825 days (between 31 and 365 days before v1.13)
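
A lead time outside the accepted range is easy to catch before the script is launched. The following sketch checks a value against the documented limits; the function name is hypothetical, and how the script itself reacts to an out-of-range value is not covered here.

```shell
# Check a lead time against the range accepted by the -t option.
# Usage: carr_valid_lead_time <days> [max_days]   (function name is hypothetical)
# max_days defaults to 825 (v1.13 and later); pass 365 for v1.11 and v1.12.
carr_valid_lead_time() {
    local days="$1"
    local max="${2:-825}"
    case "$days" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$days" -ge 31 ] && [ "$days" -le "$max" ]
}

# Example:
#   carr_valid_lead_time 100 && ./start.sh -t 100
```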

Uninstall:

The CARR script is installed in the directory ~/.virtualenvs/carr_script.
For example, when the CARR script has been run on an NSX Manager, the installation can be removed as follows:

   rm -rf /root/.virtualenvs/carr_script


Note: This rm command deletes files recursively without checks. If executed incorrectly, it can irreversibly remove system files, requiring the NSX appliance to be replaced.
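
To reduce the risk of a mistyped path, the removal can be wrapped in a function that builds the target from a fixed suffix and refuses anything unexpected. This is a minimal sketch with a hypothetical function name, not a vendor-provided uninstaller.

```shell
# Remove the CARR virtualenv only if it is exactly <home>/.virtualenvs/carr_script.
# Usage: carr_uninstall [home_dir]   (function name is hypothetical)
carr_uninstall() {
    local home="${1:-$HOME}"
    # Refuse an unset, relative, or root ("/") home directory so the rm
    # target is always an absolute path below a real home folder.
    case "$home" in
        /?*) ;;
        *) echo "Refusing to use home directory: '$home'" >&2; return 1 ;;
    esac
    local target="$home/.virtualenvs/carr_script"
    if [ ! -d "$target" ]; then
        echo "Nothing to remove: $target does not exist"
        return 0
    fi
    rm -rf -- "$target" && echo "Removed $target"
}

# Example, as root on an NSX Manager:
#   carr_uninstall /root
```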

Compute Manager Certificate

Starting from version 1.15, the CARR script retrieves the list of Compute Managers registered in NSX Manager, fetches the vCenter certificates, and checks their thumbprints and chain order.
If the CRL Distribution Point field is present in the vCenter certificates, the script disables Certificate Revocation List (CRL) checking in NSX.
If the thumbprints recorded in NSX do not match the vCenter thumbprints, the script updates NSX with the new thumbprints.
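
A thumbprint mismatch can also be spotted by hand with the openssl command-line tool. The sketch below is not how CARR performs the check; it assumes a SHA-256 thumbprint, that the vCenter Server answers on port 443, and that the value recorded in NSX is supplied by you. All function names are hypothetical.

```shell
# Fetch the SHA-256 fingerprint of the certificate presented on host:443.
# Requires the openssl CLI and network access to the vCenter Server.
vc_thumbprint() {
    openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha256
}

# Reduce "sha256 Fingerprint=AB:CD:..." (or a bare thumbprint) to lowercase
# hex without separators, so values from different tools can be compared.
normalize_thumbprint() {
    printf '%s\n' "${1#*=}" | tr -d ': ' | tr 'A-F' 'a-f'
}

# Return 0 when two thumbprints are equal after normalization.
thumbprints_match() {
    [ "$(normalize_thumbprint "$1")" = "$(normalize_thumbprint "$2")" ]
}

# Example (hostname and NSX value are placeholders):
#   thumbprints_match "$(vc_thumbprint vcsa.example.com)" "<thumbprint recorded in NSX>" \
#       || echo "Thumbprint mismatch"
```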


Transport Node Certificates

On NSX versions 4.1.x and 4.2.0, Edge and Host Transport Nodes (TNs) are instantiated using a certificate with a validity period of 825 days instead of 10 years.
These are permanent certificates that are not replaced by upgrades.
Starting from version 1.15, the CARR script replaces these certificates with new certificates that have a 10-year validity period.
Note: If TN certificates have already expired and the 24-hour grace period has elapsed, the TNs will be disconnected. At this point CARR can no longer be used to replace the TN certificates. See Transport Node Certificate Has Expired.
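
If a certificate has been exported to a PEM file, its remaining validity can be compared against a lead time with openssl. This is an illustrative sketch only: it does not retrieve TN certificates from NSX, the PEM path is a placeholder, and the function names are hypothetical.

```shell
# Convert a lead time in days to seconds for `openssl x509 -checkend`.
days_to_seconds() {
    echo $(( $1 * 86400 ))
}

# Return 0 if the PEM certificate in <file> has expired or expires within
# <days> days, 1 if it stays valid beyond that, 2 if the file is missing.
# Usage: cert_expires_within <file> <days>
cert_expires_within() {
    [ -f "$1" ] || { echo "No such file: $1" >&2; return 2; }
    if openssl x509 -in "$1" -noout -checkend "$(days_to_seconds "$2")" >/dev/null 2>&1; then
        return 1   # still valid after the lead time
    fi
    return 0       # expired, or expiring within the lead time
}

# Example (file name is a placeholder):
#   cert_expires_within tn-cert.pem 825 && echo "Within the 825-day lead time"
```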

Dry Run:
A dry run is a read-only execution that identifies the number of Edges and Hosts with TN certificates of validity 825 days or less.

./start.sh -d

<snip>
+-------------------------+--------------------------------------------------------------+---------------------------------------------------------+
| HOST                    | ERROR  : vcsa.example.com::ESX_Cluster1:: Certificate on     | Host certificate on #8 hosts will be replaced.          |
|                         | #8 hosts are expiring or have expired                        |                                                         |
|                         |                                                              |                                                         |
+-------------------------+--------------------------------------------------------------+---------------------------------------------------------+
| EDGE                    | ERROR  : EdgeCluster-1 :: Certificate on #6 hosts are        | Edge node certificate on 6 nodes will be replaced.      |
|                         | expiring or have expired                                     |                                                         |
|                         |                                                              |                                                         |
+-------------------------+--------------------------------------------------------------+---------------------------------------------------------+


TN Cert replacement:
To trigger TN certificate replacement, populate the environment details in the existing file validation_config.yaml, which is located in the same folder as start.sh.
On the Manager, the file can be edited with the vi editor. Alternatively, copy the file off the Manager with SCP, edit it in a text editor such as Notepad++, and copy it back.

To replace certificates on Hosts, specify the Compute Manager name and the names of the vSphere clusters to process.
To replace certificates on Edges, specify the Edge cluster name.
During certificate replacement, vMotion to the Host may not be possible.
It is recommended to start with one cluster and validate functionality.
Existing datapath flows through the Edge and Host are not expected to experience disruption.

e.g.

HOST:
  validate: True
  clusters:
    - vcenter_name: vcsa.example.com
      vcenter_cluster_name: ESX_Cluster1
    - vcenter_name: vcsa.example.com
      vcenter_cluster_name: ESX_Cluster2
EDGE:
  validate: True
  clusters:
    - name: EdgeCluster-1
    - name: EdgeCluster-2

Note: Currently only Edges in clusters are processed; standalone Edges are ignored.
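
Before and after editing validation_config.yaml, it can be useful to keep a backup and to list the cluster names that were entered. The following sketch assumes the file layout shown in the example above; the function names are hypothetical, and the listing is a plain text match rather than a YAML parser.

```shell
# Back up validation_config.yaml before editing it.
# Usage: carr_backup_config <carr_dir>   (function name is hypothetical)
carr_backup_config() {
    local cfg="$1/validation_config.yaml"
    [ -f "$cfg" ] || { echo "Not found: $cfg" >&2; return 1; }
    cp -p "$cfg" "$cfg.bak" && echo "$cfg.bak"
}

# List the cluster names entered in the file, one per line, as a quick check
# for typos before running CARR. Matches "vcenter_cluster_name:" and "name:".
# Usage: carr_list_clusters <path/to/validation_config.yaml>
carr_list_clusters() {
    sed -n -E 's/^[[:space:]]*(-[[:space:]]*)?(vcenter_cluster_name|name):[[:space:]]*//p' "$1"
}

# Example (paths are placeholders):
#   carr_backup_config /root/carr-1.15
#   vi /root/carr-1.15/validation_config.yaml
#   carr_list_clusters /root/carr-1.15/validation_config.yaml
```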

After saving this file, run CARR to replace the TN certificates:

./start.sh -t 825

The lead time is tunable. In this example, all certificates that expire in 825 days or less are replaced with 10-year certificates.

Additional Information

See Create a virtual machine for running the Certificate Analyzer, Results and Recovery (CARR) Script for detailed instructions on creating a Photon OS VM as a location to run the CARR script if no suitable location exists in your environment.

If the suggested resolution steps do not resolve the issue, please consider submitting a support case to Broadcom. Include the error screenshot or details, along with the NSX Manager log files and the script log file, carr.log, which is created in the folder where the start.sh script is located.

Attachments

carr-1.15.tar.gz