Using Certificate Analyzer, Results and Recovery (CARR) Script to fix certificate related issues in NSX

Article ID: 369034

Updated On:

Products

VMware NSX

Issue/Introduction

This script is intended to be used to resolve certificate management issues on NSX 3.2.x, 4.0.x and 4.1.x.

  • It updates self-signed certificates.
  • For certificate operations and management on NSX 4.2.x, the user interface (System > Certificates) can be used to manage and replace certificates and is the preferred option over the CARR script attached to this KB. For further details, review the section Replace Certificates Through NSX Manager in the NSX administration guide.
  • It performs integrity checks and recovery operations for NSX self-signed certificates, and can replace certificates that have expired or will expire soon.
  • Expired certificates with a value of 0 in the 'Used By' column can be deleted in the UI; the script does not delete them.
  • It can replace certificates on the NSX Manager as well as on NSX Transport Nodes, Edges and Hosts.
  • CA-signed certificates are out of scope and should be managed by your organization owners. If the CA certificates are signed by VMCA, see Scripted process to replace expired or self-signed VMware NSX Manager Certificates with VMCA-Signed Certificates.

Environment

  • VMware NSX-T Data Center 3.2.x
  • VMware NSX 4.x
  • VCF NSX 9.0.0

Resolution

The script will make an assessment of all certificates requiring remediation, present the proposed changes and ask for approval to proceed.

  1. Copy the script, attached at the bottom of this KB, to the /root directory on any NSX Manager
  2. Extract the script
    tar -xvf carr-1.21.tar.gz
  3. Change to the extracted folder
    cd carr-1.21
  4. Run in dry run assessment mode first. This step is mandatory: it generates the file validation_config_recovery_mode.yaml, which is consumed by default in step 5
    ./start.sh -d 
  5. Remediate all certificates that will expire in 825 days or less
    ./start.sh 
  6. On the NSX UI, System > Certificates, manually delete all unused expiring and expired certs that CARR has replaced
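
The ordering of steps 4 and 5 can be enforced with a small wrapper. The following is a minimal sketch, not part of CARR: the function names carr_dry_run_done and carr_remediate are hypothetical, and it assumes that validation_config_recovery_mode.yaml is written to the directory that holds start.sh (this KB does not state the location).

```shell
# Hypothetical helper: succeeds only if the dry-run output file exists and is non-empty.
# Assumption: the file is written to the directory that contains start.sh.
carr_dry_run_done() {
    carr_dir="$1"
    [ -s "$carr_dir/validation_config_recovery_mode.yaml" ]
}

# Hypothetical wrapper: start remediation only after a dry run has produced its file.
carr_remediate() {
    carr_dir="$1"
    if carr_dry_run_done "$carr_dir"; then
        (cd "$carr_dir" && ./start.sh)
    else
        echo "No dry-run output found. Run ./start.sh -d first." >&2
        return 1
    fi
}
```

Example usage on an NSX Manager, after sourcing the functions: carr_remediate /root/carr-1.21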

Running the CARR script is not expected to have any impact, but Broadcom recommends running it during a maintenance window.
See the Additional Information section for detailed information on CARR.

Additional Information

Client Requirements:

  • For NSX 3.2.3.x, 4.1.x and 4.2.x, the script can be run directly on a Local or Global NSX Manager from the /root directory.
  • For NSX versions earlier than 3.2.3.x and for NSX 4.0.x, an external client machine that meets the following requirements is required. See Create a virtual machine for running the Certificate Analyzer, Results and Recovery (CARR) Script if no suitable machine exists in your environment.

    Python version requirements:

    • You can check the Python version with the following command: python --version
    • Python 3.13+ requires the client machine to have internet connectivity; the script cannot be run offline with these versions.
    • Python 3.8 to 3.12 can run with or without client internet connectivity.

    OS: macOS and Linux

    Architecture (if the appliance has an internet connection, there is no restriction, because dependencies are downloaded):

    • macOS: "macosx_10_9_x86_64" "macosx_11_0_arm64"
    • Linux: "musllinux_1_1_x86_64" "musllinux_1_1_aarch64" "manylinux_2_17_x86_64" "manylinux_2_17_s390x" "manylinux_2_17_aarch64"

  • In all cases, the script requires the following ports to be open between the client machine and the 3 NSX Managers:
    • ssh port 22 (TCP)
    • https port 443 (TCP)
    • corfu port 9000 (TCP)
      Note: If the CARR script is run directly on an NSX Manager, ports 443 and 9000 will already be open between the 3 Managers.

  • ssh access for the admin and root users must be enabled on all NSX Managers. If needed, see Enable ssh root access for NSX appliances. In Federation environments, this requirement applies to all LMs and GMs.

  • When running the CARR script in NSX 4.2.0.x and earlier, make sure the admin and root passwords on the NSX nodes only contain the characters discussed in the KB article LDAP or local users with a special character in their username or password cannot login to NSX. Otherwise, login may fail.
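
The client requirements above can be checked before copying the script to an external client. The following is a minimal sketch, not part of CARR: the function names are hypothetical, Python versions outside the ranges listed above are reported as "not-listed" rather than judged, and the port check assumes an nc (netcat) build that supports the -z and -w options.

```shell
# Hypothetical helper: classify a "major.minor[.patch]" Python version string
# against the connectivity rules listed in this KB.
carr_python_mode() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    if [ "$major" -ne 3 ] || [ "$minor" -lt 8 ]; then
        echo "not-listed"            # version not covered by the ranges in this KB
    elif [ "$minor" -ge 13 ]; then
        echo "online-only"           # 3.13+ needs internet connectivity
    else
        echo "online-or-offline"     # 3.8 to 3.12
    fi
}

# Hypothetical helper: report whether TCP ports 22, 443 and 9000 answer on each manager.
# Assumption: nc with -z (scan only) and -w (timeout in seconds) is available on the client.
carr_check_ports() {
    for host in "$@"; do
        for port in 22 443 9000; do
            if nc -z -w 3 "$host" "$port" </dev/null >/dev/null 2>&1; then
                echo "$host:$port open"
            else
                echo "$host:$port unreachable"
            fi
        done
    done
}
```

Example usage, where the three manager names are placeholders:

    carr_python_mode "$(python --version 2>&1 | awk '{print $2}')"
    carr_check_ports nsx-mgr-01.example.com nsx-mgr-02.example.com nsx-mgr-03.example.com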

Execution Notes:

  • This script can be run on any node, and it will reach out to the respective NSX nodes in the correct order.
  • Ensure that you have a recent, valid backup of your NSX managers and ensure that you know the passphrase for your backups.
  • On NSX, the script must be run from the /root directory; it will not work from the /tmp directory.
  • admin and root passwords of NSX Manager are required as inputs.
  • The script can be run from vCenter Server 8.0u2 and above. If there is an issue copying the script to vCenter, it may be necessary to change the vCenter shell to bash by following the guidance in the KB Toggling the vCenter Server Appliance default shell. Revert the shell once the file has been copied.
  • A log named carr.log is created in the folder where the start.sh script is located. For any issues requiring support, collect this log separately; it is not collected as part of the support bundle.
  • Expired or expiring certificates that are not in use are not processed by the script. These can be deleted manually in the NSX UI.
  • The script processes expired certificates. For expiring certificates, by default only those expiring within the next 31 days are processed, unless you specify a lead time with the -t option, which can be between 31 and 825 days.
    • e.g. ./start.sh -t 100 (to check for certificates expiring in the next 100 days).
  • The script processes self-signed certificates only. CA-signed certificates are out of scope and must be managed by the organization owners.
  • By default the script runs in offline mode. If the appliance has an internet connection, you can use the -o option to force the script to check online for dependencies.

Other notes:

Script options:

  • -o = this flag is used to force online mode
  • -t = specify lead time for expiring certificates, between 31 and 825 days.
  • -d = Dry run mode, also checks for transport node certificates expiring.
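
Because -t only accepts a lead time between 31 and 825 days, the value can be checked before the script is started. The following is a minimal sketch; carr_valid_lead_time is a hypothetical helper name and is not part of CARR.

```shell
# Hypothetical pre-check: accept only whole numbers of days in the documented 31-825 range.
carr_valid_lead_time() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit character
    esac
    [ "$1" -ge 31 ] && [ "$1" -le 825 ]
}
```

Example usage: carr_valid_lead_time 100 && ./start.sh -t 100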


Transport Node Certificates

On NSX versions 4.1.x and 4.2.0, Edge and Host Transport Nodes are instantiated using a certificate with a validity period of 825 days instead of 10 years.
These are permanent certificates that are not replaced by upgrades.
Starting from version 1.15, the CARR script replaces these certificates with new certificates that have a 10-year validity period.

NSX 4.2.3 adds an upgrade pre-check that validates that no transport node SSL certificates are expired or will expire within 90 days. If such a certificate is found, the user is instructed to run the Certificate Analyzer, Results and Recovery (CARR) script. In that case, run the CARR script in dry run mode first and then run it again to apply the fix. See NSX Manager Pre-Check warning to run CARR script.
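
A 90-day expiry window like the one used by this pre-check can be reproduced manually for a single certificate with the openssl -checkend option. The following is a minimal sketch and is not how the pre-check or CARR is implemented: carr_expires_within is a hypothetical helper, and it assumes the certificate has already been exported to a PEM file.

```shell
# Hypothetical helper: returns 0 if the PEM certificate is expired or expires within
# the given number of days, 1 if it is still valid after that, 2 if the file is unreadable.
carr_expires_within() {
    pem="$1"
    days="$2"
    [ -r "$pem" ] || { echo "cannot read $pem" >&2; return 2; }
    # openssl -checkend exits 0 when the certificate is still valid after N seconds.
    if openssl x509 -checkend "$((days * 86400))" -noout -in "$pem" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

Example usage, where tn-cert.pem is a placeholder file name: carr_expires_within tn-cert.pem 90 && echo "expires within 90 days"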


Note: If TN certificates have already expired and the 24-hour grace period has elapsed, the TNs will be disconnected. At this point CARR can no longer be used to replace the TN certificates.
See Transport Node Certificate Has Expired.

If a VM is vMotioned to the ESX host at the moment the certificate is being replaced, there is a possibility that it may fail to get a network connection.
To prevent vMotion during this time, it is recommended to disable DRS on the vSphere cluster for the duration of the activity.

To select a specific subset of Hosts or Edges for remediation, run the script in dry run mode, then refer to dry_run_transport_nodes_validation_report.yaml, copy the relevant Edge/Host entries and add them to validation_config_recovery.yaml.

Relevant files

README - How to use script details
start.sh - carr script
carr.log - audit log generated during carr operation
validation_config.yaml - file for transport node validation. It is referenced if the auto generated file validation_config_recovery_mode.yaml is not used, and it must be populated manually.
validation_config_recovery_mode.yaml - Auto generated, populates which transport nodes need resolving and other certificates which need resolving.
before_recovery_transport_nodes_validation_report.yaml - Pre recovery file, which lists details about transport nodes certificates.
after_recovery_transport_nodes_validation_report.yaml - Post recovery file, which lists details about transport nodes certificates.
dry_run_transport_nodes_validation_report.yaml - Detailed list of transport nodes with certificate or connection issues.

Errors that may be seen if editing the yaml file manually:

  • When executing the script, you may get ERROR  : string indices must be integers. This is caused by a syntax issue in the yaml file. To resolve it, make sure there is a space between keys and values when you edit the validation_config.yaml file.
    For example: - vcenter_name: vcsa-01.example.com
  • When executing the script, you may get ERROR: Edge-cluster-01:: There are 1 edge_nodes. Certificates on these Edge Nodes will not be replaced. To resolve the issue, check whether any edge node in the cluster is powered off or in a disconnected state, and power on the edge node.
  • It is recommended that APH and Transport node certificates be replaced in separate runs of the script.
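
In YAML, a key and its value are only separated when the colon is followed by a space; an entry written as key:value is read as a single string, which is consistent with the "string indices must be integers" error above. The following is a minimal sketch of a check that can be run after editing the file by hand. carr_yaml_missing_space is a hypothetical helper, not part of CARR, and the pattern is a heuristic that can also flag list values that themselves contain a colon, such as URLs.

```shell
# Hypothetical lint: print (with line numbers) entries where a key is followed
# by ':' and then immediately by the value, with no space in between.
# Returns non-zero when no such line is found.
carr_yaml_missing_space() {
    grep -nE '^[[:space:]]*(-[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*:[^[:space:]]' "$1"
}
```

Example usage: carr_yaml_missing_space validation_config.yaml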

Compute Manager Certificate

Starting from version 1.15, the CARR script retrieves the list of Compute Managers registered in NSX Manager, retrieves the vCenter certificates and checks their thumbprints and chain order.
If the CRL Distribution Point field is present in the vCenter certificates, the script disables the Certificate Revocation List (CRL) checking in NSX.
If there is a mismatch with the vCenter thumbprints, it updates the new thumbprints in NSX.
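
A thumbprint comparison of this kind can be reproduced manually with openssl. The following is a minimal sketch, not the CARR implementation: the function names are hypothetical, and it assumes that the thumbprint of interest is the SHA-256 fingerprint of the vCenter certificate in colon-separated uppercase hex, which should be confirmed against the value shown for the Compute Manager in NSX.

```shell
# Hypothetical helper: print the SHA-256 fingerprint of a PEM certificate
# as colon-separated hex (the part after "Fingerprint=" in the openssl output).
carr_sha256_thumbprint() {
    openssl x509 -noout -fingerprint -sha256 -in "$1" | cut -d= -f2
}

# Hypothetical helper: succeed only if the certificate's fingerprint equals the expected value.
# Assumption: the expected value uses the same colon-separated uppercase format.
carr_thumbprint_matches() {
    [ "$(carr_sha256_thumbprint "$1")" = "$2" ]
}
```

Example usage, where vcenter.pem is a placeholder file name and the host name is the example used earlier in this KB:

    openssl s_client -connect vcsa-01.example.com:443 </dev/null 2>/dev/null | openssl x509 -outform PEM > vcenter.pem
    carr_sha256_thumbprint vcenter.pem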

Federation Certificates

Uninstall:

The CARR script is installed in the directory ~/.virtualenvs/carr_script.
For example, when the CARR script has been run on an NSX Manager, the install can be reversed as follows:

 rm -rf /root/.virtualenvs/carr_script


Note: This rm command deletes files recursively without checks. If executed incorrectly, it can remove system files irreversibly, requiring the NSX appliance to be replaced.
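
To reduce the risk described in the note, the removal can be wrapped so that the path is fixed and checked before anything is deleted. The following is a minimal sketch; carr_uninstall is a hypothetical helper, not part of CARR, and it assumes the install location is ~/.virtualenvs/carr_script for the user running it, as stated above.

```shell
# Hypothetical guarded uninstall: only ever removes $HOME/.virtualenvs/carr_script.
carr_uninstall() {
    # Refuse to continue if HOME is empty, so the path cannot collapse to /.virtualenvs/...
    [ -n "${HOME}" ] || { echo "HOME is not set" >&2; return 1; }
    target="${HOME}/.virtualenvs/carr_script"
    if [ -d "$target" ]; then
        rm -rf -- "$target"
        echo "Removed $target"
    else
        echo "Nothing to remove at $target"
    fi
}
```

Example usage as root on an NSX Manager, after sourcing the function: carr_uninstall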

To use the CARR script in automated mode, please review the following KB: Using the CARR (Certificate Analyzer, Results and Recovery) script in automated mode

Attachments

carr-1.21.tar.gz