This script is intended to be used to resolve certificate management issues on NSX 3.2.x, 4.0.x and 4.1.x.
The script will make an assessment of all certificates requiring remediation, present the proposed changes and ask for approval to proceed.
tar -xvf carr-1.21.tar.gz
cd carr-1.21
./start.sh -d
./start.sh
There should be no impact associated with running the CARR script, but Broadcom recommends running the script during a maintenance window.
See the Additional Information section for detailed information on CARR.
Python version requirements:
python --version
OS: macOS and Linux
Architecture - (if the appliance has an internet connection, then there is no restriction, dependencies are downloaded)
/root directory, it will not work from the /tmp directory.
carr.log is created in the folder where the start.sh script is located. For any issues requiring support, please collect this log separately; it is not collected as part of the support bundle.
./start.sh -t 100 (to check for certificates expiring in the next 100 days).
Other notes:
Script option:
-o = this flag is used to force online mode
-t = specify lead time for expiring certificates, between 31 and 825 days.
-d = Dry run mode, also checks for transport node certificates expiring.
On versions NSX 4.1.x and 4.2.0, Edge and Host Transport Nodes are instantiated using a certificate with validity period of 825 days instead of 10 years.
These are permanent certificates that are not replaced by upgrades.
Starting from version 1.15, the CARR script replaces these certificates with new certificates that have a 10-year validity period.
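Because these certificates are valid for only 825 days, the -t lead time from the script options (31 to 825 days) can be used to look for them well before they expire. The following is a minimal sketch of a wrapper that validates the lead time before calling the script. The function names valid_lead_time and run_carr_lead_time are hypothetical and not part of CARR, and the wrapper assumes it is called from the extracted CARR folder.

```shell
# Hypothetical helper (not part of CARR): succeeds only if the value is a
# whole number inside the 31-825 day range accepted by the -t option.
valid_lead_time() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 31 ] && [ "$1" -le 825 ]
}

# Hypothetical wrapper: validate first, then run start.sh with -t.
# Assumes the current directory is the one that contains start.sh.
run_carr_lead_time() {
    if ! valid_lead_time "$1"; then
        echo "lead time must be between 31 and 825 days: '$1'" >&2
        return 1
    fi
    ./start.sh -t "$1"
}
```

For example, run_carr_lead_time 100 would run ./start.sh -t 100, while run_carr_lead_time 10 would stop with an error before the script is started.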
NSX 4.2.3 adds an upgrade pre-check that validates that no transport node SSL certificates are expired or will expire within 90 days. If such a certificate is found, the user is instructed to run the Certificate Analyzer, Results and Recovery (CARR) script. In such cases, the CARR script must first be run in dry run mode and then run again to apply the fix. See NSX Manager Pre-Check warning to run CARR script.
Note: If TN certificates have already expired and the 24-hour grace period has elapsed, the TNs will be disconnected. At this point CARR can no longer be used to replace the TN certificates.
See Transport Node Certificate Has Expired.
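To see in advance whether a particular certificate falls inside the 90 day window used by the pre-check, the certificate can be inspected with openssl. This is a sketch only: it assumes the certificate has already been exported to a PEM file and that openssl is installed, and the function names are hypothetical. It does not replace the CARR dry run.

```shell
# Convert a number of days to seconds for openssl's -checkend option.
days_to_seconds() {
    echo $(( $1 * 86400 ))
}

# Hypothetical helper. Return codes:
#   0 = certificate is expired or expires within the given number of days
#       (also returned if openssl cannot parse the file)
#   1 = certificate is still valid beyond that window
#   2 = the file cannot be read
cert_expires_within() {
    local cert_file=$1
    local days=$2
    if [ ! -r "$cert_file" ]; then
        echo "cannot read certificate file: $cert_file" >&2
        return 2
    fi
    # "openssl x509 -checkend N" exits 0 when the certificate will still be
    # valid N seconds from now, and non-zero otherwise.
    if openssl x509 -checkend "$(days_to_seconds "$days")" -noout \
            -in "$cert_file" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

With a placeholder path, cert_expires_within /path/to/tn-cert.pem 90 && echo "expires within 90 days" would flag a certificate that the pre-check is likely to report.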
If a VM is vMotioned to the ESX host at the moment the certificate is being replaced, there is a possibility that it may fail to get a network connection.
To prevent vMotion during this time, it is recommended to disable DRS on the vSphere cluster for the duration of the activity.
To select a specific subset of Hosts or Edges for remediation, first run the script in dry run mode. Then reference dry_run_transport_nodes_validation_report.yaml, copy the relevant Edge/Host entries and add them to validation_config_recovery.yaml.
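When entries are copied or edited by hand, a key written without a space after the colon is an easy mistake to make, and it is the cause of the "string indices must be integers" error listed further down. The sketch below is a rough heuristic check, not a YAML parser and not part of CARR: the function name is hypothetical, and it can produce false positives (for example on a list item that is a URL).

```shell
# Hypothetical helper: print (with line numbers) every line where a key is
# immediately followed by its value, such as "vcenter_name:vcsa-01".
# Exit status follows grep: 0 if suspicious lines were found, 1 if none.
find_missing_space_after_colon() {
    grep -nE '^[[:space:]]*(-[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*:[^[:space:]]' "$1"
}
```

Running find_missing_space_after_colon validation_config.yaml before starting the script prints nothing when every key is followed by a space.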
Relevant files
README - Details on how to use the script
start.sh - CARR script
carr.log - audit log generated during CARR operation
validation_config.yaml - file for transport node validation. It is referenced when the auto-generated file validation_config_recovery_mode.yaml is not used, and it needs to be populated manually.
validation_config_recovery_mode.yaml - Auto-generated; lists the transport nodes and other certificates that need resolving.
before_recovery_transport_nodes_validation_report.yaml - Pre recovery file, which lists details about transport nodes certificates.
after_recovery_transport_nodes_validation_report.yaml - Post recovery file, which lists details about transport nodes certificates.
dry_run_transport_nodes_validation_report.yaml - Detailed list of transport nodes with certificate or connection issues.
Errors that may be seen if editing the yaml file manually:
ERROR : string indices must be integers. This is caused by a syntax error in the YAML file. To resolve it, make sure there is a space between each key and its value when you edit the validation_config.yaml file. For example: - vcenter_name: vcsa-01.example.com
ERROR: Edge-cluster-01:: There are 1 edge_nodes. Certificates on these Edge Nodes will not be replaced. To resolve the issue, check whether any edge node in the cluster is powered off or disconnected, and power on the affected edge node.
Starting from version 1.15, the CARR script retrieves the list of Compute Managers registered in NSX Manager, retrieves the vCenter certificates and checks their thumbprints and chain order.
If the CRL Distribution Point field is present in the vCenter certificates, the script disables the Certificate Revocation List (CRL) checking in NSX.
If the vCenter thumbprints do not match those stored in NSX, the script updates NSX with the new thumbprints.
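The thumbprint comparison described above can be reproduced by hand for a quick sanity check. The sketch below is not how CARR itself is implemented: the function names are hypothetical, it assumes the thumbprint in question is the SHA-256 fingerprint of the certificate presented by vCenter on port 443, and it assumes openssl is available.

```shell
# Hypothetical helper: reduce a fingerprint to lowercase hex without colons,
# so "sha256 Fingerprint=AB:CD" and "abcd" compare as equal.
normalize_thumbprint() {
    printf '%s\n' "${1##*=}" | tr -d ':' | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: print the SHA-256 fingerprint of the certificate
# presented by the given vCenter host (assumes port 443 is reachable).
vcenter_thumbprint() {
    openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha256
}

# Succeeds when both thumbprints are equal after normalization.
thumbprints_match() {
    [ "$(normalize_thumbprint "$1")" = "$(normalize_thumbprint "$2")" ]
}
```

For example, thumbprints_match "$(vcenter_thumbprint vcsa-01.example.com)" "<thumbprint shown in NSX>" succeeds only when the two values agree.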
The CARR script is installed in the directory ~/.virtualenvs/carr_script.
For example, when the CARR script has been run on an NSX Manager, the install can be reversed as follows:
rm -rf /root/.virtualenvs/carr_script
Note: This rm command deletes files recursively without checks. If executed incorrectly, it can remove system files irreversibly, requiring the NSX appliance to be replaced.
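One way to reduce the risk of a mistyped path is to wrap the removal in a small guard. This is a minimal sketch with a hypothetical function name, not a Broadcom-provided tool: it refuses any path that does not end in .virtualenvs/carr_script and only then calls rm.

```shell
# Hypothetical helper: remove the CARR virtualenv only if the target path
# ends in ".virtualenvs/carr_script". Defaults to the current user's home.
remove_carr_venv() {
    local target="${1:-$HOME/.virtualenvs/carr_script}"
    case "$target" in
        */.virtualenvs/carr_script) ;;   # expected location, continue
        *)
            echo "refusing to remove unexpected path: $target" >&2
            return 1
            ;;
    esac
    if [ ! -d "$target" ]; then
        echo "nothing to remove at $target" >&2
        return 0
    fi
    rm -rf -- "$target"
}
```

Called without arguments as root on an NSX Manager, remove_carr_venv targets /root/.virtualenvs/carr_script, the same directory as the command above.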
To use the CARR script in automated mode, please review the following KB: Using the CARR (Certificate Analyzer, Results and Recovery) script in automated mode