This script resolves certificate management issues on NSX 3.2.x and 4.x. It performs integrity checks and recovery operations for NSX self-signed certificates, and can replace certificates that have expired or will expire soon.
VMware NSX 4.x
VMware NSX-T Data Center 3.2.x
The script will make an assessment of certificate remediation needed, present the proposed changes and ask for approval to proceed.
Client Requirements:
Python version requirements
OS: macOS and Linux
Architecture: no restriction if an internet connection is available, because dependencies are then downloaded
Execution Notes:
Run the script from the /root directory; it will not work from the /tmp directory.
chsh -s /bin/bash root
A log named carr.log is created in the folder where the start.sh script is located. For any issues requiring support, please collect this log separately, as it will not be collected as part of the support bundle.
./start.sh -t 100 (to check for certificates expiring in the next 100 days).
Execution Steps:
Copy carr-1.15.tar.gz to the /root folder
tar -xvf carr-1.15.tar.gz
cd carr-1.15
./start.sh
Script options since version 1.11:
-o = force online mode
-t = specify the lead time for expiring certificates, between 31 and 825 days
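The range check on the -t option can be illustrated with a short Python sketch. This is not CARR's own argument handling: the parser, the function names, and the choice to treat both bounds as inclusive are assumptions made for illustration, based only on the two options described above.

```python
import argparse

MIN_LEAD_DAYS = 31   # lower bound from the option description (assumed inclusive)
MAX_LEAD_DAYS = 825  # upper bound from the option description (assumed inclusive)


def lead_time(value: str) -> int:
    """Parse a -t value and reject anything outside 31..825 days."""
    try:
        days = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"lead time must be a whole number of days, got {value!r}"
        )
    if not MIN_LEAD_DAYS <= days <= MAX_LEAD_DAYS:
        raise argparse.ArgumentTypeError(
            f"lead time must be between {MIN_LEAD_DAYS} and {MAX_LEAD_DAYS} days, got {days}"
        )
    return days


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors only the two documented flags.
    parser = argparse.ArgumentParser(prog="start.sh")
    parser.add_argument("-o", action="store_true", help="force online mode")
    parser.add_argument("-t", type=lead_time, help="lead time in days (31-825)")
    return parser
```

With this sketch, a value such as 100 or 825 is accepted, while 30 or 826 is rejected before any certificate check would start.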
Uninstall:
The CARR script is installed in the directory ~/.virtualenvs/carr_script.
For example, when running the CARR script on an NSX Manager, the install can be reversed as follows:
rm -rf /root/.virtualenvs/carr_script
Note: This rm command deletes files recursively without checks. If executed incorrectly, it can irreversibly remove system files, requiring the NSX appliance to be replaced.
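Because a mistyped path is the main risk with the rm command above, the same removal can be sketched in Python with a guard that refuses any target other than the documented install directory. The function name remove_carr_env and the guard logic are hypothetical; this is one possible way to add a safety check, not part of the CARR script.

```python
import shutil
from pathlib import Path


def remove_carr_env(home: Path) -> bool:
    """Remove <home>/.virtualenvs/carr_script and nothing else.

    Returns True if the directory existed and was removed, False if it was absent.
    """
    target = (home / ".virtualenvs" / "carr_script").resolve()
    expected_parent = (home / ".virtualenvs").resolve()
    # Refuse to proceed unless the resolved path is exactly the documented
    # install location (this also catches a symlink pointing elsewhere).
    if target.parent != expected_parent or target.name != "carr_script":
        raise ValueError(f"refusing to delete unexpected path: {target}")
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

On an NSX Manager this would be called as remove_carr_env(Path("/root")), which corresponds to the path used in the rm command.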
Compute Manager Certificate
Starting from version 1.15, the CARR script retrieves the list of Compute Managers registered in NSX Manager, retrieves the vCenter certificates and checks their thumbprints and chain order.
If the CRL Distribution Point field is present in the vCenter certificates, the script disables the Certificate Revocation List (CRL) checking in NSX.
If the thumbprints stored in NSX do not match the vCenter certificates, the script updates NSX with the new thumbprints.
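The thumbprint comparison described above can be sketched as follows. This is a minimal illustration, assuming the thumbprint is the SHA-256 digest of the DER-encoded certificate written as colon-separated hex; the function names are hypothetical, and the sketch does not call any NSX or vCenter API.

```python
import hashlib


def sha256_thumbprint(der_bytes: bytes) -> str:
    """Return the SHA-256 digest of a DER-encoded certificate as colon-separated uppercase hex."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def normalize(thumbprint: str) -> str:
    """Strip separators and case so differently formatted thumbprints compare equal."""
    return thumbprint.replace(":", "").replace(" ", "").lower()


def thumbprint_mismatch(stored: str, der_bytes: bytes) -> bool:
    """True when the stored thumbprint no longer matches the presented certificate."""
    return normalize(stored) != normalize(sha256_thumbprint(der_bytes))
```

Normalizing both sides before comparing avoids false mismatches caused only by letter case or separator style.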
On NSX versions 4.1.x and 4.2.0, Edge and Host Transport Nodes (TNs) are instantiated using a certificate with a validity period of 825 days instead of 10 years.
These are permanent certificates that are not replaced by upgrades.
Starting from version 1.15, the CARR script replaces these certificates with new certificates that have a 10-year validity period.
Note: If TN certificates have already expired and the 24-hour grace period has elapsed, the TNs will be disconnected. At this point CARR can no longer be used to replace the TN certificates.
See Transport Node Certificate Has Expired.
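The timing rules in the note above can be summarized in a small Python sketch. The state names and the exact boundary handling are assumptions for illustration; only the 24-hour grace period and the idea of a lead time in days come from this article.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=24)  # grace period mentioned in the note above


def classify(not_after: datetime, now: datetime, lead_days: int) -> str:
    """Classify a certificate by its expiry time relative to now.

    The returned labels are hypothetical names used only in this sketch.
    """
    if now >= not_after + GRACE_PERIOD:
        return "expired_past_grace"   # TN is expected to be disconnected
    if now >= not_after:
        return "expired_in_grace"     # expired, grace period still running
    if not_after - now <= timedelta(days=lead_days):
        return "expiring"             # falls within the chosen lead time
    return "ok"


expiry = datetime(2025, 1, 1, tzinfo=timezone.utc)
state = classify(expiry, expiry - timedelta(days=100), lead_days=825)
```

In this example the certificate expires in 100 days, which is inside the 825-day lead time, so state is "expiring".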
Dry Run:
A dry run is a read-only execution that identifies the number of Edges and Hosts with TN certificates that have a validity of 825 days or less.
> start.sh -d
<snip>
+-------------------------+--------------------------------------------------------------+---------------------------------------------------------+
| HOST | ERROR : vcsa.example.com::ESX_Cluster1:: Certificate on | Host certificate on #8 hosts will be replaced. |
| | #8 hosts are expiring or have expired | |
| | | |
+-------------------------+--------------------------------------------------------------+---------------------------------------------------------+
| EDGE | ERROR : EdgeCluster-1 :: Certificate on #6 hosts are | Edge node certificate on 6 nodes will be replaced. |
| | expiring or have expired | |
| | | |
+-------------------------+--------------------------------------------------------------+---------------------------------------------------------+
TN Cert replacement:
To trigger TN certificate replacement, environment details must be populated in the pre-existing file validation_config.yaml. This YAML file is located in the same folder as start.sh.
On the Manager, the file can be edited with the vi editor. Alternatively, copy the file out with SCP, edit it with Notepad++, and copy it back to the Manager.
To replace certificates on Hosts, specify the Compute Manager name and the names of the vSphere clusters to process.
To replace certificates on Edges, specify the Edge cluster name.
During certificate replacement, vMotion to the Host may not be possible.
It is recommended to start with one cluster and validate functionality.
Existing datapath flows through the Edge and Host are not expected to experience disruption.
e.g.
HOST:
  validate: True
  clusters:
    - vcenter_name: vcsa.example.com
      vcenter_cluster_name: ESX_Cluster1
    - vcenter_name: vcsa.example.com
      vcenter_cluster_name: ESX_Cluster2
EDGE:
  validate: True
  clusters:
    - name: EdgeCluster-1
    - name: EdgeCluster-2
Note: Currently only Edges in clusters are processed; standalone Edges are ignored.
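Before running CARR, the edited file can be sanity-checked for missing keys. The sketch below works on a Python dictionary that mirrors the example above (for instance, the result of loading the YAML with a parser of your choice). The function check_config is hypothetical, and treating validate: True as "process this section" is an assumption, not documented CARR behavior.

```python
from typing import List


def check_config(cfg: dict) -> List[str]:
    """Return a list of human-readable problems found in the config dictionary."""
    problems = []
    required = {
        "HOST": ("vcenter_name", "vcenter_cluster_name"),
        "EDGE": ("name",),
    }
    for section, keys in required.items():
        body = cfg.get(section) or {}
        if not body.get("validate"):
            continue  # assumption: sections without validate: True are skipped
        clusters = body.get("clusters") or []
        if not clusters:
            problems.append(f"{section}: validate is True but no clusters are listed")
        for i, entry in enumerate(clusters):
            if not isinstance(entry, dict):
                problems.append(f"{section}.clusters[{i}] is not a mapping")
                continue
            for key in keys:
                if not entry.get(key):
                    problems.append(f"{section}.clusters[{i}] is missing {key}")
    return problems


example = {
    "HOST": {
        "validate": True,
        "clusters": [
            {"vcenter_name": "vcsa.example.com", "vcenter_cluster_name": "ESX_Cluster1"},
            {"vcenter_name": "vcsa.example.com", "vcenter_cluster_name": "ESX_Cluster2"},
        ],
    },
    "EDGE": {
        "validate": True,
        "clusters": [{"name": "EdgeCluster-1"}, {"name": "EdgeCluster-2"}],
    },
}
problems = check_config(example)
```

For the example configuration, problems is an empty list; a Host entry without vcenter_cluster_name would produce one message naming the affected entry.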
After saving this file, run CARR to replace the TN certificates:
> start.sh -t 825 (The lead time is tunable; in this example, all certificates that expire in 825 days or less will be replaced with 10-year certificates.)
See Create a virtual machine for running the Certificate Analyzer, Results and Recovery (CARR) Script for detailed instructions on creating a Photon OS VM as a location to run the CARR script if no suitable location exists in your environment.
If the suggested resolution steps do not resolve the issue, please consider submitting a support case to Broadcom. Include the error screenshot or details, along with the NSX Manager log files and the script log file (a log named carr.log is created in the folder where the start.sh script is located).