NSX alarms indicating certificates have expired or are expiring



Article ID: 324175


Updated On:

Products

VMware NSX

Issue/Introduction

Symptoms:
  • The environment runs NSX 4.1.0.2 or above, and was upgraded from NSX-T 3.2.x.
  • NSX Alarms indicate certificates are expired or about to expire.
  • The expiring certificates contain "Corfu Client" in their name.

Environment

VMware NSX 4.1.x and above

Cause

 
There are two main factors that can contribute to this behavior:
  • NSX Managers have many certificates for internal services.
    In NSX-T 3.2.x, Cluster Boot Manager (CBM) service certificates were incorrectly given a validity period of 825 days instead of 100 years.
    This was corrected to 100 years in NSX-T 3.2.3 and NSX 4.1.0.
    However, in any environment that previously ran NSX-T 3.2.x (below 3.2.3), the internal CBM Corfu certificates will still expire after 825 days, regardless of whether the environment has since been upgraded to a fixed version.
  • On NSX-T 3.2.x, internal server certificates could expire without triggering an alarm. There was no functional impact.
    Starting with NSX 4.1.0.2, NSX alarms monitor the validity of internal certificates and trigger for expired or soon-to-expire certificates.

Note: In NSX 4.1.x, there is no functional impact when an internal certificate expires; however, alarms will continue to trigger.
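
To make the difference between the two validity periods concrete, the following minimal sketch computes the expiry date a certificate would have under each. The issue date is a made-up example, and "100 years" is approximated as 36,500 days; neither value is taken from NSX.

```python
from datetime import date, timedelta

SHORT_VALIDITY_DAYS = 825        # validity given to CBM certificates on affected 3.2.x builds
LONG_VALIDITY_DAYS = 100 * 365   # rough stand-in for "100 years" (assumption: leap days ignored)


def expiry_date(issued: date, validity_days: int) -> date:
    """Return the date on which a certificate issued on `issued` expires."""
    return issued + timedelta(days=validity_days)


if __name__ == "__main__":
    issued = date(2022, 1, 1)  # hypothetical issue date, for illustration only
    print("825-day certificate expires: ", expiry_date(issued, SHORT_VALIDITY_DAYS))
    print("100-year certificate expires:", expiry_date(issued, LONG_VALIDITY_DAYS))
```

A certificate created with the shorter period reaches its expiry a little over two years after it was issued, which is why environments upgraded from early 3.2.x builds see these alarms.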

Resolution

Scripted Resolution


VMware has developed a script that replaces the affected certificates automatically and resolves this issue.

Please read the following usage notes before running the script:

  • The script is compatible with NSX version 4.1.0 and above.
  • The script can be run on both Federation (GM/LM) and non-Federation environments.
  • An NSX backup must be taken before running the script. Also, ensure the passphrase is known.
  • The script replaces the following certificates: API, MGMT_CLUSTER, APH_TN, APH (AR), LOCAL_MANAGER (on LM), GLOBAL_MANAGER (on GM), CBM_CLUSTER_MANAGER, CBM_CORFU and CCP.
  • The script will only replace self-signed RSA certificates.
  • The script will only replace the certificates that are expired or expiring in the next 31 days. The script does NOT attempt to replace or touch other certificates.
  • If desired, the script can be edited and LEAD_DAYS customised so that certificates with more than 31 days of validity remaining are also considered.
  • The validity period of the certificates generated by this script is 100 years for CBM certificates, and 825 days for others. See the "Expiry time" column in Overview of X509 certificate key-pairs used by the NSX Manager.
  • When replacing a certificate, the script copies all the metadata from the existing certificate, then generates a new certificate with the above expiry time.
  • The script supports all keysizes supported by the NSX product.
  • EC certificates are NOT currently supported. Support for EC certificates will be added in future.
  • This is a Python 3 script which must be run from a client machine that has the paramiko and cryptography Python packages installed.
  • Depending on the system, these packages may be installed with a command such as:
    # sudo pip3 install paramiko cryptography
  • These packages are already installed on VCSA (vCenter Server Appliance), hence this can be used as a client machine to execute the script.
  • The script cannot be run directly on the NSX Manager, as it does not have the required Python modules. Installing these modules on the NSX Manager is not supported.
  • Running the script from a Windows machine is also supported.
  • Communication to the NSX Manager VIP/IP on port TCP 443 (HTTPS) is required.
  • Communication to the NSX Manager VIP/IP on port TCP 22 (SSH) is also required if running NSX 4.1.1.
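
The replacement window described in the notes above (expired, or expiring within LEAD_DAYS) can be illustrated with a short sketch. This is not taken from the script itself: only the LEAD_DAYS name comes from this article, while the function name, the sample dates and the inclusive boundary handling are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

LEAD_DAYS = 31  # same name as the script's setting; the logic below is an assumption


def is_due_for_replacement(not_after: datetime, now: datetime,
                           lead_days: int = LEAD_DAYS) -> bool:
    """True if the certificate has expired or expires within `lead_days` of `now`."""
    return not_after <= now + timedelta(days=lead_days)


if __name__ == "__main__":
    now = datetime(2024, 7, 12, tzinfo=timezone.utc)  # hypothetical "today"
    samples = {
        "already expired": now - timedelta(days=10),
        "expires in 20 days": now + timedelta(days=20),
        "expires in 90 days": now + timedelta(days=90),
    }
    for label, not_after in samples.items():
        print(label, "->", is_due_for_replacement(not_after, now))
```

With the default of 31 days, a certificate with 90 days remaining is left untouched; raising the lead time to a value above 90 would bring it into scope.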

Steps:

  1. Download the attached script replace_certs_v1.7.py.
  2. To execute the script, run the following command and follow the prompts:
    # python3 replace_certs_v1.7.py
  3. You will need to input the NSX Manager cluster IP and admin credentials at the relevant prompts.
  4. In some environments, it may be necessary to increase the timeout value used by the script to allow it to complete successfully. long_wait_time defaults to a value of 150; edit the script to increase it to 180 (or higher), then re-run the script.
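
Before running the command in step 2, it may be useful to confirm that the client machine meets the prerequisites listed earlier. The sketch below is a hypothetical pre-flight check and is not part of the attached script: the file name preflight_check.py and the function names are assumptions. It only tests that the two Python packages can be found and that the NSX Manager ports accept a TCP connection.

```python
import importlib.util
import socket
import sys

REQUIRED_MODULES = ("paramiko", "cryptography")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names of required modules that cannot be found on this machine."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    absent = missing_modules()
    print("Missing Python modules:", ", ".join(absent) or "none")
    if len(sys.argv) > 1:
        nsx_host = sys.argv[1]  # NSX Manager VIP/IP supplied by the operator
        print("TCP 443 (HTTPS) reachable:", port_open(nsx_host, 443))
        print("TCP 22 (SSH, needed for NSX 4.1.1) reachable:", port_open(nsx_host, 22))
    else:
        # Hypothetical file name, used here only for the usage message.
        print("Usage: python3 preflight_check.py <NSX Manager VIP/IP>")
```

A successful TCP connection shows only that the port is reachable; it does not validate the admin credentials the script will prompt for.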

 

Note for NSX 4.1.1 and later: The script does not rotate CBM_API certificates (API-Corfu Client certificate), as these are deprecated in 4.1.1. Please refer to the Resolution of KB#367857.
After completion of the script, some unused certificates may remain in the UI with the column “Where Used” set to 0. These may be removed if desired.

Note for vCenter 8.0 U3: When run from vCenter 8.0 U3, this script may fail due to incompatibility issues, with the error:

SshCommandExecutor: An error occurred: [digital envelope routines] unsupported
Unable to SSH to '192.168.x.x'. Please fix it and rerun the script

Resolution: Run the script from a different version of vCenter, or from a different Linux machine that has the required packages installed.

 

If the script does not work in your environment, please contact Support by opening a service request and referencing this KB article.
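
When comparing client machines or opening a service request, it may help to record the versions of the Python stack on the machine where the script was run. This article does not specify what to include in a service request, so the following is only a suggested sketch; the function name is hypothetical.

```python
import platform
import ssl
from importlib import metadata


def client_environment(packages=("paramiko", "cryptography")):
    """Collect version details of the client machine's Python stack."""
    info = {
        "python": platform.python_version(),
        "openssl": ssl.OPENSSL_VERSION,  # OpenSSL build used by Python's ssl module
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in client_environment().items():
        print(f"{key}: {value}")
```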



Additional Information

Changelog:

12th July 2024: replace_certs_v1.1.py script replaced with replace_certs_v1.7.py 

 

 

Attachments

replace_certs_v1.7.py