CARR script run on Global NSX Manager does not replace certificates on Local Manager sites as expected

Article ID: 442260

Updated On:

Products

VMware NSX

Issue/Introduction

  • After running the Certificate Analyzer, Results and Recovery (CARR) script on a Global Manager (GM) appliance, several certificates (often API, VIP, APH, or LOCAL_MANAGER type) still show as expired or expiring on Local Manager (LM) sites.

  • The NSX UI under System > Certificates on the GM may show all alerts cleared, yet communication issues or certificate alarms may persist on individual LM clusters.

  • Some certificates that are expected to be updated do not appear under Issues to Resolve in the CARR dry run output on the GM. After the remediation run, the certificates that were listed are updated and CARR reports successful completion, but expired or expiring self-signed certificates with a 'Where used' value greater than 0 are still present.

Environment

Federated NSX environment

Cause

In an NSX Federation environment, the Global Manager maintains remote references to Local Manager certificates via Principal Identities (PI). These remote entries typically do not contain the private key on the GM. Because the CARR script's discovery logic focuses on certificates whose private key is held locally, it may skip these remote references during a GM-only run.
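The effect described above can be illustrated with a small sketch. This is not CARR's actual code; it only models the idea that records without a locally held private key fall outside the scope of a GM-only run. The record layout, the has_private_key field name, and the certificate names are assumptions made for illustration.

```python
def split_by_key_location(certs):
    """Partition certificate records by whether the private key is held locally.

    Each record is a dict with a hypothetical 'has_private_key' flag.
    """
    local, remote = [], []
    for cert in certs:
        target = local if cert.get("has_private_key") else remote
        target.append(cert["display_name"])
    return local, remote


# Hypothetical view of the certificate store as seen from the GM.
gm_view = [
    {"display_name": "gm-api-cert", "has_private_key": True},
    {"display_name": "lm-site-a-reference", "has_private_key": False},
]

local, remote = split_by_key_location(gm_view)
print("In scope for a GM-only run:", local)
print("Likely skipped, remediate on the owning LM:", remote)
```

Entries in the second list are the ones that would need a CARR run on the Local Manager site that holds their private key.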

Furthermore, the NSX UI may not surface localized truststore inconsistencies or site-specific certificate integrity issues that are only visible when the script performs its mandatory integrity checks directly on the Local Manager cluster.

Resolution

  1. Execute CARR on the Global Manager (GM): Follow the standard process to remediate certificates local to the GM.

  2. Execute CARR on one node of EACH Local Manager (LM) site:
    • Even if the GM UI indicates that certificates are healthy, running CARR locally on each site is often necessary because:
      • It allows the script to "gather" certificates where the private key is local to that LM.
      • It performs deep integrity checks on the local LM truststore that are not reflected in the GM UI.
      • It identifies and replaces APH, API, and VIP certificates that may be stale but still in use locally.

    • Follow the mandatory workflow on each site:
      • Run the dry run first with ./start.sh -d, which generates the site-local validation_config_recovery_mode.yaml.
      • Review the dry run findings to confirm all expiring certificates are captured.
      • Execute the remediation: ./start.sh.

  3. Verify Synchronization: Once the LMs are remediated, verify that the new certificates have synchronized back to the Global Manager and other sites. It may take some time for data to be synchronized across NSX Manager nodes.
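As an aid for the verification step, the following is a minimal sketch for spotting certificates that are still in use and expired or close to expiry, which is the symptom described in the Issue section. It assumes the certificate list has already been retrieved from each manager as JSON (for example via the NSX certificate listing API, such as GET /api/v1/trust-management/certificates) and that each record carries display_name, used_by, and details[0].not_after in epoch milliseconds. Those field names, the 30-day warning window, and the sample data are assumptions and should be checked against the API guide for the NSX version in use.

```python
from datetime import datetime, timedelta, timezone


def still_at_risk(certs, now=None, warn_days=30):
    """Return (name, expiry date) for in-use certificates expiring within warn_days.

    Field names are assumptions; 'not_after' is treated as epoch milliseconds.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now + timedelta(days=warn_days)
    flagged = []
    for cert in certs:
        if not cert.get("used_by"):
            continue  # 'Where used' is 0, so the certificate is not in use
        details = cert.get("details") or []
        if not details:
            continue  # no parsed certificate data to evaluate
        # Assumption: the first details entry describes the leaf certificate.
        not_after = datetime.fromtimestamp(
            details[0]["not_after"] / 1000, tz=timezone.utc
        )
        if not_after <= cutoff:
            flagged.append((cert["display_name"], not_after.date().isoformat()))
    return flagged


def ms(dt):
    """Convert a datetime to epoch milliseconds for the sample data."""
    return int(dt.timestamp() * 1000)


utc = timezone.utc
sample = [  # hypothetical records, not real NSX output
    {"display_name": "lm-api-cert", "used_by": [{"node_id": "node-1"}],
     "details": [{"not_after": ms(datetime(2025, 1, 10, tzinfo=utc))}]},
    {"display_name": "lm-aph-cert", "used_by": [{"node_id": "node-1"}],
     "details": [{"not_after": ms(datetime(2027, 1, 1, tzinfo=utc))}]},
    {"display_name": "old-unused-cert", "used_by": [],
     "details": [{"not_after": ms(datetime(2024, 6, 1, tzinfo=utc))}]},
]

result = still_at_risk(sample, now=datetime(2025, 1, 1, tzinfo=utc))
print(result)
```

Running the same check against the GM and each LM after synchronization has had time to complete should return an empty list on every manager. Any remaining entry points to a site where CARR may still need to be run.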

Additional Information

See also: