Removing expired Local Manager certificate being reported as used on NSX UI.

Article ID: 420453

Updated On:

Products

VMware NSX

Issue/Introduction

  • The environment is an NSX Federation deployment with multiple local sites.
  • On the Global Manager UI and on the configured Local Managers, System -> Certificates lists an expired Local Manager certificate.
  • This certificate is listed as used by the "CLIENT_AUTH" and "Local_Manager" services, and its "Location/Node" entry reports the local NSX site that pushed the certificate.

  • Searching the Global Manager (GM) inventory for this local site UUID returns no details, and the list of configured local sites contains no site with the name reported for the certificate.
  • Cluster information for the same site UUID still lists the site and its nodes:

Cluster Info:  NodeType: NodeType.LM Version: Version: LocalManagerSiteVersion Major: 3 Minor: 2 patch: 1 is_local: False
site id: <Site UUID from above> site-name: <Site name from above>
nodes:
Id: <UUID> IP: 1xx.xx.xxx.xx3 aph_id: <Node_UUID> is_local: False
Id: <UUID> IP: 1xx.xx.xxx.xx3 aph_id: <Node_UUID> is_local: False
Id: <UUID> IP: 1xx.xx.xxx.xx4 aph_id: <Node_UUID> is_local: False

  • The following events referencing the above site UUID are also present in the Global Manager syslog file (/var/log/syslog):

YYYY-MM-DDTHH:MM:SS.290Z <GlobalManager> NSX 77655 SYSTEM [nsx@6876 comp="global-manager" level="WARNING" subcomp="async-replicator"] getLeader call failed for remote site <LM Site UUID from above> with aph <LM Site Node 1>.
YYYY-MM-DDTHH:MM:SS.290Z <GlobalManager> NSX 77655 SYSTEM [nsx@6876 comp="global-manager" level="WARNING" subcomp="async-replicator"] getLeader call failed for remote site <LM Site UUID from above> with aph <LM Site Node 2>.
YYYY-MM-DDTHH:MM:SS.290Z <GlobalManager> NSX 77655 SYSTEM [nsx@6876 comp="global-manager" level="WARNING" subcomp="async-replicator"] getLeader call failed for remote site <LM Site UUID from above> with aph <LM Site Node 3>.

  • Stale site details are also returned by the API call: GET https://<Global_Manager_IP>/policy/api/v1/infra/sites

{
    "sites": [
        {
            "name": "<Site name from above>",
            "site_version": "Local_Site_Version",
            "id": "<Site UUID from above>",
            "is_federated": true,
            "is_local": false,
            "system_id": "...",
            "active_gm": "NONE",
            "aph_list": [
                {
                    "uuid": "UUID",
                    "node_id": "<LM Site Node 1 UUID>",
                    "address": "Node 1 IP",
                    "ipv6_address": "",
                    "fqdn": "Node 1 FQDN",
                    "use_fqdn": false,
                    "port": 1236,
                    "certificate": ""
                }
                ...
            ],
            "node_type": "LM",
            "trust_manager_cert": "omitted",
            "cert_hash": "HASH_Value",
            "config_version": 7,
            "split_brain": false,
            "vip_ip": "VIP_IP",
            "cluster_id": "Cluster_UUID",
            "export_type": "..."
        }
    ]
}
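The symptoms above can be cross-checked with a small read-only script before opening a support request. The following is a minimal Python sketch, not a Broadcom or VMware tool. It assumes the response of GET https://<Global_Manager_IP>/policy/api/v1/infra/sites has been saved to a file and has the top-level "sites" array shown above. It also assumes that the getLeader warnings keep the exact wording quoted from /var/log/syslog. The script name find_stale_sites.py, its function names, and the "stale candidate" rule are all hypothetical. That rule flags a non-local site in the response whose UUID also appears in getLeader failure warnings. It does not establish how syslog APH IDs map to aph_list entries. It only reads data and does not change the federation configuration.

```python
import json
import re
import sys
from collections import defaultdict

# Matches the async-replicator warning quoted in this article; the exact
# wording is an assumption and may differ between NSX versions.
GETLEADER_RE = re.compile(
    r"getLeader call failed for remote site (?P<site>\S+) with aph (?P<aph>\S+?)\.?$"
)


def failed_leader_calls(syslog_lines):
    """Map remote site UUID -> set of APH IDs from getLeader failure warnings."""
    failures = defaultdict(set)
    for line in syslog_lines:
        match = GETLEADER_RE.search(line.strip())
        if match:
            failures[match.group("site")].add(match.group("aph"))
    return dict(failures)


def stale_site_candidates(sites_json, syslog_lines):
    """Return non-local sites that also appear in getLeader failure warnings.

    Hypothetical heuristic only: a match is a hint to investigate, not proof
    that the site is stale.
    """
    failures = failed_leader_calls(syslog_lines)
    candidates = []
    for site in json.loads(sites_json).get("sites", []):
        site_id = site.get("id")
        if site.get("is_local") or site_id not in failures:
            continue
        candidates.append({
            "id": site_id,
            "name": site.get("name"),
            "active_gm": site.get("active_gm"),
            "aph_nodes": sorted(a.get("node_id", "") for a in site.get("aph_list", [])),
            "failed_aphs": sorted(failures[site_id]),
        })
    return candidates


def main(argv):
    if len(argv) != 3:
        sys.exit("usage: find_stale_sites.py <sites.json> <syslog>")
    with open(argv[1], encoding="utf-8") as f:
        sites_json = f.read()
    # The syslog file object is iterated line by line inside the with block.
    with open(argv[2], encoding="utf-8", errors="replace") as f:
        candidates = stale_site_candidates(sites_json, f)
    for candidate in candidates:
        print(json.dumps(candidate, indent=2))


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python find_stale_sites.py sites.json /var/log/syslog` prints one JSON object per candidate site. Any site it reports should be compared against the UUID in the certificate's "Location/Node" entry. Removing that site still needs to go through Broadcom Support, as described in the Resolution section.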

Environment

VMware NSX

Cause

The site was previously off-boarded via the UI, but the removal process did not complete. As a result, the site ID and its associated certificates remained in the federation database.

The stale certificate therefore persisted on the Global and Local Managers and triggered alerts when it expired.

Resolution

The decommissioned site's details must be removed from the Global Manager nodes. After that, the stale certificates pushed from the site should be deleted automatically from the Global and Local NSX sites.

To clean up the decommissioned site properly, open a support request with the Broadcom Support Team. For details, refer to: Creating and managing Broadcom support request (SR) cases

Additional Information

For details on removing a Local site from NSX Federation, refer to: Procedure for Removing a Site from an NSX Federated Deployment.

For details on running the CARR script to resolve certificate-related issues, refer to: Using Certificate Analyzer, Results and Recovery (CARR) Script to fix certificate related issues in NSX