NSX Manager/NSX Global-Manager UI Not Accessible After Replacing CBM_* Certificates in 4.1.1

Article ID: 314163


Products

VMware NSX

Issue/Introduction

This article addresses bringing the UI back up on one NSX Manager node. Once the UI is accessible and the file permissions are fixed, a new certificate can be generated through the UI and the expired certificate can be replaced through the Apply Certificate API.

Symptoms:

Certificate Name and Service Type Mapping

NSX Manager Certificate Documentation For Reference: https://docs.vmware.com/en/VMware-NSX/4.1/administration/GUID-3DD19193-770C-47F3-A0F3-7B7703F274C8.html 

Certificate Name                    Service Type
API-Corfu Client                    service_type=CBM_API
AR-Corfu Client                     service_type=CBM_AR
CCP-Corfu Client                    service_type=CBM_CCP
Cluster Manager-Corfu               service_type=CBM_CLUSTER_MANAGER
CM Inventory-Corfu Client           service_type=CBM_CM_INVENTORY
Corfu Server                        service_type=CBM_CORFU
IDPS reporting-Corfu Client         service_type=CBM_IDPS_REPORTING
Messaging Manager-Corfu Client      service_type=CBM_MESSAGING_MANAGER
Monitoring-Corfu Client             service_type=CBM_MONITORING
MP-Corfu Client                     service_type=CBM_MP
Site Manager-Corfu Client           service_type=CBM_SITE_MANAGER
Upgrade Coordinator-Corfu Client    service_type=CBM_UPGRADE_COORDINATOR
GM-Corfu Client                     service_type=CBM_GM

  • After replacing some CBM_* certificates on the NSX Manager or NSX Global Manager nodes, the UI is not accessible through any of the manager node IPs.
  • After replacing the CBM_MP, CBM_AR, or CBM_GM certificate on all three NSX Manager or NSX Global Manager nodes, the corresponding service is DOWN on all three nodes.
  • In the example below, the "get cluster status" CLI output was captured after replacing the CBM_MP certificate on all three NSX Manager nodes.
    Cluster status
    Group Type: MANAGER
    Group Status: UNAVAILABLE
    
    Members:
           UUID              FQDN                  IP      IPv6       STATUS                                   
        <UUID_MGR1>       <FQDN_MGR1>           <IP_MGR1>   -          DOWN                                      
        <UUID_MGR2>       <FQDN_MGR2>           <IP_MGR2>   -          DOWN                                      
        <UUID_MGR3>       <FQDN_MGR3>           <IP_MGR3>   -          DOWN                                     
    
    Group Type: HTTPS
    Group Status: UNAVAILABLE
    
    Members:
           UUID              FQDN                  IP      IPv6       STATUS                                     
        <UUID_MGR1>       <FQDN_MGR1>           <IP_MGR1>   -          DOWN                          
        <UUID_MGR2>       <FQDN_MGR2>           <IP_MGR2>   -          DOWN                                      
        <UUID_MGR3>       <FQDN_MGR3>           <IP_MGR3>   -          DOWN     
                                     
  • /var/log/cbm/cbm.log shows that the certificate replacement operation failed while replacing the private key, due to a FileNotFoundException, as in the following example:

2023-09-16T19:10:58.975Z WARN pool-18-thread-4 Step 83803 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="cbm"] javax.net.ssl.SSLException: java.io.FileNotFoundException: File '/config/cluster-manager/mp/private/keystore.password' does not exist
        at com.vmware.nsx.cbm.cert.CertUtils.readFromFile(CertUtils.java:73)
        at com.vmware.nsx.cbm.cert.impl.SelfSignedTrustArtifactory.replaceCertificatesOnDisk(SelfSignedTrustArtifactory.java:180)
        at com.vmware.nsx.cbm.tasks.impl.ReplaceCertificatesTask$ReplaceCertificatesOnDisk.executeStep(ReplaceCertificatesTask.java:261)
        at com.vmware.nsx.cbm.tasks.Task.executeTask(Task.java:329)
        at com.vmware.nsx.cbm.tasks.Task.executeTaskWithCheck(Task.java:300)
        at com.vmware.nsx.cbm.tasks.Task.call(Task.java:280)
        at com.vmware.nsx.cbm.tasks.Task.call(Task.java:46)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.FileNotFoundException: File '/config/cluster-manager/mp/private/keystore.password' does not exist
        at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:2368)
        at org.apache.commons.io.FileUtils.readFileToString(FileUtils.java:2486)
        at com.vmware.nsx.cbm.cert.CertUtils.readFromFile(CertUtils.java:71)
        ... 12 more

2023-09-16T19:10:58.975Z ERROR pool-18-thread-4 Task 83803 - [nsx@6876 comp="nsx-manager" errorCode="CBM41" level="ERROR" subcomp="cbm"] Step ReplaceCertificatesOnDisk (6/7) failed for Task com.vmware.nsx.cbm.tasks.impl.ReplaceCertificatesTask: java.io.FileNotFoundException: File '/config/cluster-manager/mp/private/keystore.password' does not exist
2023-09-16T19:10:58.975Z ERROR pool-18-thread-4 Task 83803 - [nsx@6876 comp="nsx-manager" errorCode="CBM411" level="ERROR" subcomp="cbm"] [CBM411] Error occurred while replacing certificates in private keyStores.
javax.net.ssl.SSLException: java.io.FileNotFoundException: File '/config/cluster-manager/mp/private/keystore.password' does not exist

2023-09-16T19:10:59.074Z ERROR CertificateStreamListener-1-1 CertificateStreamListener 83803 - [nsx@6876 comp="nsx-manager" errorCode="CBM100" level="ERROR" subcomp="cbm"] ReplaceCertificatesTask error: Optional[[CBM411] Error occurred while replacing certificates in private keyStores.], task status: FAILED.
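
The signature above can be searched for programmatically. The following is a minimal Python sketch, not a VMware tool: it assumes the log lines follow the format of the excerpt above (an errorCode="CBM..." field and a "FileNotFoundException: File '...' does not exist" message), and the function name summarize_cbm_failures is hypothetical.

```python
import re
from typing import Dict, List

# Matches the errorCode="CBM..." field seen in the cbm.log excerpt.
ERROR_CODE_RE = re.compile(r'errorCode="(CBM\d+)"')
# Matches the quoted path in a FileNotFoundException message.
MISSING_FILE_RE = re.compile(r"FileNotFoundException: File '([^']+)' does not exist")


def summarize_cbm_failures(log_text: str) -> Dict[str, List[str]]:
    """Return the distinct CBM error codes and missing file paths, in order of first appearance.

    Hypothetical helper: it only reads the text it is given and changes nothing.
    """
    codes: List[str] = []
    paths: List[str] = []
    for line in log_text.splitlines():
        for code in ERROR_CODE_RE.findall(line):
            if code not in codes:
                codes.append(code)
        for path in MISSING_FILE_RE.findall(line):
            if path not in paths:
                paths.append(path)
    return {"error_codes": codes, "missing_files": paths}
```

On a manager node, the function could be given the contents of /var/log/cbm/cbm.log. A result that lists CBM411 together with a keystore.password path under /config/cluster-manager/<service>/private/ would be consistent with the symptom described here.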

 

  • Checking the permissions on the filesystem of all the NSX Manager nodes for /config/cluster-manager/<service>/private/ shows that the permissions are not set to 770 (-rwxrwx---) as needed.
    Relevant folders: ar, ccp, cluster-manager, cm-inventory, idps-reporting, messaging-manager, monitoring, mp, site-manager, upgrade-coordinator.
    For Global Manager: gm
    For CSM: csm

    Bad State (irrelevant folders were omitted):
# ls -l /config/cluster-manager/*/private/
/config/cluster-manager/ar/private/:
total 8
-rw------- 1 nsx-replicator nsx-replicator 2051 May  4  2021 keystore.jks
-rw------- 1 nsx-replicator nsx-replicator   44 May  4  2021 keystore.password

/config/cluster-manager/ccp/private/:
total 8
-rw------- 1 nsx nsx 2050 May  4  2021 keystore.jks
-rw------- 1 nsx nsx   44 May  4  2021 keystore.password

/config/cluster-manager/cluster-manager/private/:
total 8
-rw------- 1 nsx-cbm nsx-cbm 2076 May  4  2021 keystore.jks
-rw------- 1 nsx-cbm nsx-cbm   44 May  4  2021 keystore.password

/config/cluster-manager/cm-inventory/private/:
total 8
-rw------- 1 ucminv ucminv 2071 Jul 27  2022 keystore.jks
-rw------- 1 ucminv ucminv   44 Jul 27  2022 keystore.password

/config/cluster-manager/idps-reporting/private/:
total 8
-rw------- 1 nsx-idps nsx-idps 2077 May  4  2021 keystore.jks
-rw------- 1 nsx-idps nsx-idps   44 May  4  2021 keystore.password

/config/cluster-manager/messaging-manager/private/:
total 8
-rw------- 1 nsx-messaging nsx-messaging 2079 Jul 27  2022 keystore.jks
-rw------- 1 nsx-messaging nsx-messaging   44 Jul 27  2022 keystore.password

/config/cluster-manager/monitoring/private/:
total 8
-rw------- 1 uphc uphc 2067 May  4  2021 keystore.jks
-rw------- 1 uphc uphc   44 May  4  2021 keystore.password

/config/cluster-manager/mp/private/:
total 8
-rw------- 1 uproton uproton 2052 May  4  2021 keystore.jks
-rw------- 1 uproton uproton   44 May  4  2021 keystore.password

/config/cluster-manager/site-manager/private/:
total 8
-rwxrwx--- 1 nsx-sm nsx-sm 2073 Sep 16 15:20 keystore.jks
-rwxrwx--- 1 nsx-sm nsx-sm   44 Sep 16 15:20 keystore.password

/config/cluster-manager/upgrade-coordinator/private/:
total 8
-rw------- 1 uuc uuc 2085 Jul 27  2022 keystore.jks
-rw------- 1 uuc uuc   44 Jul 27  2022 keystore.password
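
The same check can be scripted. The following is a minimal, read-only Python sketch that lists every file under /config/cluster-manager/<service>/private/ whose mode is not 770. The function name find_bad_keystore_modes is hypothetical, and the sketch only reports; it does not change any permissions.

```python
import stat
from pathlib import Path
from typing import List, Tuple

EXPECTED_MODE = 0o770  # -rwxrwx---, the mode this article states is needed


def find_bad_keystore_modes(base: str = "/config/cluster-manager") -> List[Tuple[str, str]]:
    """Return (path, mode string) for each file in <base>/<service>/private/ whose mode is not 770.

    Hypothetical helper: it reads file metadata only and modifies nothing.
    """
    bad: List[Tuple[str, str]] = []
    for path in sorted(Path(base).glob("*/private/*")):
        if not path.is_file():
            continue
        st_mode = path.stat().st_mode
        # S_IMODE strips the file-type bits and keeps only the permission bits.
        if stat.S_IMODE(st_mode) != EXPECTED_MODE:
            bad.append((str(path), stat.filemode(st_mode)))
    return bad
```

Run against the bad state listed above, this would be expected to report every keystore file except those under site-manager/private/, which already show -rwxrwx---. Reading these directories typically requires root privileges on the manager node.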

Environment

VMware NSX-T Data Center

Cause

After an upgrade from 3.2.x to 4.1.1, the "nsx-cbm" Linux user should be a member of each service's Linux group and should have write permission on that service's private keystore files. However, due to a bug in CBM's init script, the file permissions were not modified during the upgrade from 3.2.x to 4.1.1. Because the permissions on the private keystore files were not updated, CBM fails to replace the CBM_* certificate private key for a service in 4.1.1.
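
The group-membership half of this condition can be inspected with Python's standard grp module. The following is a minimal sketch under stated assumptions: the group names are taken from the owner groups in the "ls -l" listing in this article and may differ in a given deployment, and the function name groups_missing_member is hypothetical. It checks only supplementary membership (the gr_mem field) and changes nothing.

```python
import grp
from typing import Callable, Iterable, List

# Assumption: owner groups copied from the "ls -l" listing in this article.
SERVICE_GROUPS = [
    "nsx-replicator", "nsx", "ucminv", "nsx-idps", "nsx-messaging",
    "uphc", "uproton", "nsx-sm", "uuc",
]


def groups_missing_member(
    user: str = "nsx-cbm",
    groups: Iterable[str] = SERVICE_GROUPS,
    lookup: Callable = grp.getgrnam,
) -> List[str]:
    """Return the groups that do not list `user` as a member, or that do not exist.

    `lookup` defaults to grp.getgrnam and can be swapped out for testing.
    """
    missing: List[str] = []
    for name in groups:
        try:
            members = lookup(name).gr_mem
        except KeyError:
            # grp.getgrnam raises KeyError when the group is not defined on this host.
            missing.append(name)
            continue
        if user not in members:
            missing.append(name)
    return missing
```

An empty result would indicate that "nsx-cbm" is listed in every service group checked. Group membership alone does not confirm the fix, because the keystore file permissions described above must also be correct.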

Resolution

This issue is resolved in NSX 4.1.2 and later releases.


Workaround:
If you cannot upgrade to NSX 4.1.2 (or above), please contact VMware NSX GSS by opening a service request and referencing this KB article.

Additional Information

Impact/Risks:
A CBM_<service> whose certificate was replaced before the permissions were fixed may be unable to connect to CorfuDB. The impact varies by service; if the CBM_MP certificate was replaced, the UI and API may become inaccessible.

This issue has been found in environments upgraded to 4.1.1 from 3.2.x.
Greenfield environments deployed with NSX 4.1.1 or brownfield 4.0.x environments upgraded to 4.1.1 should not be impacted.  

Found In: NSX 4.1.1