One of the NSX Manager nodes becomes unavailable after a reboot while the upgrade is in progress.

Article ID: 373464

Products

VMware NSX

Issue/Introduction

If one of the NSX Manager nodes is rebooted before the upgrade has fully completed, the node might fail to start the Cluster Boot Manager (CBM) service.

In the output of the "get cluster status" command, the faulty node is listed with a STATUS of DOWN under the "Group Type" CLUSTER_BOOT_MANAGER, and the group status is DEGRADED.

Group Type: CLUSTER_BOOT_MANAGER
Group Status: DEGRADED

Members:
    UUID                      FQDN            IP                STATUS
    <MANAGER_NODE_UUID>       <HOSTNAME>      <IP_ADDRESS>      UP
    <MANAGER_NODE_UUID>       <HOSTNAME>      <IP_ADDRESS>      DOWN
    <MANAGER_NODE_UUID>       <HOSTNAME>      <IP_ADDRESS>      UP
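
If the "get cluster status" output has been saved as text, a short script can pick out the members that are DOWN in a given group. The following is a minimal sketch only: the function name down_members, the sample host names, and the sample IP addresses are illustrative assumptions, and the parser assumes each member row has exactly the four whitespace-separated columns shown above.

```python
# Illustrative sample that mimics the layout of "get cluster status".
# Host names, UUIDs, and IP addresses here are made up for the example.
SAMPLE = """\
Group Type: CLUSTER_BOOT_MANAGER
Group Status: DEGRADED

Members:
    UUID      FQDN        IP            STATUS
    uuid-1    nsx-mgr-1   192.0.2.11    UP
    uuid-2    nsx-mgr-2   192.0.2.12    DOWN
    uuid-3    nsx-mgr-3   192.0.2.13    UP

Group Type: HTTPS
Group Status: STABLE

Members:
    UUID      FQDN        IP            STATUS
    uuid-1    nsx-mgr-1   192.0.2.11    UP
"""


def down_members(status_text, group_type="CLUSTER_BOOT_MANAGER"):
    """Return (uuid, fqdn, ip) tuples for members that are DOWN in group_type."""
    members = []
    in_group = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Group Type:"):
            # Track whether the rows that follow belong to the requested group.
            in_group = stripped.split(":", 1)[1].strip() == group_type
            continue
        if not in_group:
            continue
        fields = stripped.split()
        # Assumption: a member row has four columns and ends with the status.
        if len(fields) == 4 and fields[3] == "DOWN":
            members.append(tuple(fields[:3]))
    return members


if __name__ == "__main__":
    for uuid, fqdn, ip in down_members(SAMPLE):
        print(f"CBM is DOWN on {fqdn} ({ip}), node UUID {uuid}")
```

The same function can be pointed at another group by passing a different group_type value.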

The following failure is also logged in /var/log/upgrade-coordinator/upgrade-coordinator.log:

<DATE_TIME>  INFO pool-17-thread-2 SelfSignedTrustArtifactory 3590 - [nsx@6876 comp="nsx-manager" level="INFO" s2comp="cert" subcomp="cbm"] Cannot add certificates since entity [ccp] does not use CBM managed certificate/truststore or public certificate update on disk is not needed for service
<DATE_TIME>  WARN pool-17-thread-2 Step 3590 - [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="cbm"] javax.net.ssl.SSLException: java.security.cert.CertificateException: Unexpected data detected in stream
        at com.vmware.nsx.cbm.cert.CertUtils.addCertificate(CertUtils.java:149)
        at com.vmware.nsx.cbm.cert.CertUtils.addCertificate(CertUtils.java:125)
        at com.vmware.nsx.cbm.cert.impl.SelfSignedTrustArtifactory.addAllCertificates(SelfSignedTrustArtifactory.java:83)
        at com.vmware.nsx.cbm.periodicservice.impl.PeriodicSyncTask.addOrDeleteTrustStore(PeriodicSyncTask.java:728)
        at com.vmware.nsx.cbm.periodicservice.impl.PeriodicSyncTask.updateTrustStore(PeriodicSyncTask.java:670)
        at com.vmware.nsx.cbm.periodicservice.impl.PeriodicSyncTask$UpdateTrustStore.executeStep(PeriodicSyncTask.java:238)
        at com.vmware.nsx.cbm.tasks.Task.executeTask(Task.java:353)
        at com.vmware.nsx.cbm.tasks.Task.executeTaskWithCheck(Task.java:324)
        at com.vmware.nsx.cbm.tasks.Task.call(Task.java:300)
        at com.vmware.nsx.cbm.tasks.Task.call(Task.java:47)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.security.cert.CertificateException: Unexpected data detected in stream
        at org.bouncycastle.jcajce.provider.CertificateFactory.engineGenerateCertificate(Unknown Source)
        at java.base/java.security.cert.CertificateFactory.generateCertificate(Unknown Source)
        at com.vmware.nsx.cbm.cert.CertUtils.addCertificate(CertUtils.java:146)
        ... 14 more
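
To confirm that a node matches this signature, the log can be searched for the WARN entry that carries the CertificateException message. This is a minimal sketch, not a supported tool: the function name find_cbm_cert_failures is an assumption, and the match relies only on the message text shown in the excerpt above.

```python
# Message text taken from the log excerpt in this article.
SIGNATURE = (
    "java.security.cert.CertificateException: "
    "Unexpected data detected in stream"
)


def find_cbm_cert_failures(log_lines):
    """Return 1-based line numbers of WARN entries that contain SIGNATURE."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        # The "Caused by:" line repeats the message but is not a WARN entry,
        # so it is not counted a second time.
        if " WARN " in line and SIGNATURE in line:
            hits.append(number)
    return hits


# Shortened sample lines modeled on the excerpt above.
SAMPLE_LOG = [
    "<DATE_TIME>  INFO pool-17-thread-2 SelfSignedTrustArtifactory 3590 - "
    "Cannot add certificates since entity [ccp] does not use CBM managed "
    "certificate/truststore",
    "<DATE_TIME>  WARN pool-17-thread-2 Step 3590 - "
    '[nsx@6876 comp="nsx-manager" level="WARNING" subcomp="cbm"] '
    "javax.net.ssl.SSLException: " + SIGNATURE,
    "        at com.vmware.nsx.cbm.cert.CertUtils.addCertificate"
    "(CertUtils.java:149)",
    "Caused by: " + SIGNATURE,
]

if __name__ == "__main__":
    print(find_cbm_cert_failures(SAMPLE_LOG))
```

Because the function accepts any iterable of lines, an open file object for a copy of the log can be passed in place of SAMPLE_LOG.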

 

Environment

VMware NSX-T Data Center 3.2.3

Cause

In 4.1.1, the API certificates are set to null. This breaks a mixed-mode cluster, that is, a cluster that contains nodes on both pre-4.1.1 and post-4.1.1 versions, which is the state the cluster is in while the upgrade is only partially complete.

Resolution

This issue is addressed in NSX 4.2.0.