After replacing CORFU server certificates, a newly deployed NSX Manager cannot join the manager cluster

Article ID: 411292

Products

VMware NSX

Issue/Introduction

  • The CORFU server certificates were replaced because NSX alarms indicated that certificates had expired or were about to expire.
  • In the NSX UI, the CORFU server certificates are shown as replaced.
  • The API call GET /api/v1/cluster-manager/config shows that the DATASTORE certificates in the cluster configuration are still the old certificates.
  • The CBM hit the following exception, so the cluster configuration in CBM was not updated:
    /var/log/cbm/cbm.log
    WARN [nsx@xxxx comp="global-manager" level="WARNING" subcomp="cbm"] Task com.vmware.nsx.cbm.tasks.impl.ReplaceCertificatesTask (xxxx-xxxx-xxxx-xxxx) failed on Step UpdateClusterConfiguration with the following exception: java.util.NoSuchElementException: No value present, No value present
    INFO [nsx@xxxx comp="global-manager" level="INFO" subcomp="cbm"] Task com.vmware.nsx.cbm.tasks.impl.ReplaceCertificatesTask (xxxx-xxxx-xxxx-xxxx) failed on Step UpdateClusterConfiguration with the following exception: java.util.NoSuchElementException: No value present
            at java.util.Optional.get(Optional.java:135)
            at com.vmware.nsx.cbm.ClusterConfigurationManager.updateCertificatesInClusterConfig(ClusterConfigurationManager.java:501)
            at com.vmware.nsx.cbm.tasks.impl.ReplaceCertificatesTask$UpdateClusterConfiguration.retryable(ReplaceCertificatesTask.java:118)
            at com.vmware.nsx.cbm.tasks.steps.RetryableStep.executeStep(RetryableStep.java:58)
            at com.vmware.nsx.cbm.tasks.Task.executeTask(Task.java:329)
            at com.vmware.nsx.cbm.tasks.Task.executeTaskWithCheck(Task.java:300)
            at com.vmware.nsx.cbm.tasks.Task.call(Task.java:280)
            at com.vmware.nsx.cbm.tasks.Task.call(Task.java:46)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:750)
  • When a new node tries to join, it cannot join the DATASTORE cluster service. The 'get cluster status' output shows the following:
    Group Type: DATASTORE
    Group Status: STABLE

    Members:
        UUID                    FQDN                 IP          IPv6    STATUS
        xxxx-xxxx-xxxx-xxxx     nsxmgr01.xxx.xxx     x.x.x.x1    -       UP
        xxxx-xxxx-xxxx-xxxx     nsxmgr02.xxx.xxx     x.x.x.x2    -       UNKNOWN
        xxxx-xxxx-xxxx-xxxx     nsxmgr03.xxx.xxx     x.x.x.x3    -       UP
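
The two text symptoms above (the failed task in cbm.log and the member that is not UP) can be checked with a small script. The following is a minimal sketch, not a supported tool. The function names are hypothetical, the log pattern is built only from the message quoted above, and the status parser assumes the column layout shown in this article (UUID, FQDN, IP, IPv6, STATUS); real output may differ between NSX versions.

```python
import re

# Pattern built from the cbm.log message quoted in this article.
FAILURE_RE = re.compile(
    r"Task \S*ReplaceCertificatesTask \((?P<task_id>[^)]*)\) "
    r"failed on Step (?P<step>\w+)"
)


def find_replace_cert_failures(log_text):
    """Return unique (task_id, step) pairs for failed ReplaceCertificatesTask runs."""
    seen = []
    for match in FAILURE_RE.finditer(log_text):
        pair = (match.group("task_id"), match.group("step"))
        if pair not in seen:  # the WARN and INFO lines report the same failure
            seen.append(pair)
    return seen


def non_up_members(status_text, group_type="DATASTORE"):
    """Return (fqdn, status) for members of one group whose STATUS is not UP."""
    results = []
    in_group = False
    in_members = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Group Type:"):
            # A new group starts; track only the requested one.
            in_group = stripped.split(":", 1)[1].strip() == group_type
            in_members = False
        elif in_group and stripped.startswith("UUID"):
            in_members = True  # header row; member rows follow
        elif in_group and in_members and stripped:
            fields = stripped.split()
            # Assumed columns: UUID FQDN IP IPv6 STATUS
            if len(fields) >= 5 and fields[-1] != "UP":
                results.append((fields[1], fields[-1]))
    return results
```

Passing the contents of /var/log/cbm/cbm.log to find_replace_cert_failures and the saved 'get cluster status' output to non_up_members gives a quick indication of whether a node matches this article's symptoms.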

Environment

VMware NSX 4.1.0.2

Cause

After the CORFU certificates are replaced, the CBM does not update the cluster configuration with the new certificates and continues to hold the old ones. When a new node tries to join, the CBM on an existing node sends the old certificates to the joining node, so the new node cannot join the DATASTORE cluster service.
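
To confirm this condition, the certificates held in the cluster configuration can be compared with the replacement certificate by fingerprint. The sketch below rests on assumptions: the response of GET /api/v1/cluster-manager/config has been saved and decoded as JSON, and certificates appear in it as PEM strings. Because the exact response schema is not documented in this article, the code searches every string value rather than relying on specific field names. All function names are hypothetical.

```python
import base64
import hashlib
import re

PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.S
)


def pem_fingerprints(text):
    """Return the SHA-256 fingerprint (hex) of every PEM certificate in text."""
    prints = []
    for body in PEM_RE.findall(text):
        der = base64.b64decode("".join(body.split()))  # drop line breaks first
        prints.append(hashlib.sha256(der).hexdigest())
    return prints


def collect_strings(node):
    """Yield every string value in a decoded JSON document, at any depth."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from collect_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_strings(item)


def config_has_certificate(cluster_config, expected_pem):
    """Return True if the expected certificate appears anywhere in cluster_config."""
    expected = set(pem_fingerprints(expected_pem))
    found = set()
    for value in collect_strings(cluster_config):
        found.update(pem_fingerprints(value))
    # An empty expected set means no certificate was supplied, so report False.
    return bool(expected) and expected <= found
```

If config_has_certificate returns False for the new CORFU certificate while the NSX UI shows it as replaced, the cluster configuration still holds the old certificate, which matches the cause described above.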

Resolution

1. Perform a switchover so that the standby Global Manager (GM) becomes the active GM.
2. Remove the current standby GM (the GM that was active before the switchover).
3. Deploy a new GM cluster and make it the standby GM.