After replacing Corfu server certificates, a newly deployed NSX Manager cannot join the manager cluster

Article ID: 411292

Products

VMware NSX

Issue/Introduction

The Corfu server certificates were replaced in response to NSX alarms indicating that certificates have expired or are about to expire.
The NSX UI shows the Corfu server certificates as replaced.
The cluster configuration returned by the API call GET /api/v1/cluster-manager/config still contains the old DATASTORE certificates.
The CBM (Cluster Boot Manager) hit the following exception, so the cluster configuration in the CBM was not updated. The exception is logged in /var/log/cbm/cbm.log:
WARN [nsx@xxxx comp="global-manager" level="WARNING" subcomp="cbm"] Task com.vmware.nsx.cbm.tasks.impl.ReplaceCertificatesTask (xxxx-xxxx-xxxx-xxxx) failed on Step UpdateClusterConfiguration with the following exception: java.util.NoSuchElementException: No value present, No value present
INFO [nsx@xxxx comp="global-manager" level="INFO" subcomp="cbm"] Task com.vmware.nsx.cbm.tasks.impl.ReplaceCertificatesTask (xxxx-xxxx-xxxx-xxxx) failed on Step UpdateClusterConfiguration with the following exception: java.util.NoSuchElementException: No value present
at java.util.Optional.get(Optional.java:135)
at com.vmware.nsx.cbm.ClusterConfigurationManager.updateCertificatesInClusterConfig(ClusterConfigurationManager.java:501)
at com.vmware.nsx.cbm.tasks.impl.ReplaceCertificatesTask$UpdateClusterConfiguration.retryable(ReplaceCertificatesTask.java:118)
at com.vmware.nsx.cbm.tasks.steps.RetryableStep.executeStep(RetryableStep.java:58)
at com.vmware.nsx.cbm.tasks.Task.executeTask(Task.java:329)
at com.vmware.nsx.cbm.tasks.Task.executeTaskWithCheck(Task.java:300)
at com.vmware.nsx.cbm.tasks.Task.call(Task.java:280)
at com.vmware.nsx.cbm.tasks.Task.call(Task.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
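The log signature above can be searched for mechanically. The following is a minimal Python sketch, not a supported tool: the function name `find_failed_replace_tasks` is hypothetical, and the pattern is derived only from the two log lines quoted in this article, so other NSX builds may word the message differently.

```python
import re

# Pattern built from the log excerpt in this article (an assumption for other builds).
FAILURE_PATTERN = re.compile(
    r"ReplaceCertificatesTask \((?P<task_id>[^)]+)\) failed on Step "
    r"(?P<step>\w+) with the following exception: (?P<exception>[\w.$]+)"
)


def find_failed_replace_tasks(lines):
    """Return one dict per distinct (task_id, step, exception) failure found."""
    hits = []
    for line in lines:
        match = FAILURE_PATTERN.search(line)
        # The same failure is logged at WARN and INFO level, so de-duplicate.
        if match and match.groupdict() not in hits:
            hits.append(match.groupdict())
    return hits
```

To scan a manager node, pass an open file handle for /var/log/cbm/cbm.log as `lines`. A hit with step `UpdateClusterConfiguration` and exception `java.util.NoSuchElementException` corresponds to the failure described here.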
When a new node attempts to join, it cannot join the DATASTORE cluster service. The output of 'get cluster status' looks like the following:
Group Type: DATASTORE
Group Status: STABLE

Members:
UUID                 FQDN              IP        IPv6  STATUS
xxxx-xxxx-xxxx-xxxx  nsxmgr01.xxx.xxx  x.x.x.x1  -     UP
xxxx-xxxx-xxxx-xxxx  nsxmgr02.xxx.xxx  x.x.x.x2  -     UNKNOWN
xxxx-xxxx-xxxx-xxxx  nsxmgr03.xxx.xxx  x.x.x.x3  -     UP
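When the 'get cluster status' output has been saved as text, the members that failed to join can be picked out with a short script. This is a rough Python sketch: the function name `members_not_up` is hypothetical, and it assumes the whitespace-separated layout shown in the excerpt above (UUID, FQDN, IP, IPv6, STATUS), which may differ on other NSX versions.

```python
def members_not_up(status_text, group_type="DATASTORE"):
    """Return (fqdn, status) for members of the given group whose status is not UP."""
    result = []
    in_group = False    # inside the section for the requested group type
    in_members = False  # past the "UUID ..." header row of that section
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("Group Type:"):
            in_group = line.split(":", 1)[1].strip() == group_type
            in_members = False
        elif in_group and line.startswith("UUID"):
            in_members = True
        elif in_group and in_members and line:
            fields = line.split()
            # Assumed columns: UUID FQDN IP IPv6 STATUS
            if len(fields) >= 5 and fields[-1] != "UP":
                result.append((fields[1], fields[-1]))
    return result
```

For the output shown in this article, the only member reported would be nsxmgr02.xxx.xxx with status UNKNOWN, which is the newly deployed node.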

Environment

VMware NSX 4.1.0.2

Cause

After the Corfu certificates are replaced, the CBM fails to update the cluster configuration with the new certificates, so it still holds the old ones. When a new node attempts to join, the CBM on an existing node sends the old certificates to the joining node, and the join therefore fails.
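One way to confirm this mismatch is to check whether the new Corfu certificate appears anywhere in a saved response from GET /api/v1/cluster-manager/config. The sketch below is illustrative only. It assumes the response is JSON that embeds certificates as PEM strings, and it deliberately does not rely on any particular field names, since the response schema is not described in this article. All function names are hypothetical.

```python
import hashlib
import json
import ssl

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"
PEM_END = "-----END CERTIFICATE-----"


def iter_pem_certs(node):
    """Yield every PEM certificate block found in any string of a parsed JSON tree."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_pem_certs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_pem_certs(item)
    elif isinstance(node, str):
        start = node.find(PEM_BEGIN)
        while start != -1:
            end = node.find(PEM_END, start)
            if end == -1:
                break
            yield node[start:end + len(PEM_END)]
            start = node.find(PEM_BEGIN, end)


def fingerprint(pem):
    """SHA-256 over the DER bytes, so PEM line wrapping does not affect the comparison."""
    return hashlib.sha256(ssl.PEM_cert_to_DER_cert(pem)).hexdigest()


def config_contains_cert(config_json, expected_pem):
    """True if the certificate in expected_pem occurs anywhere in the config JSON."""
    found = {fingerprint(pem) for pem in iter_pem_certs(json.loads(config_json))}
    return fingerprint(expected_pem) in found
```

If `config_contains_cert` returns False for the new certificate and True for the old one, the cluster configuration was not updated, which matches the cause described above.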

Resolution

1. Perform a switchover so that the standby Global Manager (GM) becomes the active GM.
2. Remove the standby GM (the GM that was active before the switchover).
3. Deploy a new GM cluster and configure it as the standby GM.
