This article provides procedures for replacing a single node from a backup in a 3-node NSX-T Manager cluster within a VCF environment.
Symptoms:
When an NSX-T Manager node is faulty, the NSX-T cluster health is reported as degraded, which can block several VCF operations.
This procedure assumes that one node of the 3-node NSX-T Manager cluster is down. In this example, all three NSX-T Manager nodes are in the MGMT cluster.
If you do not know the admin and root passwords, follow the instructions in the password chapter of the VMware Cloud Foundation Operations and Administration guide to retrieve them from the SDDC Manager inventory. If you want to change these two passwords, do so after restoring the NSX-T Manager VMs, using the SDDC Manager password update function.
Note: Download the OVA image before you start, and verify the MD5 checksum of the OVA file. Section 2.2.4 describes how to determine which OVA to download.
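The checksum comparison can be scripted on the machine where the OVA was downloaded. The following is a minimal sketch, assuming a Linux workstation with GNU coreutils `md5sum` and that you copy the expected MD5 value from the download page yourself; the function name `verify_ova_md5` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare an OVA file's MD5 checksum with the value
# published on the download page. Assumes GNU coreutils md5sum (Linux).
verify_ova_md5() {
  local ova_file="$1"
  local expected="$2"

  if [ ! -f "$ova_file" ]; then
    echo "File not found: $ova_file" >&2
    return 2
  fi

  # md5sum prints "<hash>  <file>"; keep only the hash, in lower case.
  local actual
  actual=$(md5sum "$ova_file" | awk '{print tolower($1)}')

  # Published checksums may be upper case, so normalize before comparing.
  expected=$(printf '%s' "$expected" | tr '[:upper:]' '[:lower:]')

  if [ "$actual" = "$expected" ]; then
    echo "MD5 OK: $actual"
    return 0
  fi
  echo "MD5 MISMATCH: expected $expected, got $actual" >&2
  return 1
}
```

For example, `verify_ova_md5 nsx-unified-appliance-2.4.2.1.0.14374085.ova <expected md5>` returns 0 only when the file matches the published value.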
Note: The steps for deleting the faulty manager differ depending on whether the node was deployed automatically through the NSX UI or deployed manually to vCenter from the OVA.
First, obtain the UUID of the faulty NSX-T Manager. SSH into one of the operational NSX-T Managers:
ssh admin@xxxxxx3
Run the get cluster status command and record the UUID of the faulty NSX-T Manager:
get cluster status
Issue the detach command from the operational NSX-T Manager:
detach node <uuid>
For example: detach node 77a01dab-xxxx-xxxx-xxxx-0bfc3e0c40d9
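The detach command is typed into the NSX CLI session, so a mistyped UUID only shows up as a CLI error. As a precaution, the copied UUID can be checked on a workstation first. This is a minimal bash sketch, assuming the standard 8-4-4-4-12 hexadecimal UUID layout; the function name `print_detach_cmd` is hypothetical and it does not connect to NSX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a node UUID copied from "get cluster status"
# and print the matching NSX CLI detach command to paste into the session.
# The pattern is the standard 8-4-4-4-12 hexadecimal UUID layout.
print_detach_cmd() {
  local uuid="$1"
  local pattern='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'

  if [[ ! "$uuid" =~ $pattern ]]; then
    echo "Not a valid UUID: '$uuid'" >&2
    return 1
  fi
  echo "detach node $uuid"
}
```

A masked value such as 77a01dab-xxxx-xxxx-xxxx-0bfc3e0c40d9 is rejected, which is the intended behavior: paste the full UUID from the get cluster status output.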
2.2.2. Power off the faulty NSX-T Manager VM
2.2.3. Delete the faulty NSX-T Manager VM
After removing the node from the cluster and before adding a new one, run the get cluster status command and verify that the services are UP on the remaining nodes.
The new node cannot be added from the NSX-T Manager UI. The NSX-T Manager VMs run in the MGMT cluster, but the NSX-T Manager cluster only knows about the VI WLD vCenter, not the MGMT domain vCenter, so the Add Nodes wizard does not allow adding a node to the MGMT cluster.
For example: 2.4.2.1.0.14374085
nsx-unified-appliance-2.4.2.1.0.14374085.ova, click Next
For example:
fqdn=xxxxxx2.com
role=nsx-manager-nsx-controller
gateway=172.17.110.1
ipv4=172.17.110.24
netmask=255.255.255.0
dns=172.17.110.251
domain=xxxxxx.xxx
ntp=172.17.110.251
ssh=enabled (checked)
allowroot=disabled (unchecked)
nslookup xxxxxx1
ping xxxxxx1
nslookup should display the IP address of xxxxxx1.
ssh admin@xxxxxx1
get cluster config | find Id:
get certificate api thumbprint
ssh admin@xxxxxx2
join <nsx mgr ip> cluster-id <uuid> thumbprint <thumbprint> username admin
For example:
join 172.17.110.23 cluster-id 3ca96913-xxxx-xxxx-xxxx-365a7c52b545 thumbprint dd35[...]ca1e username admin
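The join command combines three values collected on different nodes, so it is easy to paste a truncated cluster ID or thumbprint. The following bash sketch assembles the command and checks only the formats; it assumes an IPv4 address for the existing manager and a thumbprint printed as hex without separators. The function name `print_join_cmd` is hypothetical and it does not contact any NSX Manager.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the NSX CLI "join" command from the values
# collected on the existing node (cluster ID and API certificate thumbprint).
# It only checks formats; it does not contact any NSX Manager.
print_join_cmd() {
  local mgr_ip="$1" cluster_id="$2" thumbprint="$3"
  local ipv4_re='^([0-9]{1,3}\.){3}[0-9]{1,3}$'
  local uuid_re='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
  # Assumption: the thumbprint is a hex string without separators.
  local thumb_re='^[0-9a-fA-F]+$'

  [[ "$mgr_ip" =~ $ipv4_re ]] || { echo "Bad IPv4 address: $mgr_ip" >&2; return 1; }
  [[ "$cluster_id" =~ $uuid_re ]] || { echo "Bad cluster ID: $cluster_id" >&2; return 1; }
  [[ "$thumbprint" =~ $thumb_re ]] || { echo "Bad thumbprint: $thumbprint" >&2; return 1; }

  echo "join $mgr_ip cluster-id $cluster_id thumbprint $thumbprint username admin"
}
```

Paste the printed line into the NSX CLI session on the new node (xxxxxx2 in this example).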
For example:
Issued By=CA
Issued To=xxxxxx2.com
curl -H 'Accept: application/json' -H 'Content-Type: application/json' --insecure -u 'admin:<password>' -X POST \
'https://<new nsx-t mgr fqdn or ip>/api/v1/node/services/http?action=apply_certificate&certificate_id=<certificate id>'
For example:
curl -H 'Accept: application/json' -H 'Content-Type: application/json' --insecure -u 'admin:<password>' -X POST \
'https://xxxxxx2.com/api/v1/node/services/http?action=apply_certificate&certificate_id=24781ed5-xxxx-xxxx-xxxx-cc8a4415d60e'
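The same request can be wrapped in a small bash function so the URL is built consistently and the admin password is not left in shell history. This is a sketch, assuming curl is installed on the workstation; the function names `apply_cert_url` and `apply_cert` are hypothetical. It reuses the endpoint and headers from the command above and adds curl's `--fail` option so HTTP errors produce a non-zero exit code.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the apply_certificate call.
# apply_cert_url only builds the URL; apply_cert sends the request with curl.
apply_cert_url() {
  local host="$1" cert_id="$2"
  printf 'https://%s/api/v1/node/services/http?action=apply_certificate&certificate_id=%s\n' \
    "$host" "$cert_id"
}

apply_cert() {
  local host="$1" cert_id="$2"
  # "-u admin" without a password makes curl prompt for it interactively.
  # --insecure mirrors the original command; --fail returns non-zero on HTTP errors.
  curl --fail --insecure \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -u admin \
    -X POST "$(apply_cert_url "$host" "$cert_id")"
}
```

For example, `apply_cert xxxxxx2.com <certificate id>` sends the same POST request as the example above.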
Specific to VCF 4.0: If assigning the certificate fails because the certificate revocation list (CRL) could not be verified, follow the steps in "Failure to apply NSX-T certificate: Couldn't get LDAP context from URI" to address the problem. If you disable CRL checking in order to assign the certificate, re-enable it once the certificate has been assigned.
2.2.7.2 If the certificate does not exist, follow "Replace Expired or Self-signed NSX-T Manager Certificates with VMCA-Signed Certificates" for more information.
For example:
echo | openssl s_client -no_ign_eof -showcerts -connect \
xxxxxx2.com:443 > nsx2.pem
For example:
openssl x509 -in nsx2.pem -noout -text | more
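Instead of paging through the full -text output, the fields relevant to this check (Issued To, Issued By, and the expiry date) can be printed directly. This is a minimal sketch, assuming OpenSSL is installed on the workstation and that the PEM file was saved as in the previous step; the function name `show_cert_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print subject (Issued To), issuer (Issued By) and
# expiry date of the first certificate in a PEM file such as nsx2.pem.
show_cert_summary() {
  local pem="$1"
  [ -f "$pem" ] || { echo "File not found: $pem" >&2; return 1; }
  openssl x509 -in "$pem" -noout -subject -issuer -enddate
}
```

For example, `show_cert_summary nsx2.pem` should show the new node's FQDN in the subject and your CA in the issuer.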
This step is specific to VCF 4.0. As a final step, update the SSH keys that SDDC Manager stores for the NSX-T Managers. VMware provides a script that automates this process; follow the Refresh SDDC Manager SSH Keys procedure documented in VMware Cloud Foundation SDDC Manager Recovery Scripts.
When Cloud Foundation deploys NSX-T Manager, it creates a VM anti-affinity rule to prevent the VMs of the same NSX-T Manager cluster from running on the same host. In this step, you need to add the newly deployed replacement VM to the rule for this NSX-T Manager cluster.
Log in to the management domain vCenter Server, and select Menu > Hosts and Clusters.
In the Navigator pane, select the management cluster.
Select Configure > VM/Host Rules.
Finally, add the VM to the correct "separate virtual machine" rule. The rule for the management domain NSX-T Manager cluster is named anti-affinity-rule-nsxt, while the rules for workload domains have the form "<NSXT Mgr VIP FQDN> - NSX-T Managers Anti Affinity Rule". Once you locate the rule, click Edit and add the newly deployed VM (for example, vi1nsxmanager2) to it.