This article provides procedures for replacing a single node from a backup in a 3-node NSX-T Manager cluster within a VCF environment.
A degraded NSX cluster with a faulty NSX Manager node can block several VCF operations.
VMware NSX-T Data Center
VMware Cloud Foundation 4.x
VMware Cloud Foundation 5.x
In this procedure, the 3-node NSX-T manager cluster has a single node down. In this example, there are 3x NSX-T manager nodes in the MGMT cluster:
######2 has gone down and must be replaced. You must obtain the following information for the node that needs replacement. If you do not know the admin and root passwords, follow the instructions in the password chapter of the VMware Cloud Foundation Operations and Administration guide to retrieve them from the SDDC Manager inventory. If you want to change these two passwords, do so after restoring the NSX-T Manager VMs, using the SDDC Manager password update function.
Note: Make sure you download the OVA image before you start and check the md5sum of the OVA file as well. Section 2.2.4 lists the procedures for determining the specific OVA to download.
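The checksum comparison mentioned in the note can be scripted. The following is a minimal sketch, not part of the official procedure: the function names md5_of_file and ova_matches_checksum are hypothetical, and the published MD5 value is assumed to come from the download page for the OVA.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reads keep memory use flat even for a multi-gigabyte OVA.
    """
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ova_matches_checksum(path: Path, published_md5: str) -> bool:
    """Compare the computed digest with the published one.

    Case and surrounding whitespace in the published value are ignored.
    """
    return md5_of_file(path) == published_md5.strip().lower()
```

A call such as ova_matches_checksum(Path("nsx-unified-appliance-2.4.2.1.0.14374085.ova"), "<published md5>") returns True only when the downloaded file is intact.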
Note: The steps for deleting the manager differ depending on whether the node was deployed automatically via the NSX UI or deployed manually to vCenter via OVA.
First, obtain the UUID of the faulty NSX-T Manager. SSH into an operational NSX-T Manager:
ssh admin@######3
Issue the get cluster status command and record the UUID of the faulty NSX-T Manager:
get cluster status
Issue the detach command from the operational NSX-T Manager:
detach node <uuid>
For example: detach node 77a01dab-####-####-####-0bfc3e0c40d9
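Because detaching the wrong node would degrade the cluster further, it can help to check the recorded UUID for copy-and-paste damage before issuing the command. The sketch below is an illustration only: detach_command is a hypothetical helper, and the UUID in the usage note is a made-up placeholder, not a value from this environment. It only builds the CLI line shown above and does not contact NSX.

```python
import uuid


def detach_command(node_uuid: str) -> str:
    """Return the 'detach node' CLI line for a well-formed UUID.

    Raises ValueError when the input is not a canonical, hyphenated UUID.
    """
    candidate = node_uuid.strip()
    parsed = uuid.UUID(candidate)  # raises ValueError on malformed input
    # uuid.UUID also accepts braces or un-hyphenated hex, so require
    # the canonical 8-4-4-4-12 form that get cluster status prints.
    if str(parsed) != candidate.lower():
        raise ValueError(f"not a canonical UUID: {node_uuid!r}")
    return f"detach node {candidate.lower()}"
```

For instance, detach_command("123e4567-e89b-12d3-a456-426614174000") returns the string detach node 123e4567-e89b-12d3-a456-426614174000, while a truncated value raises ValueError.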
2.2.2. Power off faulty NSX-T manager VM
2.2.3. Delete faulty NSX-T manager VM
After removing one node from the cluster and before adding a new one, run the "get cluster status" command and verify that the services are UP on the remaining nodes.
The new node cannot be added from the NSX-T manager UI. The NSX-T manager VMs are in the MGMT cluster, but the NSX-T manager cluster only knows about the VI WLD vCenter, not the MGMT domain vCenter, so the add nodes wizard will not allow adding a node to the MGMT cluster.
For example: 2.4.2.1.0.14374085
nsx-unified-appliance-2.4.2.1.0.14374085.ova, click Next
For example:
fqdn=######2.com
role=nsx-manager-nsx-controller
gateway=172.##.110.#
ipv4=###.17.110.##
netmask=255.255.255.0
dns=172.##.110.###
domain=######.com
ntp=172.##.###.251
ssh=enabled (checked)
allowroot=disabled (unchecked)
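A mistyped gateway or netmask in these properties leaves the new appliance unreachable after deployment, so a quick consistency check before submitting the wizard can save a redeploy. The following is a sketch under stated assumptions: check_static_network is a hypothetical helper, and the addresses in the usage note are documentation-range placeholders rather than the redacted values above.

```python
import ipaddress


def check_static_network(ipv4: str, netmask: str, gateway: str) -> list[str]:
    """Return a list of problems found; an empty list means the values agree."""
    problems = []
    # strict=False lets us pass a host address together with its netmask.
    network = ipaddress.ip_network(f"{ipv4}/{netmask}", strict=False)
    address = ipaddress.ip_address(ipv4)
    gw = ipaddress.ip_address(gateway)
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    if gw == address:
        problems.append("gateway and node address are identical")
    if address in (network.network_address, network.broadcast_address):
        problems.append(f"{address} is the network or broadcast address of {network}")
    return problems
```

check_static_network("192.0.2.10", "255.255.255.0", "192.0.2.1") returns an empty list, whereas a gateway in another subnet produces a message naming the mismatch.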
nslookup ######1
ping ######1
and it will display the IP address
ssh admin@######1
get cluster config | find Id:
get certificate api thumbprint
ssh admin@######2
join <nsx mgr ip> cluster-id <uuid> thumbprint <thumbprint> username admin
For example:
join ###.17.110.## cluster-id 3ca96913-####-####-####-365a7c52b545 thumbprint dd35[...]ca1e username admin
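The join line combines three values copied from another session, so assembling it programmatically can catch truncated pastes. This is a minimal sketch: join_command is a hypothetical helper, and the 64-hex-character check assumes the thumbprint is a SHA-256 digest printed without separators, which should be confirmed against the actual output of get certificate api thumbprint. The helper only formats the command shown above.

```python
import ipaddress
import re
import uuid

# Assumption: the API thumbprint is a SHA-256 digest, i.e. 64 hex characters.
SHA256_HEX = re.compile(r"[0-9a-fA-F]{64}")


def join_command(manager_ip: str, cluster_id: str, thumbprint: str,
                 username: str = "admin") -> str:
    """Build the 'join' CLI line after basic sanity checks on each value."""
    ipaddress.ip_address(manager_ip)  # ValueError if not an IP address
    uuid.UUID(cluster_id)             # ValueError if not a UUID
    if not SHA256_HEX.fullmatch(thumbprint):
        raise ValueError("thumbprint is not 64 hex characters (assumed SHA-256)")
    return (f"join {manager_ip} cluster-id {cluster_id} "
            f"thumbprint {thumbprint} username {username}")
```

The returned string is meant to be pasted into the admin session on the new node, as in the example above.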
For example:
Issued By=CA
Issued To=######2.com
curl -H 'Accept: application/json' -H 'Content-Type: application/json' --insecure -u 'admin:<password>' -X POST \
'https://<new nsx-t mgr fqdn or ip>/api/v1/node/services/http?action=apply_certificate&certificate_id=<certificate id>'
For example:
curl -H 'Accept: application/json' -H 'Content-Type: application/json' --insecure -u 'admin:<password>' -X POST \
'https://######2.com/api/v1/node/services/http?action=apply_certificate&certificate_id=24781ed5-####-####-####-cc8a4415d60e'
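For readers who script this call rather than typing curl, the same request can be assembled with the Python standard library. This is a sketch, not an official client: apply_certificate_url and apply_certificate_request are hypothetical names, the host and certificate ID in the usage note are placeholders, and the code only builds the request object without sending it.

```python
import base64
from urllib.parse import urlencode, urlunsplit
from urllib.request import Request


def apply_certificate_url(manager_host: str, certificate_id: str) -> str:
    """Build the apply_certificate URL used in the curl command above."""
    query = urlencode({"action": "apply_certificate",
                       "certificate_id": certificate_id})
    return urlunsplit(("https", manager_host,
                       "/api/v1/node/services/http", query, ""))


def apply_certificate_request(manager_host: str, certificate_id: str,
                              username: str, password: str) -> Request:
    """Return an unsent POST request carrying the same headers as curl."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return Request(
        apply_certificate_url(manager_host, certificate_id),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # equivalent of curl -u
        },
    )
```

Sending the request with urllib.request.urlopen would verify the server certificate by default. The curl example skips that check with --insecure, so a node that still presents a self-signed certificate would need an explicitly configured SSL context.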
Specific to VCF 4.0: If assigning the certificate fails because the certificate revocation list (CRL) could not be verified, follow the steps in "Failure to apply NSX-T certificate: Couldn't get LDAP context from URI" to address the problem. If you decide to disable CRL checking in order to assign the certificate, re-enable CRL checking once the certificate has been assigned.
2.2.7.2 If the certificate does not exist, follow Replace Expired or Self-signed NSX-T Manager Certificates with VMCA-Signed Certificates for more information.
For example:
echo | openssl s_client -no_ign_eof -showcerts -connect \
######2.com:443 > nsx2.pem
For example:
openssl x509 -in nsx2.pem -noout -text | more
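The file written by openssl s_client contains handshake text around the certificates, and with -showcerts it can hold more than one PEM block. The sketch below shows one way to isolate those blocks for inspection. The function name extract_certificates is hypothetical, and the sketch assumes the server certificate is the first block in the output, which is the usual ordering.

```python
import re

# Non-greedy match so each BEGIN/END pair is captured separately.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_certificates(s_client_output: str) -> list[str]:
    """Return every PEM certificate block found, in order of appearance."""
    return PEM_BLOCK.findall(s_client_output)
```

Reading nsx2.pem and taking the first element of the returned list gives the block that can then be checked for the expected Issued By and Issued To values.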
This step is specific to VCF 4.x and VCF 5.x. As a final step, update the SSH keys that SDDC Manager stores for the NSX-T Managers. VMware offers a script that automates this process. Follow the Refresh SDDC Manager SSH Keys procedure documented in How to update the SSH host keys on the SDDC Manager.
When Cloud Foundation deploys NSX-T Manager, it creates a VM anti-affinity rule to prevent the VMs of the same NSX-T Manager cluster from running on the same host. In this step, you need to add the newly deployed replacement VM to the rule for this NSX-T Manager cluster.
Log in to the management domain vCenter Server, and select Menu > Hosts and Clusters.
In the Navigator pane, select the management cluster.
Select Configure > VM/Host Rules.
Finally, add the VM to the correct "separate virtual machine" rule. The rule for the management-domain NSX-T Manager cluster is named anti-affinity-rule-nsxt, while the rule for workload domains has the form "<NSXT Mgr VIP FQDN> - NSX-T Managers Anti Affinity Rule". Once you locate the rule, click edit, and add the newly deployed VM (e.g., vi1nsxmanager2) to it.
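The two naming patterns described above can be captured in a small helper, which is convenient when the rule is looked up from a script or an inventory report. This is an illustrative sketch only: anti_affinity_rule_name is a hypothetical function, and the FQDN in the usage note is a placeholder.

```python
from typing import Optional

# Rule name used for the management-domain NSX-T Manager cluster.
MGMT_RULE_NAME = "anti-affinity-rule-nsxt"


def anti_affinity_rule_name(vip_fqdn: Optional[str] = None) -> str:
    """Return the expected rule name.

    With no argument, the management-domain name is returned. With the
    NSX-T Manager VIP FQDN of a workload domain, the workload-domain
    pattern is returned.
    """
    if vip_fqdn is None:
        return MGMT_RULE_NAME
    return f"{vip_fqdn} - NSX-T Managers Anti Affinity Rule"
```

For a workload domain whose VIP is vi1nsx.example.com, the helper returns "vi1nsx.example.com - NSX-T Managers Anti Affinity Rule".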