Failed or non-responsive NSX Controller VMs can safely be destroyed and redeployed via Web Client with no harm to VMware Cloud Foundation

Article ID: 339520

Products

VMware Cloud Foundation

Issue/Introduction

Failed NSX Controller VMs can be manually deleted and redeployed quickly without impacting the VMware Cloud Foundation environment.


Symptoms:
  • You see a failed workflow in the SDDC Manager UI, and a failed or non-responsive NSX Controller VM is the cause.
  • An NSX Controller VM is shown as disconnected from the controller cluster, or in a failed state, when you navigate to Home >> Networking and Security in the Web Client. Rebooting the controller does not resolve the issue.
  • All other attempts to resolve the NSX Controller VM issue fail. See NSX Controller Cluster Failures.


Environment

VMware Cloud Foundation 2.2.x
VMware Cloud Foundation 2.3.x

Resolution

First, check whether the issue can be resolved as described in NSX Controller Cluster Failures.

Note: Do not attempt this resolution while SDDC Manager workflows (workload domain expansions, deployments, and so on) are executing. Proceed only when the workflows are in a failed state and no longer running.

The resolution is a two-part process: delete the controller, then redeploy it.

Delete an NSX Controller

About this task

You can delete an NSX Controller forcefully or gracefully. The graceful removal procedure checks for the following conditions before removing the node:

  • There is no current NSX Controller node upgrade operation.
  • The controller cluster is healthy, and a controller cluster API request can be processed.
  • The host state, as obtained from the vCenter Server inventory, shows connected and powered on.
  • This is not the last controller node.

The forceful removal procedure does not check these conditions before removing the controller node.
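The four graceful-removal conditions can be read as a simple checklist. The following is a minimal sketch, not NSX code: the `ClusterSnapshot` type and its field names are hypothetical and stand in for values an operator would collect by hand from the Web Client and vCenter Server.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterSnapshot:
    """Hypothetical summary of the state gathered before a removal."""
    upgrade_in_progress: bool  # an NSX Controller node upgrade is running
    cluster_healthy: bool      # cluster is healthy and answers API requests
    host_connected: bool       # host shows connected in vCenter Server
    host_powered_on: bool      # host shows powered on in vCenter Server
    controller_count: int      # controller nodes currently in the cluster


def graceful_removal_blockers(s: ClusterSnapshot) -> List[str]:
    """Return the reasons a graceful removal would be refused (empty if none)."""
    blockers = []
    if s.upgrade_in_progress:
        blockers.append("controller node upgrade in progress")
    if not s.cluster_healthy:
        blockers.append("controller cluster unhealthy or not answering API requests")
    if not (s.host_connected and s.host_powered_on):
        blockers.append("host not connected and powered on")
    if s.controller_count <= 1:
        blockers.append("this is the last controller node")
    return blockers
```

An empty list means a graceful removal should be possible. Any entry in the list suggests that only the forceful option would proceed.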

  • Things to remember while deleting controllers:
    • Do not delete the controller VM itself before deleting the controller through the vSphere Web Client UI or the API. When the UI is not usable, use the DELETE /2.0/vdn/controller/{controllerId} API to delete the controller.
    • After deletion of a node, ensure that the existing cluster stays stable.
    • When deleting all the nodes in a cluster, the last remaining node must be deleted using the Forcefully remove the controller option. Always verify that the controller VM is deleted successfully. If not, manually power down the VM and delete the controller VM using the UI.
    • If the delete operation fails, the VM could not be deleted. In that case, invoke the controller delete through the UI with the Forcefully remove the controller option. For the API, set the forceRemoval parameter to true. After forceful removal, manually power down the VM and delete the controller VM using the UI.
    • A multi-node cluster can sustain only one failure, and a deletion counts as a failure. The deleted node must be redeployed before another failure occurs.
  • For Cross-vCenter NSX environment:
    • Deleting the controller VM or powering it off directly in vCenter Server is not a supported operation. If this is done, the Status column displays Out of sync.
    • If controller deletion succeeds only partially, and an entry is left behind in the NSX Manager database in a Cross-vCenter NSX environment, use the DELETE api/2.0/vdn/controller/external API.
    • If the controller was imported through the NSX Manager API, use the removeExternalControllerReference API with the forceRemoval option.
    • When deleting a controller, NSX asks vCenter Server to delete the controller VM by using the Managed Object ID (MOID) of the VM. If vCenter Server cannot find the VM by its MOID, NSX reports a failure for the controller delete request and stops the operation.
      If the Forcefully Delete option is selected, NSX does not stop the controller delete operation and clears the controller's information. NSX also updates all the hosts so that they no longer trust the deleted controller. However, if the controller VM is still active and running with a different MOID, it still has credentials to participate as a member of the controller cluster. In this scenario, any logical switch or router that is assigned to this controller node will not function properly because the ESXi hosts no longer trust the deleted controller.
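When the UI is not usable, the controller delete call mentioned above can be prepared as follows. This is a minimal sketch under stated assumptions: the `/api` URL prefix, HTTP Basic authentication, and passing forceRemoval as a query parameter are assumptions not spelled out in this article, and the function name, host name, and controller ID are hypothetical. The function only builds the request and does not send it.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_controller_delete(nsx_manager: str, controller_id: str,
                            user: str, password: str,
                            force: bool = False) -> urllib.request.Request:
    """Build (but do not send) a controller DELETE request."""
    # Assumption: the path from this article is served under /api on NSX Manager.
    url = (f"https://{nsx_manager}/api/2.0/vdn/controller/"
           f"{quote(controller_id, safe='')}")
    if force:
        # Assumption: forceRemoval is supplied as a query parameter.
        url += "?forceRemoval=true"
    # Assumption: NSX Manager accepts HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="DELETE", headers={"Authorization": f"Basic {token}"})


# Example with placeholder values; nothing is sent over the network.
req = build_controller_delete("nsxmgr.example.local", "controller-1",
                              "admin", "example-password", force=True)
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose, because certificate handling and credentials depend on the environment. Confirm the exact endpoint against the NSX API guide for your version before use.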

To delete the NSX Controller, perform the following procedure:

Procedure

  1. Log in to the vSphere Web Client.
  2. Click Networking & Security, and then click Installation.
  3. In the NSX Controller nodes section, click the affected controller and take screenshots of the NSX Controller Details screen (IP, port group, cluster/resource pool, datastore, host IP or name), or write down the configuration information for later reference.
  4. Under Management, select the controller that you want to delete.
  5. Click the Delete (x) icon.
  6. Select either Delete or Forcefully Delete.
    • When you select the Forcefully Delete option, the controller is deleted forcefully rather than gracefully. This option ignores any failures and clears the data from the database. You must manually take care of any failures that occur, and you must confirm that the controller VM is successfully deleted. If it is not, delete it through vCenter Server.
      Note: If you are deleting the last controller in the cluster, you must select the Forcefully Delete option to remove the last controller node. When there are no controllers in the system, the hosts operate in what is called "headless" mode. New VMs or vMotioned VMs will have networking issues until new controllers are deployed and the synchronization is completed.
    • If you do not select this option, the controller is deleted gracefully.
  7. Click Yes. Graceful controller deletion uses the following sequence:
    1. Power off the node.
    2. Check the cluster health.
    3. If the cluster is not healthy, power on the controller, and fail the removal request.
    4. If the cluster is healthy, remove the controller VM, and release the IP address of the node.
    5. Remove the controller VM's identity from the cluster.
      The selected controller is deleted.
  8. Resynchronize the controller state by clicking Actions > Update Controller State.
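The graceful deletion sequence in step 7 can be sketched as a small decision flow. This is an illustration of the order of operations only, not NSX code: the function name and the `cluster_is_healthy` callback are hypothetical, and the callback stands in for the health check performed after the node is powered off.

```python
from typing import Callable, List


def graceful_delete(cluster_is_healthy: Callable[[], bool]) -> List[str]:
    """Return the ordered actions taken for one graceful delete attempt."""
    actions = ["power off node"]
    if not cluster_is_healthy():
        # The cluster cannot tolerate losing the node: undo and fail the request.
        actions += ["power on node", "fail removal request"]
        return actions
    actions += [
        "remove controller VM",
        "release node IP address",
        "remove controller identity from cluster",
    ]
    return actions


# Walk through both outcomes of the health check.
print(graceful_delete(lambda: True))
print(graceful_delete(lambda: False))
```

The sketch makes the key property visible: when the health check fails, the node is powered back on and nothing is removed.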

Redeploy an NSX Controller

  1. Log in to the vSphere Web Client.
  2. From Networking & Security, click Installation > Management.
  3. Deploy a new NSX Controller node by clicking the Add Node (+) icon.
  4. In the Add Controller dialog box, select the datacenter on which you are adding the nodes, and configure the controller settings.
    1. Select the appropriate cluster.
    2. Select a Host in the cluster and storage.
    3. Select the distributed port-group.
    4. Select the IP pool from which IP addresses are to be assigned to the node.
    5. Click OK, wait for the installation to complete, and ensure that the node has a status of Normal.
  5. Resynchronize the controller state by clicking Actions > Update Controller State.

Update Controller State pushes the current VXLAN and Distributed Logical Router configuration (including Universal Objects in a Cross-vCenter NSX deployment) from NSX Manager to the controller cluster.
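The redeploy settings correspond to the details recorded from the NSX Controller Details screen before deletion. A minimal sketch of keeping that record and spotting gaps before redeploying is shown below. The `ControllerDetails` type, its field names, and the sample values are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ControllerDetails:
    """Settings noted from the NSX Controller Details screen before deletion."""
    ip_address: Optional[str] = None
    port_group: Optional[str] = None
    cluster_or_resource_pool: Optional[str] = None
    datastore: Optional[str] = None
    host: Optional[str] = None


def missing_details(d: ControllerDetails) -> List[str]:
    """Names of settings that were not recorded and must be looked up."""
    return [f.name for f in fields(d) if not getattr(d, f.name)]


# Example with placeholder values: three settings were not written down.
noted = ControllerDetails(ip_address="192.0.2.10", port_group="example-portgroup")
print(missing_details(noted))
```

Any name returned by `missing_details` is a setting to look up before opening the Add Controller dialog box.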