Best Practices for upgrading NSX for vSphere

Article ID: 314297

Products

VMware NSX

Issue/Introduction

The purpose of this article is to assist with the upgrade of NSX. It includes general information relating to the upgrade process, and supplemental information based on common issues and the latest support trends.

Note: This article is not a replacement for the NSX upgrade documentation. See NSX Upgrade Documentation for additional information.

Environment

VMware NSX for vSphere 6.1.x
VMware NSX for vSphere 6.2.x
VMware NSX for vSphere 6.3.x

Resolution

Before upgrading NSX for vSphere, refer to the VMware Product Interoperability Matrix to confirm that the planned upgrade path is supported. Also confirm that the target NSX version is certified with any third-party products integrated with NSX.

A full NSX upgrade covers the following components:

  1. NSX Manager (including all secondary NSX Managers in a multi-VC setup)
  2. NSX Controllers
  3. ESXi Host VIBs
  4. NSX Edges (Including ESGs and DLRs)
  5. Guest Introspection components

Upgrade NSX components during a maintenance window:

  • NSX Manager: Upgrading NSX Manager should not cause any data plane outage, but the NSX environment cannot be managed while the upgrade is in progress. The NSX Web Client plugin and the Web UI are not functional during this time, and GET and POST API calls do not work.
  • NSX Controllers: Upgrading the control cluster should not cause disruption to any existing data flows.
  • ESXi Host, NSX VIBs: Although DRS can redistribute virtual machines through vMotion, NSX VIB upgrades in stateful or auto-deploy clusters might affect the data plane, because NSX does not block vMotion when the control plane is not completely up. The safest approach is to manually vMotion the VMs off each host in turn rather than letting DRS migrate them automatically.
  • ESGs: Edge Services Gateway upgrades are done by deploying a new VM in parallel to the existing one and then cutting over services. GARP is used to notify the physical switch of the MAC change. Expect a brief data plane disruption during this switchover while the cutover happens and routing converges.

Upgrading NSX Manager best practices:

  • From the NSX Manager web UI, ensure that NSX is not registered to vCenter using the root account.
  • From the NSX Manager web UI, ensure that NSX is registered and in sync with the vCenter Server. The last inventory update reported should be recent.
  • Ensure the NSX Manager VM is sized appropriately. For NSX, a minimum of 16GB of memory and 4 vCPUs is required.
  • Validate that NSX Manager has sufficient disk space to proceed with the upgrade.
    • SSH or console to the NSX Manager and run the show filesystem command.
  • Ensure that there are no failed publish tasks being reported in the DFW view of the vSphere Web Client, or hosts that did not receive the firewall configuration successfully.
  • FTP/SFTP is the only supported backup method for NSX Manager. Check to ensure a good backup has been created per the configured schedule and take a manual backup immediately before the upgrade process begins.
  • It is not included in the Upgrade Guide, but another highly recommended step is to take a full cold clone (clone the NSX Manager VM while powered off) of the NSX Manager appliance. This provides extra insurance should there be a problem with the FTP backup.
  • Because NSX is dependent on vCenter for successful operation, it is very important to have a good vCenter Server and vCenter Server database backup before starting the NSX upgrade procedure.
  • Ensure that the NSX Manager upgrade bundle has been downloaded and verified against its MD5 checksum.
  • After NSX has been upgraded, it is recommended to change the VXLAN port to 4789. Note that fresh deployments of NSX for vSphere 6.2.3 no longer use VXLAN port 8472 and instead use the IANA standard UDP port 4789 by default.
    • To change the VXLAN port, log in to the vSphere Web Client > Networking & Security > Installation > Logical Network Preparation > Change.
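The checksum verification mentioned above can be sketched in Python as follows. This is a minimal illustration, not a VMware tool: the function names are hypothetical, and the published MD5 value is assumed to be copied from the download page by the operator.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large bundle is never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_matches(path, published_md5):
    """Compare the computed digest with the published one, ignoring case and whitespace."""
    return md5_of_file(path) == published_md5.strip().lower()
```

If `bundle_matches` returns False for the downloaded bundle, download it again before starting the upgrade.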

Upgrading Controller best practices:

  • If NSX is being upgraded from version 6.2.3 or later, validate that the control cluster is in a healthy state. In the vSphere Web Client, ensure that none of the controllers are in a disconnected state and that each of the controllers agrees on the majority.
    • SSH or console to each controller, run the show control-cluster status command, and confirm the following:
      • Join status is Join Complete.
      • Majority status is Connected to cluster majority.
      • Restart status is This controller can be safely restarted.
      • Cluster ID is the same on all three controllers.
  • If NSX is being upgraded from a version earlier than 6.2.3, VMware recommends that the controllers be deleted and redeployed. Although this is not normally necessary, some changes to the disk partitioning in 6.2.3 and later versions are applied only to newly deployed controllers, and not to upgraded controllers. This is strongly recommended because of known disk space issues affecting the zookeeper service.
    • Remove all three controllers at once. The last controller will need to be removed forcefully.
    • Create all three controllers again, one by one, waiting for a green status before proceeding to the next controller.
    • Update the controller state (vSphere Web Client > Networking and Security > Installation > Management > Actions gear icon, top left).
    • Run the following REST API call to force sync all controllers:
      • PUT https://<NSXMGR_IP>/api/2.0/vdn/controller/synchronize
      • or with curl: curl -v -H "Content-Type:application/xml" -k -u admin -X PUT https://<NSXMGR_IP>/api/2.0/vdn/controller/synchronize
    • Force Sync Services for all prepared clusters (Networking and Security > Installation > Host Preparation > hover over Installation Status for each cluster > gear icon).
    • Force sync any DLRs (Networking and Security > NSX Edges > click the DLR Edge > orange Force Sync button, top left).
  • If the NSX controllers fail to upgrade, it is recommended to simply delete and redeploy fresh controllers rather than trying to repair a failed upgrade.
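The controller health checks and the force-sync call above can be sketched in Python. This is a rough illustration under stated assumptions: the helper names are hypothetical, the health check only looks for the three status phrases listed in this article (it does not assume any particular layout of the show control-cluster status output), and the request is built but not sent.

```python
import base64
import urllib.request

# Expected phrases, taken from the checklist in this article.
HEALTHY_PHRASES = (
    "join complete",
    "connected to cluster majority",
    "this controller can be safely restarted",
)


def missing_health_phrases(status_output):
    """Return the expected phrases that are absent from one controller's output."""
    text = status_output.lower()
    return [phrase for phrase in HEALTHY_PHRASES if phrase not in text]


def build_sync_request(nsx_manager, user, password):
    """Build (but do not send) the PUT request that force-syncs the controllers."""
    url = f"https://{nsx_manager}/api/2.0/vdn/controller/synchronize"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/xml")
    return request
```

An empty list from `missing_health_phrases` means all three phrases were found for that controller; the cluster ID still has to be compared across controllers separately. Sending the request is left to the caller, for example with `urllib.request.urlopen`. Note that the curl example uses -k to skip certificate verification, which `urlopen` does not do by default.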

Upgrading ESXi Host VIBs best practices:

  • Ensure that all ESXi hosts and clusters are listed as green in the host preparation tab, and that EAM is not reporting clusters in a Not Ready state.
  • Ensure that the EAM service status at the top of the Host Preparation page is Green.
  • Ensure that the EAM status of each agency is Green.
    • To check the status navigate to vSphere Web Client > Home > Administration > vCenter Server Extensions > vSphere ESX Agent Manager (click the one that matches the VC where NSX is connected) > Manage.
  • After you click Upgrade Available, an alarm is raised requesting that the host be put in maintenance mode.
  • Ensure that vMotion functions correctly in all clusters to be upgraded.
  • Ensure that all hosts are in a connected state from vCenter Server perspective.
  • VMware recommends ensuring the bypassVumEnabled flag is set to true in the ESX Agent Manager settings to prevent host preparation issues during the upgrade.
    • To make this change on a cluster by cluster basis, see “Agent VIB module not installed” when installing EAM/VXLAN Agent using VUM (2053782).
    • To make this change on an entire VC:
      • Locate the eam.properties file:
        • For vCenter Server (Windows): C:\Program Files\VMware\Infrastructure\tomcat\webapps\eam\WEB-INF\eam.properties
        • For VCVA: /etc/vmware-eam/eam.properties
      • Modify the vum.integration setting: change vum.integration=true to vum.integration=false
  • VMware recommends changing the DRS Cluster setting to Manual or Partially Automated in a VSAN prepared cluster.
  • Upgrade each cluster one at a time. Wait for all the spinning tasks to complete before starting to reboot the ESXi hosts.
  • If you do not want EAM to automatically balance and reboot each ESXi host in a cluster, set DRS to manual. If DRS is not set to manual, the following operations will cause EAM to automatically put each host in maintenance mode and reboot them one by one: upgrading a cluster (all hosts should appear as Not Ready) and resolving all issues on a cluster.
    • This operation cannot be stopped until either DRS is set to manual or the cluster reboots have been completed.
  • If EAM is not being used to automate the install, each host will have to be manually rebooted. The cluster status will remain not ready until each host in the cluster has been rebooted.
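The eam.properties change described above can be sketched in Python as a text transformation. This is an illustrative helper only (the function name is hypothetical); it assumes the setting appears on its own line as vum.integration=true, and it leaves reading the file, writing it back, and keeping a backup copy to the caller.

```python
import re

# Matches a line such as "vum.integration=true", allowing spaces or tabs around "=".
_VUM_LINE = re.compile(r"^([ \t]*vum\.integration[ \t]*=[ \t]*)true\b", re.MULTILINE)


def disable_vum_integration(properties_text):
    """Return the properties text with vum.integration switched from true to false."""
    return _VUM_LINE.sub(r"\g<1>false", properties_text)
```

All other lines are returned unchanged, and text that already contains vum.integration=false passes through unmodified.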

Upgrading NSX Edge best practices:

  • Ensure that all NSX edges and DLR control appliances can be reached and that there are no other obvious signs of network communication issues.
  • You can upgrade Logical Routers after NSX Managers, NSX Controller cluster and host clusters are upgraded.
  • You can upgrade an Edge Services Gateway even if you have not yet upgraded the NSX Controller cluster or host clusters.
  • You do not need to upgrade all NSX Edges in the same maintenance window.
  • DLRs should be upgraded one at a time, but normal Edge Services Gateways can be upgraded several at a time (not recommended for ECMP ESGs).
  • For ECMP Edges, upgrading all ECMP Edges simultaneously is not recommended. To minimize the potential impact, upgrade them one at a time, ensuring that at least one ESG is up and functional at all times.
  • For more information, see the Upgrade NSX Guide section of the NSX Upgrade Guide.

Upgrading Guest Introspection components:

  • If using Guest Introspection for anti-virus or other security services, ensure that you have contacted your security vendor (for example, McAfee, Symantec, or Trend Micro) to determine the supported upgrade path and any other required dependencies. It is very important that the correct, compatible version be used.
  • Ensure that no VMs are orphaned or inaccessible in a cluster that is protected by Guest Introspection.
  • The best practice to upgrade each Guest Introspection deployment is to simply remove the service deployment and recreate it. This operation is non-disruptive. It is also often faster to recreate each deployment than to try the upgrade operation. For third-party service deployments, be sure to read the vendor documentation, as some services should not be upgraded this way.