NSX Manager upgrade causes a change in portgroup for one or more interfaces of the Edge appliances

Article ID: 390799

Products

VMware NSX

Issue/Introduction

  • During an upgrade of the NSX Manager cluster from 3.1.x to 4.x, the portgroup assigned to one or more virtual NICs of an Edge VM is changed. Depending on which portgroup was changed, this may cause a data plane outage.
  • The following log lines appear in /var/log/nsxapi.log with timestamps matching the upgrade activity:

INFO providerTaskExecutor-18 EdgeVMFabricUtils 15109 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] [entId=EdgeTransportNode//infra/sites/default/enforcement-points/default/edge-transport-node/########-####-####-####-########f0d0] Configure network interfaces: {0=dvportgroup-33, 1=network-72, 2=dvportgroup-34, 3=dvportgroup-35, 4=network-72} during OVF deploy

ERROR ActivityWorkerPool-1-2 PolicyPathUtil 15109 POLICY [nsx@6876 comp="nsx-manager" errorCode="PM500012" level="ERROR" subcomp="manager"] Invalid path passed for network-72

INFO ActivityWorkerPool-1-2 EdgeTransportNodeFabricUtils 15109 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] VM vm-xxx, computer manager id ########-####-####-####-########82de. Reconfigure edited network interfaces: {1=network-72} Vifs {}

Note: In this example, network-72 is NOT the desired portgroup for network interface 1 of the Edge appliance.

  • Reviewing the network settings of the Edge VM in vCenter Server shows a different portgroup configured for this interface.
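To compare the interface assignments reported in these log lines against the intended ones, the bracketed map can be pulled out of each line. The following is a minimal Python sketch, not a VMware tool: the function name parse_interface_map is hypothetical, and the regular expression is written only against the two message formats quoted above.

```python
import re

# Matches the "{0=dvportgroup-33, 1=network-72, ...}" map that follows
# "Configure network interfaces:" or "Reconfigure edited network interfaces:".
# Assumption: the message format is exactly as quoted in this article.
_IFACE_MAP = re.compile(
    r"[Cc]onfigure (?:edited )?network interfaces: \{([^}]*)\}"
)


def parse_interface_map(log_line):
    """Return {interface_index: network_id} from one nsxapi.log line.

    Returns an empty dict if the line carries no interface map.
    """
    match = _IFACE_MAP.search(log_line)
    if not match or not match.group(1).strip():
        return {}
    interfaces = {}
    for pair in match.group(1).split(","):
        index, _, network = pair.strip().partition("=")
        interfaces[int(index)] = network
    return interfaces


if __name__ == "__main__":
    line = (
        "Configure network interfaces: {0=dvportgroup-33, 1=network-72, "
        "2=dvportgroup-34, 3=dvportgroup-35, 4=network-72} during OVF deploy"
    )
    for index, network in sorted(parse_interface_map(line).items()):
        print(index, network)
```

Each index in the result can then be checked against the portgroup expected for that Edge interface.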

Environment

NSX 3.1.x, 3.2.1

Cause

In 3.1.x releases, the EdgeTransportNode object is populated from one of two tables in the NSX Corfu database, whichever has the latest timestamp: EdgeNodePlacementConfig (the vSphere configuration of the Edge VM) or EdgeNode (the Edge configuration in NSX Manager).

During the upgrade to 4.x, the NSX Corfu database is upgraded and the EdgeTransportNode object is from then on always populated from the EdgeNode table. If that table references stale data, the stale configuration is realized once the NSX Managers come up on the new 4.x version, which may change the Edge adapter portgroup.
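The effect of this change in source selection can be illustrated with a small conceptual model. This is only a sketch of the behaviour described above, not NSX code: the TableRecord type, its field names, and the tie-breaking rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TableRecord:
    """Hypothetical stand-in for one Edge's row in a Corfu table."""
    table: str                      # "EdgeNode" or "EdgeNodePlacementConfig"
    last_modified: int              # assumed timestamp field
    data_network_ids: list = field(default_factory=list)


def select_source_3_1(edge_node, placement):
    """3.1.x behaviour as described: the table with the latest timestamp wins.

    Preferring EdgeNode on a tie is an assumption of this sketch.
    """
    if edge_node.last_modified >= placement.last_modified:
        return edge_node
    return placement


def select_source_4_x(edge_node, placement):
    """4.x behaviour as described: the EdgeNode table is always used."""
    return edge_node


if __name__ == "__main__":
    # EdgeNode holds older (stale) data; EdgeNodePlacementConfig is newer.
    stale = TableRecord("EdgeNode", 100, ["dvportgroup-33", "network-72"])
    fresh = TableRecord("EdgeNodePlacementConfig", 200,
                        ["dvportgroup-33", "dvportgroup-34"])
    print(select_source_3_1(stale, fresh).data_network_ids)
    print(select_source_4_x(stale, fresh).data_network_ids)
```

In this model the two selectors disagree only when EdgeNode is the older table and its contents differ from EdgeNodePlacementConfig, which is the condition the workaround checks for.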

Resolution

This issue is resolved in NSX 3.2.2 and newer; that is, it does not occur when the environment is upgraded FROM 3.2.2 or newer.

 

Workaround

  • Collect the output of the following corfu tables:

/opt/vmware/bin/corfu_tool_runner.py -r nsx-manager -t EdgeNode --tool corfu-browser > EdgeNode.db
/opt/vmware/bin/corfu_tool_runner.py -r nsx-manager -t EdgeNodePlacementConfig --tool corfu-browser > EdgeNodePlacementConfig.db

  • In the output of the two tables, compare the "dataNetworkIds" attribute for every Edge node in the environment and look for differences.
  • If the values differ for an Edge node, run the following API call for that node:

GET https://<NsxMgrIp>/api/v1/transport-nodes/<tnId>

  • Using the output of the above REST API call as the request body, update the same Edge TransportNode ID (correcting the mismatched "dataNetworkIds" attribute, if needed):

PUT https://<NsxMgrIp>/api/v1/transport-nodes/<tnId>

  • This update makes the EdgeNode and EdgeNodePlacementConfig tables hold the same data, and the network settings of the Edge VM in vSphere are set to match the configuration in NSX Manager, eliminating the mismatch.
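The comparison and the preparation of the PUT body described in the steps above can be sketched in Python. This is a hedged illustration rather than a supported procedure. It assumes the per-Edge "dataNetworkIds" lists have already been extracted by hand from EdgeNode.db and EdgeNodePlacementConfig.db into dictionaries, since the dump format is not described here. It also assumes the GET response carries the list under node_deployment_info.deployment_config.vm_deployment_config.data_network_ids; verify that path against the actual GET output before use. The function names are hypothetical.

```python
import copy


def find_mismatches(edge_node_ids, placement_ids):
    """Compare per-Edge network ID lists taken from the two table dumps.

    Both arguments map an Edge node ID to its ordered list of network IDs.
    Returns {edge_id: (edge_node_list, placement_list)} for every Edge whose
    lists differ or which is missing from one of the tables.
    """
    mismatches = {}
    for edge_id in sorted(set(edge_node_ids) | set(placement_ids)):
        from_edge_node = edge_node_ids.get(edge_id)
        from_placement = placement_ids.get(edge_id)
        if from_edge_node != from_placement:
            mismatches[edge_id] = (from_edge_node, from_placement)
    return mismatches


def build_put_body(get_response, desired_network_ids):
    """Return a copy of the GET response with the desired network IDs set.

    Every other field is preserved unchanged, including "_revision".
    The JSON path below is an assumption to check against the GET output.
    """
    body = copy.deepcopy(get_response)
    vm_config = (body["node_deployment_info"]["deployment_config"]
                 ["vm_deployment_config"])
    vm_config["data_network_ids"] = list(desired_network_ids)
    return body
```

The order of the IDs is significant because each position corresponds to an Edge interface, so the lists are compared in order rather than as sets. The dictionary returned by build_put_body would be serialized as JSON and sent as the body of the PUT request shown above.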

 

NOTE: This workaround applies to NSX environments that have not yet been upgraded. For an already upgraded environment that has hit this issue, set the desired portgroup or logical switch manually in the NSX Manager UI.