NSX Host or Edge Transport Nodes have a status of "Unknown" and Tunnels are "Not Available" or "Validation Errors"

Article ID: 324194

Products

VMware NSX

Issue/Introduction

  • NSX may have been recently upgraded from 3.x to 4.x.
  • NSX Transport Nodes (Host or Edge) have a status of 'Unknown', or the NSX configuration shows 'Validation Errors' in the NSX UI or NSX API (a curl sketch of these API queries follows this list).

    To get a list of all Transport Nodes and their IDs:

    GET https://<NSX_MANAGER>/api/v1/transport-nodes/

    To get the status of a specific Transport Node:

    GET https://<NSX_MANAGER>/api/v1/transport-nodes/<tn-id>/status
  • Clicking on Transport Node 'Validation Errors' in the NSX UI shows errors similar to the below:

    600: The requested object: <TransportZoneProfile-UUID> could not be found. Object identifiers are case sensitive.
  • NSX Transport Nodes tunnel status may show 'Not Available' in the NSX UI.
  • No data plane impact has been reported.
  • NSX Manager log /var/log/proton/nsxapi.log has a log entry similar to the below:

    "The requested object : TransportZoneProfile/<TransportZoneProfile-UUID> could not be found."
  • The above Transport Zone Profile UUID is not present when queried via API:

    GET https://<NSX_MANAGER>/api/v1/transportzone-profiles?include_system_owned=true
  • The above Transport Zone Profile UUID cannot be found in NSX or VC global search.
  • The NSX Manager log /var/log/search/elasticsearch_index_indexing_slowlog.log has a log entry similar to the following example:

    [{"resource_type":"BfdHealthMonitoringProfile","profile_id":"UUID_in_Validation_Error"}]}],"vmk_install_migration":[],"pnics_uninstall_migration":[],"vmk_uninstall_migration":[],"not_ready":false}],"resource_type":"StandardHost]

Environment

  • VMware NSX-T Data Center
  • VMware NSX

Cause

The status of the Transport Nodes (TNs) is unknown because the Transport-Zone Profile (TZP) being referenced does not exist. The referenced TZP can be missing from the system for several reasons:

  • In previous versions of NSX, it was possible in policy mode to update or delete a Transport-Zone Profile (TZP), even if it was in use by a Transport Node (TN) or Transport-Node Profile (TNP).
  • In some cases, Aria Operations for Networks creates a Transport-Zone Profile (TZP), and when Aria Operations for Networks is removed, the TZP is deleted. Validation has been added in newer versions to avoid this issue.

Resolution

To resolve this issue for Edge nodes, open a support case with Broadcom support.
For more information, see Creating and managing Broadcom support cases.

To resolve this issue for ESXi hosts, follow these steps:

Note: It has been found in one case that removing the Transport Node (Host) from the cluster, waiting for the NSX VIBs to uninstall, and then re-adding the Transport Node to the cluster resolved the unknown state.

  1. Take a new SFTP/FTP based backup and ensure the backup passphrase is known before proceeding.
  2. Copy the attached logical-migration.jar file to one of the NSX Manager nodes and place it in the directory /opt/vmware/upgrade-coordinator-tomcat/temp/.
  3. Stop the proton service on all three Manager nodes from the root shell:

    # service proton stop

    Note: Do not use the "/etc/init.d/proton stop" command, as the proton service would restart on its own after a few seconds without user intervention.

  4. On the NSX Manager node where the jar file was copied, run the following command. It is a single line with no line breaks:

    # java -Dcorfu-property-file-path=/opt/vmware/upgrade-coordinator-tomcat/conf/ufo-factory.properties -Djava.io.tmpdir=/opt/vmware/upgrade-coordinator-tomcat/temp -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j.configurationFile=/opt/vmware/upgrade-coordinator-tomcat/conf/log4j2.xml -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/opt/vmware/upgrade-coordinator-tomcat/conf/logging.properties -Dnsx-service-type=nsx-manager -DTransportZoneProfileRectifierInTNAndTNP.userName=admin -DTransportZoneProfileRectifierInTNAndTNP.password='ENTER_ADMIN_PASSWORD_HERE' -DTransportZoneProfileRectifierInTNAndTNP.updateTn=true -DTransportZoneProfileRectifierInTNAndTNP.updateTzp=true -cp /opt/vmware/upgrade-coordinator-tomcat/temp/logical-migration.jar com.vmware.nsx.management.migration.impl.TransportZoneProfileRectifierInTNAndTNP

    Notes:
    1. Make sure to replace 'ENTER_ADMIN_PASSWORD_HERE' in the command above with the actual admin password.
    2. If you download the file multiple times, make sure it is saved as logical-migration.jar; a file named logical-migration(1).jar will not run.
    3. If your NSX Manager is using OpenJDK (Java) version 17, the script may fail with the exception "Unsatisfied dependency expressed through method". Review the KB article "logical-migration.jar script fails with an exception "org.springframework.beans.factory.UnsatisfiedDependencyException"" for the next steps.
                                                                           
  5. Set file ownership (this step comes after issuing the script command in step 4, because the script creates new files that need permission to write to the upgrade-coordinator.log in order to report "Migration task finished"):

    # chown uuc:uuc /var/log/upgrade-coordinator/upgrade-coordinator*log* 

  6. The procedure is complete once "Migration task finished" is printed in the /var/log/upgrade-coordinator/upgrade-coordinator.log file. You can check for this with the following command:

    # grep "Migration task finished" /var/log/upgrade-coordinator/upgrade-coordinator.log

  7. Start proton on all three Manager nodes:

    # service proton start

  8. Switch to the admin user on each manager node via the su admin command.
  9. Execute the following from NSXCLI on all NSX Manager nodes so that CorfuDB and Search Indexes are in sync:

    > start search resync policy
    > start search resync manager

    Note: If the above commands do not resolve the UI issue, execute the following command:

    > start search resync all

    Note: It may be necessary to also run the following two commands:

    > start search resync telemetry
    > start search resync inventory

  10. Log in to the NSX UI and validate that the host status is resolved.
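
As an optional check, a minimal sketch (assuming the same admin credentials and self-signed certificate as in the earlier example) of re-querying an affected Transport Node via the API to confirm its status is no longer 'Unknown':

    # Re-check the status of an affected Transport Node after the fix
    curl -k -u 'admin:<password>' https://<NSX_MANAGER>/api/v1/transport-nodes/<tn-id>/status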

Important:

  • In some cases, it may be necessary to detach and re-attach the TNP on the impacted cluster to fully resolve the issue. In lieu of doing this, hosts can be put into maintenance mode and moved out of the cluster and back in, one at a time.
  • For Validation Errors, you may need to select the host and click Configure NSX > Next > Finish to change the status of the host to Up.


In cases where Service VMs are deployed in the cluster that the affected host transport nodes are part of, detaching the TNP gives the error:

"Error: Cluster ########-####-####-####-########bed1:domain-c10 has NSX managed service VM deployed or deployment is in progress. Delete these deployment, before deleting TN. (Error code: 26173)".

In such a scenario, the alternative to detaching and re-attaching the TNP is to follow the steps below:

  1. Place the host into maintenance mode.
  2. Run the API "GET https://{{nsx-ip}}/api/v1/transport-nodes/<tn-id>" to get the payload.
  3. Using the same payload, run the API "PUT https://{{nsx-ip}}/api/v1/transport-nodes/<tn-id>" (a curl sketch of this GET/PUT sequence follows these steps).
  4. Wait for a couple of minutes and check the configuration state of the host.
  5. Check the status of the tunnels and in general the status of the host with respect to the manager.
  6. Repeat the steps for all the affected hosts.
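
Below is a minimal sketch of the GET/PUT sequence from steps 2 and 3, assuming basic authentication with the admin account, a self-signed certificate, and that the payload is re-submitted unchanged, as the steps describe; <NSX_MANAGER>, <password> and <tn-id> are placeholders:

    # Step 2: retrieve the current Transport Node payload and save it to a file
    curl -k -u 'admin:<password>' https://<NSX_MANAGER>/api/v1/transport-nodes/<tn-id> -o tn-payload.json

    # Step 3: resubmit the same payload to trigger a re-realization of the Transport Node configuration
    curl -k -u 'admin:<password>' -X PUT -H 'Content-Type: application/json' \
        -d @tn-payload.json https://<NSX_MANAGER>/api/v1/transport-nodes/<tn-id>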

Attachments

logical-migration.jar