NSX Host or Edge Transport Nodes have a status of "Unknown" and Tunnels are "Not Available" or "Validation Errors"

Article ID: 324194

Products

VMware NSX

Issue/Introduction

  • Transport Nodes, either Edge or Host, show a status of Unknown in the UI and when queried via the API:
    To list all Transport Nodes and their IDs:
    GET https://<NSX_MANAGER>/api/v1/transport-nodes/

    To get the status of a specific Transport Node:
    GET https://<NSX_MANAGER>/api/v1/transport-nodes/<tn-id>/status
  • The status of Tunnels is Not Available in the UI.
  • There is no reported data plane impact.
  • The NSX Manager log /var/log/proton/nsxapi.log has an entry similar to this example:
    "The requested object : TransportZoneProfile/<TransportZoneProfile-ID> could not be found."
  • When the Transport Zone Profiles (TZPs) present in the system are queried, the UUID printed in the log entry above is not found:
    GET https://<NSX_MANAGER>/api/v1/transportzone-profiles?include_system_owned=true
  • The status of the Transport Node for the cluster in the NSX UI is Validation Errors.
  • Clicking the error shows a message such as "600: The requested object: ####-##-## could not be found. Object identifiers are case sensitive."
  • The UUID cannot be found with the NSX or vCenter (VC) global search.
  • The NSX Manager log /var/log/search/elasticsearch_index_indexing_slowlog.log has an entry similar to this example:

    [{"resource_type":"BfdHealthMonitoringProfile","profile_id":"UUID_in_Validation_Error"}]}],"vmk_install_migration":[],"pnics_uninstall_migration":[],"vmk_uninstall_migration":[],"not_ready":false}],"resource_type":"StandardHost]
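The comparison described in the symptoms above, checking the profile IDs referenced by Transport Nodes against the TZPs that actually exist, can be sketched in Python. This is a minimal illustration, not a supported tool. It assumes the JSON bodies of the two GET calls above (the Transport Node list and the TZP list) have already been loaded into dictionaries, that each list response carries its items under "results" with an "id" field, and that profile references appear under a "profile_id" key as in the log excerpt. The function names are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Set


def iter_profile_ids(obj: Any) -> Iterator[str]:
    """Yield every string stored under a "profile_id" key, at any depth."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "profile_id" and isinstance(value, str):
                yield value
            else:
                yield from iter_profile_ids(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_profile_ids(item)


def find_dangling_profile_refs(
    transport_nodes: Dict[str, Any], tz_profiles: Dict[str, Any]
) -> Dict[str, List[str]]:
    """Map each Transport Node ID to the profile IDs it references
    that are absent from the TZP list (assumed response layout)."""
    known: Set[str] = {p["id"] for p in tz_profiles.get("results", [])}
    dangling: Dict[str, List[str]] = {}
    for node in transport_nodes.get("results", []):
        missing = sorted({pid for pid in iter_profile_ids(node) if pid not in known})
        if missing:
            dangling[node.get("id", "<unknown>")] = missing
    return dangling


if __name__ == "__main__":
    # Hand-written sample data standing in for the two API responses.
    nodes = {"results": [{"id": "tn-1", "profiles": [{"profile_id": "tzp-a"}]},
                         {"id": "tn-2", "profiles": [{"profile_id": "tzp-gone"}]}]}
    tzps = {"results": [{"id": "tzp-a"}]}
    for tn_id, missing in find_dangling_profile_refs(nodes, tzps).items():
        print(f"{tn_id} references missing profile(s): {', '.join(missing)}")
```

Any Transport Node reported by this check references a profile UUID that the TZP query does not return, which matches the symptom described above.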

Environment

VMware NSX-T Data Center
VMware NSX

Cause

The status of the Transport Nodes (TNs) is Unknown because the nodes reference a Transport Zone Profile (TZP) that does not exist.
In previous versions of NSX-T, it was possible in policy mode to update or delete a TZP even if it was in use by a TN or Transport Node Profile (TNP).

Resolution

To resolve this issue for Edge nodes, open a support case with Broadcom Support.

To resolve this issue for ESXi hosts, follow these steps:

1. Take a new FTP-based backup and ensure the backup passphrase is known before proceeding.
 
2. Copy the attached logical-migration.jar file to one of the Managers and place it in the directory /opt/vmware/upgrade-coordinator-tomcat/temp/.
 
3. Stop proton on all three Manager nodes from the root shell:
# service proton stop
 
4. On the NSX Manager where the jar file was copied, run the following command. It is a single-line command with no line breaks. Replace ENTER_ADMIN_PASSWORD_HERE with the admin password of the NSX Manager.
 
# java -Dcorfu-property-file-path=/opt/vmware/upgrade-coordinator-tomcat/conf/ufo-factory.properties -Djava.io.tmpdir=/opt/vmware/upgrade-coordinator-tomcat/temp -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j.configurationFile=/opt/vmware/upgrade-coordinator-tomcat/conf/log4j2.xml -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/opt/vmware/upgrade-coordinator-tomcat/conf/logging.properties -Dnsx-service-type=nsx-manager -DTransportZoneProfileRectifierInTNAndTNP.userName=admin -DTransportZoneProfileRectifierInTNAndTNP.password='ENTER_ADMIN_PASSWORD_HERE' -DTransportZoneProfileRectifierInTNAndTNP.updateTn=true -DTransportZoneProfileRectifierInTNAndTNP.updateTzp=true -cp /opt/vmware/upgrade-coordinator-tomcat/temp/logical-migration.jar com.vmware.nsx.management.migration.impl.TransportZoneProfileRectifierInTNAndTNP

5. Set file ownership:
# chown uuc:uuc /var/log/upgrade-coordinator/upgrade-coordinator*log* 

6. The procedure is complete once the text "Migration task finished" is printed in the upgrade-coordinator log file. To check for it:
# grep "Migration task finished" /var/log/upgrade-coordinator/upgrade-coordinator.log
 
7. Start proton on all three Manager nodes:
# service proton start
 
8. Execute the following from NSXCLI on all the NSX Manager nodes so that corfudb and the Search indexes are in sync:
# start search resync policy
# start search resync manager


9. Log in to the NSX UI and validate that the host status is resolved.
    In some cases it may be necessary to detach and reattach the TNP on the impacted cluster to fully resolve the issue.
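
Because the command in step 4 must be entered as one line, with the admin password quoted correctly for the shell, it can help to assemble it programmatically before pasting it. The following Python sketch only builds and prints the command; it does not run it. Every option is copied from step 4, while the function and constant names are hypothetical.

```python
import shlex
from typing import List

TEMP_DIR = "/opt/vmware/upgrade-coordinator-tomcat/temp"
CONF_DIR = "/opt/vmware/upgrade-coordinator-tomcat/conf"
RECTIFIER = "TransportZoneProfileRectifierInTNAndTNP"


def build_rectifier_command(admin_password: str) -> List[str]:
    """Return the step 4 command as an argument list (one element per argument)."""
    return [
        "java",
        f"-Dcorfu-property-file-path={CONF_DIR}/ufo-factory.properties",
        f"-Djava.io.tmpdir={TEMP_DIR}",
        "-DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector",
        f"-Dlog4j.configurationFile={CONF_DIR}/log4j2.xml",
        "-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager",
        f"-Djava.util.logging.config.file={CONF_DIR}/logging.properties",
        "-Dnsx-service-type=nsx-manager",
        f"-D{RECTIFIER}.userName=admin",
        f"-D{RECTIFIER}.password={admin_password}",
        f"-D{RECTIFIER}.updateTn=true",
        f"-D{RECTIFIER}.updateTzp=true",
        "-cp",
        f"{TEMP_DIR}/logical-migration.jar",
        f"com.vmware.nsx.management.migration.impl.{RECTIFIER}",
    ]


def as_single_line(argv: List[str]) -> str:
    """Join the arguments into one shell-safe line with no line breaks."""
    return shlex.join(argv)


if __name__ == "__main__":
    # Placeholder password; substitute the real admin password before use.
    print(as_single_line(build_rectifier_command("ENTER_ADMIN_PASSWORD_HERE")))
```

Using shlex.join (Python 3.8 or later) quotes any argument that contains spaces or shell metacharacters, so a password with special characters survives being pasted into the root shell.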

 
 

Attachments

logical-migration.jar