NSX-T Federation RTEP Tunnels showing degraded

Article ID: 369814

Products

VMware NSX

Issue/Introduction

After an NSX-T Federation environment is upgraded from 3.1.x to 3.2.x, the Remote Tunnel Endpoint (RTEP) status is shown as Degraded or Down in the NSX-T Manager UI. The issue is cosmetic and does NOT affect cross-site connectivity in NSX Federation deployments.

Error messages similar to the following may appear in the NSX-T Manager UI:

  • "Remote Tunnel Endpoint status: Degraded"
  • "RTEP Tunnel status: Down"

On the Tunnels tab of an affected edge node in the NSX-T Manager UI, the RTEP tunnels have an "Active" status but a red status indicator, even though they are functional. This mismatch between the Active status and the red indicator is how the degraded RTEP state appears in the UI.

To confirm the issue, check the NSX Manager log /var/log/proton/nsxapi.log for a message similar to the following:

YYYY-MM-DDTHH:MM:SS.201Z  INFO l3-tasks1 EdgeClusterMeshUpdateTask 20324 ROUTING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Continue with full underlay mesh span RemoteMeshSpanForEc [isFullSpan=true, remoteEClusterIds=[########-####-####-####-####], getRevision()=49545, getIdentifier()=RemoteMeshSpanForEc/########-####-####-####-####]] for edge cluster ########-####-####-####-####]
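
You can run a small Python sketch like the following against the log instead of paging through it by hand. It reports matching lines and the edge cluster IDs they mention. Run it on a copy of the log or on any host with Python 3. The regular expression is an assumption based only on the sample line above, and log wording may vary between NSX builds. The message itself is logged at INFO level, so treat a match as supporting evidence alongside the UI symptoms, not as proof.

```python
import re
import sys
from pathlib import Path

# Log location named in this article; a different path can be passed as argv[1].
DEFAULT_LOG = Path("/var/log/proton/nsxapi.log")

# Assumed pattern, derived from the sample EdgeClusterMeshUpdateTask message:
# captures the ID that follows "for edge cluster".
MESH_SPAN_RE = re.compile(
    r"Continue with full underlay mesh span.*for edge cluster\s+([0-9A-Za-z#-]+)"
)


def find_mesh_span_messages(lines):
    """Return (line, edge_cluster_id) pairs for every matching log line."""
    hits = []
    for line in lines:
        match = MESH_SPAN_RE.search(line)
        if match:
            hits.append((line.rstrip("\n"), match.group(1)))
    return hits


if __name__ == "__main__":
    path = Path(sys.argv[1]) if len(sys.argv) > 1 else DEFAULT_LOG
    with path.open(errors="replace") as log:
        hits = find_mesh_span_messages(log)
    for line, cluster_id in hits:
        print(f"edge cluster {cluster_id}: {line}")
    print(f"{len(hits)} matching line(s) in {path}")
```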

Environment

  • VMware NSX-T Data Center 3.1.x upgraded to 3.2.x
  • NSX Federation deployment

Cause

This is a known issue that occurs when upgrading from NSX 3.1.x to 3.2.x in a Federated environment. The root cause is that, after the upgrade, a Tier-1 gateway at one site does not have the edge cluster ID of the remote site.

This information is not synchronized until the configuration revision counter is updated. Updating the counter triggers proper synchronization between the NSX Federation Global Manager and the Local Managers.

Resolution

To resolve this issue, follow these steps to update all affected Tier-1 gateways:

  1. Identify affected Tier-1 gateways by reviewing alarms across all sites:
    • Review all alarms in the NSX-T Manager UI across all federation sites experiencing this issue
    • Look for alarms containing information similar to:
       
      Description: RTEP (Remote Tunnel Endpoint) BGP session from source IP ###.###.###.### to remote location example-location neighbor IP ###.###.###.### is down.
    • Make note of all Tier-1 gateway names associated with these alarms
    • Ensure you have identified all Tier-1 gateways across all sites where this issue is occurring
  2. Retrieve all Tier-1 gateways in your environment:
    • Use the following API to retrieve all Tier-1 gateways from the Global Manager:
       
      GET https://<NSX_GLOBAL_MANAGER>/global-manager/api/v1/global-infra/tier-1s/
    • The display names shown in the UI may differ from the Tier-1 IDs required for API calls. Search the results of this GET for the display names noted in Step 1 to find the corresponding IDs.
  3. For each affected Tier-1 gateway:
    • Retrieve the locale-services for that Tier-1 gateway:
       
      GET https://<NSX_GLOBAL_MANAGER>/global-manager/api/v1/global-infra/tier-1s/<TIER1_ID>/locale-services
    • Take note of both the locale-service ID and display name for each returned locale-service
  4. Update each locale-service with a PATCH request that re-sends its current "display_name" value:
    • Send the following request to the active Global Manager for each locale-service:
       
      PATCH https://<NSX_GLOBAL_MANAGER>/global-manager/api/v1/global-infra/tier-1s/<TIER1_ID>/locale-services/<LOCALE_SERVICE_ID>
    • If using curl, pass the JSON payload with the -d flag, for example:
       
      curl -k -H "Content-Type:application/json" -u admin -X PATCH https://<NSX_GLOBAL_MANAGER>/global-manager/api/v1/global-infra/tier-1s/EXAMPLE-T1-01-ID/locale-services/example-locale-service-01 -d '{"display_name" : "example-locale-service-01"}'
    • If using Postman, add the JSON payload in the request body:
       
      {"display_name" : "example-locale-service-01"}
    • A successful PATCH returns HTTP 200
    • The PATCH updates the revision counter of the locale-service and triggers configuration synchronization
    • To confirm the update, run the GET from Step 3 again and check that the revision count (the "_revision" field) of the locale-service has incremented
  5. Verify that the RTEP status has returned to normal:
    • Check the RTEP tunnel status in the NSX-T Manager UI
    • Monitor the NSX Manager logs for any continuing errors
    • Confirm that the associated alarms have been resolved

Additional Information

If the issue persists after following these steps, contact Broadcom Support for further assistance.

Please provide the following information when opening a support request with Broadcom for this issue:

  • NSX-T version details (source and target versions)
  • Screenshots of the RTEP tunnel status from the UI and the alarms
  • Output from the GET requests for tier-1s and locale-services
  • NSX-T Manager logs from /var/log/proton/nsxapi.log
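
You can collect the GET output for a support request with a script along these lines. It saves the Global Manager's Tier-1 list and each Tier-1's locale-services to JSON files. This is a read-only sketch, not an official collection tool. The host and credentials are hypothetical placeholders, and certificate verification is disabled to mirror the curl -k usage in the Resolution. It also reads only the first page of each list and does not follow pagination.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request
from pathlib import Path

# Hypothetical placeholders: replace with your Global Manager and credentials.
GM_HOST = "nsx-gm.example.com"
GM_USER = "admin"
GM_PASSWORD = "changeme"
OUT_DIR = Path("nsx-support-capture")
BASE = f"https://{GM_HOST}/global-manager/api/v1/global-infra"


def capture_filename(api_path):
    """Turn an API path such as 'tier-1s/<id>/locale-services' into a file name."""
    return api_path.strip("/").replace("/", "__") + ".json"


def get_json(api_path):
    # Read-only GET; certificate checks are disabled to mirror `curl -k`.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{GM_USER}:{GM_PASSWORD}".encode()).decode()
    req = urllib.request.Request(f"{BASE}/{api_path}",
                                 headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)


def save(api_path, data):
    """Write one API response to OUT_DIR as pretty-printed JSON."""
    OUT_DIR.mkdir(exist_ok=True)
    target = OUT_DIR / capture_filename(api_path)
    target.write_text(json.dumps(data, indent=2))
    return target


if __name__ == "__main__":
    tier1s = get_json("tier-1s")
    print("saved", save("tier-1s", tier1s))
    for t1 in tier1s.get("results", []):
        path = f"tier-1s/{urllib.parse.quote(t1['id'], safe='')}/locale-services"
        print("saved", save(path, get_json(path)))
```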

Refer to Broadcom's support documentation for instructions on creating a Broadcom case and uploading files to that case.