Transport Tunnels for HCX IX/NE appliances show as degraded

Article ID: 396624

Products

VMware HCX

Issue/Introduction

  • In the HCX UI, under Interconnect > View Appliances (for the affected Service Mesh), the Tunnel Status shows as degraded with the following status:
    Overall transport tunnel status is degraded.
    Overall encryption tunnel status is up.
    Service pipeline status is up.
  • On the affected Fleet appliance, /var/log/system_events shows which IPsec tunnels are down:
    {"id":20000,"level":6,"timestamp":#######,"UTC":"<timestamp>","message":"IPSec Tunnel is down","metadata":{"tunnelId":"t_2","tunnelType":"IPSec"}}
    {"id":20000,"level":6,"timestamp":#######,"UTC":"<timestamp>","message":"IPSec Tunnel is down","metadata":{"tunnelId":"t_1","tunnelType":"IPSec"}}
    {"id":20000,"level":6,"timestamp":#######,"UTC":"<timestamp>","message":"IPSec Tunnel is down","metadata":{"tunnelId":"t_0","tunnelType":"IPSec"}}
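
The events above are one JSON object per line, so the affected tunnel IDs can be collected with a short script. The following is a minimal sketch, not a VMware tool: the function name `down_ipsec_tunnels` is hypothetical, and the sample uses `0` in place of the redacted `timestamp` value so that each line is valid JSON. It only counts "IPSec Tunnel is down" events and does not track recovery, because this article does not show the format of a recovery event.

```python
import json


def down_ipsec_tunnels(lines):
    """Return the sorted tunnel IDs seen in 'IPSec Tunnel is down' events."""
    down = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        if event.get("message") != "IPSec Tunnel is down":
            continue
        metadata = event.get("metadata") or {}
        if metadata.get("tunnelType") == "IPSec" and "tunnelId" in metadata:
            down.add(metadata["tunnelId"])
    return sorted(down)


# Sample lines modelled on the excerpt above; timestamp 0 is a placeholder.
sample = [
    '{"id":20000,"level":6,"timestamp":0,"UTC":"<timestamp>","message":"IPSec Tunnel is down","metadata":{"tunnelId":"t_2","tunnelType":"IPSec"}}',
    '{"id":20000,"level":6,"timestamp":0,"UTC":"<timestamp>","message":"IPSec Tunnel is down","metadata":{"tunnelId":"t_1","tunnelType":"IPSec"}}',
    '{"id":20000,"level":6,"timestamp":0,"UTC":"<timestamp>","message":"IPSec Tunnel is down","metadata":{"tunnelId":"t_0","tunnelType":"IPSec"}}',
    'this line is not JSON and is ignored',
]

print(down_ipsec_tunnels(sample))
```

To run it against a copy of the log, pass an open file object instead of the sample list, for example `down_ipsec_tunnels(open("system_events"))`.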

Environment

VMware HCX

Cause

  • When Application Path Resiliency (APR) is enabled in HCX, multiple tunnels are established between the source and target HCX Interconnect (IX) and Network Extension (NE) appliances to make communication more resilient. If one or more of these tunnels become unavailable, HCX marks the overall APR connection status as "degraded" to indicate a potential issue.
  • Each tunnel uses a unique source UDP port to communicate with target UDP port 4500. The source UDP ports are assigned from a predefined range (4500-4628).
  • In the reference example for this article, tunnels te_0 through te_7 were established, with te_3 down.
  • If firewall rules or network configurations on the underlay network block any of these UDP ports (4500-4628), communication over the affected tunnels fails, which impacts HCX performance.


Resolution

  • When APR is selected, eight tunnels are enabled for each Interconnect (HCX-IX) and Network Extension (HCX-NE) appliance in the Service Mesh. The tunnels use internal IPs carried over the outer uplink IPs of the IX/NE appliances, and the traffic of each tunnel is distinguished by its source UDP port in the range 4500-4628. Make sure that firewalls on the underlay network allow the following traffic:

    • Source & Destination - uplink IPs of appliances
    • Protocol - UDP
    • Ports - 4500-4628

      Note:
      A degraded overall transport tunnel status does not immediately break the connection, but it reduces performance and signals a potential issue.
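
When reviewing firewall rules between the uplink IPs, it can help to check whether the permitted UDP ranges cover all of 4500-4628. The following is a minimal sketch under stated assumptions: the function `missing_ports` and the rule lists are hypothetical, and the allowed ranges have to be transcribed by hand from the firewall in question as inclusive `(low, high)` pairs. It does not query any firewall or HCX API.

```python
REQUIRED_START, REQUIRED_END = 4500, 4628  # UDP range stated in this article


def missing_ports(allowed_ranges, start=REQUIRED_START, end=REQUIRED_END):
    """Return the (low, high) gaps in [start, end] not covered by allowed_ranges.

    allowed_ranges is an iterable of inclusive (low, high) port pairs.
    """
    gaps = []
    cursor = start  # lowest port not yet known to be covered
    for low, high in sorted(allowed_ranges):
        if high < cursor:
            continue  # range ends before the part still being checked
        if low > end:
            break  # remaining ranges start after the required range
        if low > cursor:
            gaps.append((cursor, min(low - 1, end)))
        cursor = max(cursor, high + 1)
        if cursor > end:
            break
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps


# Hypothetical rule sets for illustration only.
print(missing_ports([(4500, 4500)]))                # only UDP 4500 allowed
print(missing_ports([(4500, 4550), (4600, 4628)]))  # a hole in the middle
print(missing_ports([(4000, 5000)]))                # fully covered
```

An empty result means the transcribed rules cover the whole range. Any returned pair is a block of source ports on which tunnel traffic could be dropped.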

Additional Information

  • When the IPsec status is checked on the IX/NE appliances, individual tunnels go down intermittently, so the overall tunnel status is expected to show as “degraded”.
  • The foutrace events can be checked in the message log of the respective appliance.
    • Path: after connecting to the respective appliance over SSH through the HCX Manager CLI, see /var/log/messages.log
    • Example for reference:

 

<date> <SERVICE-MESH-NE-R1> cgw 1100 - - [Info-policyEngine] : foutrace run 0: (/opt/vmware/bin/foutrace -i ipip_te_3 -s 192.0.#.## -d 192.0.#.## -S 10.##.#.## -D 10.#.###.#):
Traceroute from 192.0.#.## to 192.0.#.##, using interface ipip_te_1.  Time: <date> UTC m=+0.002350000

 1  *
 2  *
 3  *
 4  *
 5  *
 6  *
 7  ##.##.##.##        32.564ms        32.589ms        32.656ms        32.661ms        34.618ms        34.621ms       37.531ms        37.58ms
 8  *
 9  *
10  *
11  *
...
39  *
40  *

 

    • Connector end:

 

<date> <SERVICE-MESH-NE-R1> cgw 1100 - - [Info-policyEngine] : foutrace run 0: (/opt/vmware/bin/foutrace -i ipip_te_3 -s 192.0.#.## -d 192.0.#.## -S 10.##.#.## -D 10.#.###.#):
Traceroute from 192.0.#.## to 192.0.#.##, using interface ipip_te_1.  Time: <date> UTC m=+0.002350000

 1  10.##.#.##         3.614ms 3.634ms
 2  10.##.#.##        3.633ms 3.635ms
 3  10.##.#.##         3.618ms 3.624ms 3.627ms 3.628ms 3.632ms 3.633ms 3.636ms 3.637ms 5.899ms 5.9ms   5.904ms 5.905ms 7.306ms 7.307ms 8.154ms 8.155ms
 4  *
 5  *
 6  *
 7  *
 8  10.##.#.##     28.763ms        28.764ms        30.612ms        30.614ms        30.636ms        30.638ms        31.973ms        32.041ms        32.365ms        32.498ms        34.53ms 34.536ms        35.777ms        35.784ms
 9  *
10  *
11  *
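
The foutrace output above lists one hop per line, with `*` marking a hop that did not respond. The following is a minimal sketch for summarizing such output, assuming the layout matches the excerpts in this article (hop number, then an address or `*`, then round-trip times ending in `ms`). The function name `parse_hops` is hypothetical, and the address in the sample is a documentation-range placeholder, because the real addresses are redacted.

```python
import re

# Hop number, then an address or "*", then optional round-trip times.
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)(.*)$")
RTT_RE = re.compile(r"(\d+(?:\.\d+)?)ms")


def parse_hops(text):
    """Return a list of (hop, address or None, [rtt_ms, ...]) tuples."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header, elision ("...") or unrelated log line
        hop = int(match.group(1))
        address = match.group(2)
        if address == "*":
            hops.append((hop, None, []))  # no response at this hop
            continue
        rtts = [float(value) for value in RTT_RE.findall(match.group(3))]
        hops.append((hop, address, rtts))
    return hops


# Sample shaped like the excerpt above; 198.51.100.7 is a placeholder address.
sample = """\
Traceroute from <source> to <destination>, using interface <interface>.
 1  *
 2  *
 7  198.51.100.7        32.564ms        32.589ms
 8  *
"""

responding = [hop for hop in parse_hops(sample) if hop[1] is not None]
print(responding)
```

Comparing the responding hops from both ends of the Service Mesh shows where along the underlay path replies stop for a given tunnel interface.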