HCX Network Extension Tunnel Down Due to Palo Alto Firewall Session Issues

Article ID: 398282

Products

VMware HCX

Issue/Introduction

When utilizing VMware HCX with Palo Alto firewalls in the network path, HCX Network Extension (NE) tunnels may unexpectedly go down despite no configuration changes being made to the firewall or network. The following symptoms may be observed:

  • HCX Network Extension appliances show a status of "DOWN" in the Service Mesh view
  • Error message "HCX Interconnect Service Mesh NE Tunnel State Change" appears in alerts
  • From HCX Manager -> Administration -> Alerts, the status reports: "Overall transport tunnel status is down"
  • When viewing the Network Extensions in the HCX Manager UI, you may observe some tunnels showing as "UP" going to port 4500, while others appear as "DOWN" going to the same port 4500. This mixed status indicates underlying network connectivity exists, but points specifically to session-related issues in the firewall rather than a complete network outage
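The mixed-status symptom in the last bullet can be checked mechanically. The following is a minimal sketch, assuming the tunnel states have been copied by hand from the HCX Manager UI into a list of (name, destination port, state) tuples. That tuple layout and the function name are hypothetical and are not an HCX API.

```python
from collections import defaultdict


def classify_by_port(tunnels):
    """Label each destination port by the tunnel states seen on it.

    `tunnels` is a hypothetical list of (name, dest_port, state) tuples,
    with state recorded as "UP" or "DOWN" from the HCX Manager UI.
    """
    states = defaultdict(set)
    for _name, port, state in tunnels:
        states[port].add(state.upper())

    labels = {}
    for port, seen in states.items():
        if seen == {"UP"}:
            labels[port] = "all up"
        elif seen == {"DOWN"}:
            labels[port] = "all down"
        elif "UP" in seen and "DOWN" in seen:
            # Some tunnels pass on this port while others fail:
            # this is the pattern that points at firewall sessions.
            labels[port] = "mixed"
        else:
            labels[port] = "unknown"
    return labels
```

A port labeled "mixed" matches the symptom described above. A port labeled "all down" is more consistent with a path outage or a blocked port and should be investigated separately.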

Environment

  • VMware HCX 4.x
  • Palo Alto firewalls (all hardware and VM platforms)
  • Network configurations using IPsec VPN tunnels
  • Environments where traffic between HCX appliances traverses Palo Alto firewalls

Cause

The issue is caused by a known behavior in Palo Alto firewalls regarding session handling for IPsec traffic, including the UDP-encapsulated ESP packets on port 4500 that HCX uses for its tunnels.

Palo Alto firewalls create and use session records while processing traffic. Sessions can exist in various states, including "Discard" state. When a session is in "Discard" state, any packet that hits that session is dropped by the firewall. In some scenarios, sessions may become stuck in "Discard" state or not properly transition states when expected.

Two common scenarios that affect HCX tunnels:

  1. When an IPsec tunnel is restarted or after a system restart, ESP packets may continue to hit a session in "Discard" state, preventing the tunnel from re-establishing
  2. In some cases, sessions that should be in "Discard" state may appear as "active" instead, causing similar issues where traffic is not properly processed

This behavior can occur even when there have been no configuration changes to the firewall or network. It is seen most often in environments where double encryption (HCX tunnel encryption carried inside another encrypted VPN) causes MTU issues.

Resolution

Step 1: Ensure Proper MTU Configuration

Verify that the MTU is configured correctly across your HCX environment. Follow the guidance in "Configuring MTU for VMware HCX Components and Infrastructure" to choose MTU settings that suit your environment.

If the path between the HCX appliances is already encrypted by a VPN on the Palo Alto firewall or an upstream router, and the underlying network is already secure, consider disabling Encryption on your network profile to avoid double encryption.
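To see why double encryption matters for MTU, it can help to subtract each encapsulation layer's overhead from the path MTU. The sketch below is illustrative only: the function name is hypothetical, and the overhead values in the example are placeholders rather than published HCX or Palo Alto figures. Substitute the values measured or documented for your own environment.

```python
def effective_payload_mtu(path_mtu, overheads):
    """Return the bytes left for payload after all encapsulation layers.

    `overheads` maps a layer name to its per-packet overhead in bytes.
    """
    if path_mtu <= 0:
        raise ValueError("path_mtu must be positive")
    remaining = path_mtu - sum(overheads.values())
    if remaining <= 0:
        raise ValueError("encapsulation overhead exceeds the path MTU")
    return remaining


# Placeholder numbers, NOT vendor figures: replace with real values.
example_overheads = {"outer_vpn": 70, "hcx_tunnel": 80}
print(effective_payload_mtu(1500, example_overheads))
```

Each additional encryption layer reduces the usable payload size, so removing one layer (where the network is already secure) leaves more room before packets must be fragmented or dropped.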

Step 2: Identify and Clear Problematic Sessions

  1. Log in to the Palo Alto firewall via CLI
  2. Identify sessions in "Discard" state, either between the HCX appliances (both directions) or across the whole firewall:
> show session all filter source [HCX NE IP 1] destination [HCX NE IP 2] state discard
> show session all filter source [HCX NE IP 2] destination [HCX NE IP 1] state discard
> show session all filter state discard
  3. If no sessions are found in "Discard" state, check all sessions between the HCX appliances:
> show session all filter source [HCX NE IP 1] destination [HCX NE IP 2]
> show session all filter source [HCX NE IP 2] destination [HCX NE IP 1]
  4. Clear the identified sessions using one of these methods:

Option A: Clear individual sessions

> clear session id [Session Id]

Option B: Clear all sessions matching specific criteria (more efficient for multiple sessions)

> clear session all filter source [HCX NE IP 1] destination [HCX NE IP 2]
> clear session all filter source [HCX NE IP 2] destination [HCX NE IP 1]
  5. For widespread issues, you can clear all sessions in a specific state:
> clear session all filter state discard
  6. Verify tunnel recovery by checking the HCX tunnel status and confirming that traffic flow has resumed.
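When several Network Extension appliance pairs are affected, typing the commands for each pair is error-prone. The following is a minimal sketch that only builds the command strings shown in the steps above for one pair of addresses, so they can be reviewed and pasted into the firewall CLI. The function name is hypothetical, the addresses in the usage line are documentation-range placeholders, and the script does not connect to the firewall.

```python
import ipaddress


def build_session_commands(ne_ip_1, ne_ip_2, discard_only=True):
    """Build the show/clear session commands for one HCX NE pair.

    Returns a dict with "show" and "clear" lists, one entry per direction.
    """
    # Validate and normalize the addresses before embedding them.
    a = str(ipaddress.ip_address(ne_ip_1))
    b = str(ipaddress.ip_address(ne_ip_2))
    directions = ((a, b), (b, a))

    state = " state discard" if discard_only else ""
    show = [
        f"show session all filter source {s} destination {d}{state}"
        for s, d in directions
    ]
    clear = [
        f"clear session all filter source {s} destination {d}"
        for s, d in directions
    ]
    return {"show": show, "clear": clear}


# Placeholder addresses from the 192.0.2.0/24 documentation range.
for line in build_session_commands("192.0.2.10", "192.0.2.20")["show"]:
    print(line)
```

Passing discard_only=False produces the broader "show" commands used when no sessions are found in "Discard" state.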

Additional Information

To prevent this issue from recurring:

  1. Consider increasing the UDP timeout value for the relevant App-IDs on the Palo Alto firewall:
    • Navigate to Objects > Applications > [Relevant App-ID] in the WebUI
    • Increase the UDP Timeout (Second) value
  2. Define allow rules for both directions of traffic between the HCX appliances to ensure return traffic can properly establish sessions.
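The second recommendation can be sanity-checked against an exported rule list. This is a rough sketch, assuming the allow rules have already been reduced to simple (source IP, destination IP) pairs. That representation and the function name are hypothetical. Real Palo Alto rules also involve zones, address objects, applications, and services, which this check ignores.

```python
def missing_directions(allow_rules, ip_a, ip_b):
    """Return the (source, destination) pairs that lack an allow rule.

    `allow_rules` is a hypothetical iterable of (source_ip, destination_ip)
    tuples taken from the firewall's allow rules.
    """
    present = set(allow_rules)
    needed = [(ip_a, ip_b), (ip_b, ip_a)]
    return [pair for pair in needed if pair not in present]


# Placeholder addresses; only one direction is allowed in this example.
rules = [("192.0.2.10", "192.0.2.20")]
print(missing_directions(rules, "192.0.2.10", "192.0.2.20"))
```

An empty result means both directions between the two appliances are covered by the simplified rule list.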

If the error persists after following these steps, contact Palo Alto support for further assistance.
