Timeout guidelines for an IPSec tunnel from a Palo Alto firewall to WSS

Products

Cloud Secure Web Gateway - Cloud SWG

Issue/Introduction

The purpose of this document is to provide guidance on timeout settings for an IPSec tunnel configuration from a Palo Alto firewall to WSS. The timeout values listed in this document were validated in a test environment with a Palo Alto firewall running PAN-OS 8.1.0.

This document assumes that the tunnel configuration uses a Policy Based Forwarding (PBF) policy to forward interesting traffic into the IPSec tunnel.

Timeouts are set in three pieces of the tunnel configuration:

- DPD (Dead Peer Detection)

- Tunnel monitor

- PBF monitor

Resolution

DPD

This timeout is used to determine the liveness of the IKE_SA. On a Palo Alto firewall, DPD is not persistent; the DPD process is initiated when a rekey happens.

Tunnel Monitor

A monitor profile that sends ICMP packets through the tunnel. This timeout determines when traffic through the tunnel is considered non-responsive. If the tunnel monitor fails, it triggers a rekey.

PBF Monitor

A monitor profile with different timeouts than the monitor profile used for the tunnel. This monitor sends ICMP packets outside the tunnel to the VPN peer. If this monitor fails, the PBF policy is disabled, which allows the next PBF policy to be used, so traffic is sent through the secondary IPSec tunnel.

There are two failover scenarios when establishing an IPSec tunnel with WSS:

1 - Data pod failover: This occurs when the pod with which a tunnel is established is taken out of rotation for any reason and the IPSec traffic (ISAKMP and ESP) is forwarded to a different active pod. When this happens, WSS becomes non-responsive until the tunnel is reset by the Palo Alto firewall.

2 - Data center failover: This occurs when the WSS data center becomes unreachable. The Palo Alto firewall needs to determine that the data center is unreachable and fail traffic over to the secondary IPSec tunnel, which should be established to a different WSS data center.

In a data pod failover scenario, it is ideal to keep the users in the same data center. In the test environment, the timeouts listed below did not "bounce" the traffic over to the secondary tunnel and back to the primary tunnel.
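The difference between the two scenarios can be sketched in terms of the two ICMP probes described above. This is an illustration only, not firewall configuration: the names `Scenario` and `classify` are hypothetical, and the sketch assumes (the article does not state this explicitly) that the VPN peer address keeps answering the PBF monitor during a data pod failover.

```python
from enum import Enum


class Scenario(Enum):
    HEALTHY = "healthy"
    POD_FAILOVER = "data pod failover"
    DATA_CENTER_FAILOVER = "data center failover"


def classify(tunnel_probe_ok: bool, peer_probe_ok: bool) -> Scenario:
    """Map the two monitor results to a failover scenario.

    tunnel_probe_ok: result of the tunnel monitor (ICMP through the tunnel).
    peer_probe_ok:   result of the PBF monitor (ICMP outside the tunnel
                     to the VPN peer).
    """
    if not peer_probe_ok:
        # Peer unreachable outside the tunnel: treat the data center as down.
        return Scenario.DATA_CENTER_FAILOVER
    if not tunnel_probe_ok:
        # Assumption: peer still answers, but traffic inside the tunnel does not.
        return Scenario.POD_FAILOVER
    return Scenario.HEALTHY


print(classify(tunnel_probe_ok=False, peer_probe_ok=True).value)
print(classify(tunnel_probe_ok=False, peer_probe_ok=False).value)
```

Under this assumption, only the second case should move users to the secondary tunnel; the first is resolved by re-establishing the primary tunnel.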

Sequence of events in a data pod failover scenario:

1 - WSS becomes non-responsive

2 - tunnel monitor detects the failure and triggers a rekey

3 - the rekey triggers DPD probes

4 - DPD detects a failure and reinitializes the tunnel

5 - tunnel is now established to a different data pod in WSS

Sequence of events in a data center failover scenario:

1 - WSS becomes non-responsive

2 - tunnel monitor detects the failure and triggers a rekey

3 - the rekey triggers DPD probes

4 - DPD detects a failure and tries to reinitialize the tunnel

5 - PBF monitor detects a failure and disables the PBF

6 - traffic now hits the next-priority PBF policy, which routes it over the secondary tunnel

7 - when the PBF monitor succeeds again on the higher-priority PBF policy, that policy is re-enabled

8 - traffic is routed back through the primary tunnel
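Steps 5 through 8 above amount to "use the first PBF policy, in priority order, whose monitor is passing". A minimal Python sketch of that selection logic follows. It models the behavior described in this article and is not PAN-OS code; the class `PbfRule`, the function `select_rule`, and the rule and tunnel names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PbfRule:
    name: str
    tunnel: str
    monitor_ok: bool = True  # latest result of this rule's PBF monitor


def select_rule(rules: Sequence[PbfRule]) -> Optional[PbfRule]:
    """Return the first rule, in priority order, whose monitor is passing."""
    for rule in rules:
        if rule.monitor_ok:
            return rule
    return None  # no PBF rule is usable


# Hypothetical rule set: primary tunnel first, secondary tunnel second.
rules = [
    PbfRule("pbf-primary", "tunnel.1"),
    PbfRule("pbf-secondary", "tunnel.2"),
]

print(select_rule(rules).tunnel)   # primary tunnel while its monitor passes

rules[0].monitor_ok = False        # step 5: PBF monitor fails, rule disabled
print(select_rule(rules).tunnel)   # step 6: traffic uses the secondary tunnel

rules[0].monitor_ok = True         # step 7: monitor succeeds again
print(select_rule(rules).tunnel)   # step 8: traffic returns to the primary
```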

Timeouts:

The following timeouts are a starting guideline and may need adjusting on a case-by-case basis. In testing, these timeouts kept traffic in the primary tunnel, and the tunnel recovered in less than 30 seconds.

DPD:

interval 2 seconds

retry 3 seconds

Tunnel monitor:

interval 5 seconds

threshold 3 (missed heartbeats)

action = failover

PBF monitor:

interval 9 seconds

threshold 6 (missed heartbeats)

action = failover
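As a rough check on how the two monitor profiles relate, the sketch below estimates how long each takes to declare a failure. It assumes that the threshold is a count of consecutive missed ICMP heartbeats (as in the PAN-OS monitor profile), so detection takes roughly interval multiplied by threshold. The class and profile names are hypothetical, and the results are estimates, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorProfile:
    name: str
    interval_s: int  # seconds between ICMP heartbeats
    threshold: int   # assumed: consecutive missed heartbeats before the action

    def detection_window_s(self) -> int:
        """Approximate time to declare a failure: interval x threshold."""
        return self.interval_s * self.threshold


# Values from the guideline above.
tunnel_monitor = MonitorProfile("tunnel-monitor", interval_s=5, threshold=3)
pbf_monitor = MonitorProfile("pbf-monitor", interval_s=9, threshold=6)

for profile in (tunnel_monitor, pbf_monitor):
    print(f"{profile.name}: about {profile.detection_window_s()} s to fail")
```

Under this assumption, the tunnel monitor reacts in about 15 seconds and the PBF monitor in about 54 seconds. The tunnel monitor therefore acts well before the PBF monitor would, which is consistent with the primary tunnel recovering in under 30 seconds without traffic moving to the secondary tunnel.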

Testing:

It is recommended to test the changes after they are committed. To test the data pod failover scenario, Broadcom support must be involved to help move traffic to another active pod.

To test data center failover conclusively, traffic to the primary data center must be black-holed upstream from the Palo Alto firewall. Another option, suggested by a Broadcom customer, is to point the PBF monitor at an unreachable IP address. This tests that the PBF monitor and PBF policy are working.