After a link failover in client pods, SCTP INIT packets from pods or worker nodes are not forwarded by SNAT on the Tier-1 Edge node to the external Diameter Relay Agent

Article ID: 398031

Updated On:

Products

VMware NSX

Issue/Introduction

The topology is: Diameter client pod --> Worker Node VM --> ESXi --> Edge VM 1 (Tier-1, where SNAT occurs) --> Edge VM 2 (Tier-0) --> External Diameter Relay Agent

Initial Diameter session establishment from all client pods to the external Diameter Relay Agent is successful.

When a link failover occurs on one of the pods, it takes ~50 seconds before the client pods can next establish a Diameter session successfully.

After a link failover in the pods, the SCTP INIT packets from the worker nodes are not forwarded by SNAT on the Tier-1 to the external peer.

The SNAT rule is configured with Source IP set to Any, Destination IP set to the Diameter Relay Agent (DRA) IP, and a single Translated IP. Because three DRAs were present, three separate SNAT rules were configured.

 

Environment

VMware NSX 4.2.1.3.0.24533894

Cause

The connection from the new IP succeeds only after the session/connection entry on the Edge VM that corresponds to the old SCTP session has been deleted.

The default timeout for deleting an inactive connection is 30 seconds for non-TCP/UDP/ICMP protocols, which includes SCTP. As a result, it takes ~50 seconds to establish a new SCTP session and, thereby, a successful Diameter session.
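
The behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not NSX's actual implementation: it assumes the Edge keys each session on the post-NAT tuple (translated IP, source port, destination IP, destination port) and that a stale entry owned by the old source IP blocks a new source IP until the idle timeout expires. The class name, method name, and sample addresses are hypothetical.

```python
from dataclasses import dataclass, field

# Default idle timeout for non-TCP/UDP/ICMP sessions, as stated in this article.
IDLE_TIMEOUT_S = 30


@dataclass
class SessionTable:
    """Toy model of a NAT session table (hypothetical, not NSX's data structure)."""

    timeout_s: float = IDLE_TIMEOUT_S
    # key:   assumed post-NAT tuple (translated_ip, src_port, dst_ip, dst_port)
    # value: (original source IP, time the entry was last used)
    entries: dict = field(default_factory=dict)

    def forward(self, now, src_ip, translated_ip, src_port, dst_ip, dst_port):
        """Return True if the packet would be forwarded in this model."""
        key = (translated_ip, src_port, dst_ip, dst_port)
        entry = self.entries.get(key)
        if entry is not None:
            owner, last_seen = entry
            if now - last_seen >= self.timeout_s:
                # The old session has been idle long enough to be deleted.
                del self.entries[key]
            elif owner != src_ip:
                # Assumption: a live entry for the old source blocks the new one.
                return False
        self.entries[key] = (src_ip, now)
        return True


# Shared translated IP: the new source is blocked until the old entry ages out.
shared = SessionTable()
shared.forward(0, "10.0.0.1", "192.0.2.10", 3868, "198.51.100.5", 3868)
blocked = shared.forward(10, "10.0.0.2", "192.0.2.10", 3868, "198.51.100.5", 3868)
allowed = shared.forward(30, "10.0.0.2", "192.0.2.10", 3868, "198.51.100.5", 3868)
```

In this model, a source that maps to a different translated IP never collides with the stale entry, which is the intuition behind the resolution.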




Resolution

The solution is to create a 1:1 SNAT rule for each worker node, so that every worker node's Source IP is translated to its own dedicated Translated IP.
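
As a rough illustration of the 1:1 mapping, the sketch below pairs each worker node IP with its own translated IP and emits one rule definition per pair. The function name and sample addresses are hypothetical, and the dictionary keys are only modeled loosely on NSX NAT rule fields; verify them against the NSX API reference before using them in any real request.

```python
import ipaddress


def build_one_to_one_snat_rules(worker_ips, translated_ips, dra_ip):
    """Pair each worker node IP with a dedicated translated IP (hypothetical helper)."""
    if len(worker_ips) != len(translated_ips):
        raise ValueError("each worker node needs exactly one translated IP")
    if len(set(translated_ips)) != len(translated_ips):
        raise ValueError("translated IPs must be unique across worker nodes")

    # Fail early on malformed addresses; ip_address raises ValueError.
    for address in [*worker_ips, *translated_ips, dra_ip]:
        ipaddress.ip_address(address)

    rules = []
    for index, (source, translated) in enumerate(
        zip(worker_ips, translated_ips), start=1
    ):
        rules.append(
            {
                # Key names are assumptions; check the NSX API reference.
                "display_name": f"snat-worker-{index}",
                "action": "SNAT",
                "source_network": source,
                "destination_network": dra_ip,
                "translated_network": translated,
            }
        )
    return rules


# Example with documentation-range addresses (not taken from the article).
rules = build_one_to_one_snat_rules(
    ["10.0.0.1", "10.0.0.2"],
    ["192.0.2.11", "192.0.2.12"],
    "198.51.100.5",
)
```

Where several Diameter Relay Agents are present, the same helper could be called once per DRA IP.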