Avi Service Engine crash with Segmentation fault at se_dp panic | tcp_output_full


Article ID: 394861


Updated On:

Products

VMware Avi Load Balancer

Issue/Introduction

A Service Engine (SE) may crash while serving DNS over TCP when it is sending a large response payload (16 KB) to the client over a lossy network and receives an acknowledgement from the client.

The crash stack trace includes the following functions among its topmost frames (starting at frame #0):

#0 panic
#2 tcp_output_full
#3 tcp_output

 

Sample StackTrace(s):

#0  panic (fmt=fmt@entry=0x557639c4ebb3 "%s: beyond sb") 
#1  0x00005576398539dd in sbsndptr (sb=sb@entry=0x55763eb57c14, off=off@entry=12330, len=len@entry=1370, moff=moff@entry=0x7ffdee72a02c) at 
#2  0x00005576397a0ca1 in tcp_output_full (tp=0x55763e843ba0) at
#3  0x00005576397a2a9d in tcp_output (tp=0x55763e843ba0) at
#4  0x0000557639770646 in sosend_generic (so=<optimized out>, addr=<optimized out>, top=<optimized out>, control=0x0, flags=<optimized out>) at

 

To investigate further, you can review the latest stack traces from the Controller or SE by accessing the following path:

CLI:

Log in to the Controller via SSH and run the following command, replacing <se_dp.timestamp> with the name of the se_dp stack trace file.

root@<Controller ip>:#  cat /opt/avi/archive/stack_traces/<se_dp.timestamp>.stack_trace
 
UI:
Navigate to Administration > Support > Crash Reports > Expand the latest crash file.

Environment

Affects Version(s):

22.1.x
30.1.x
30.2.1, 30.2.2, 30.2.3
31.1.1
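The list above can be turned into a small check, sketched here for illustration. The function names are hypothetical, and the sketch assumes plain dotted numeric version strings such as "30.2.3" (patch suffixes are not handled). It only mirrors the versions listed in this article.

```python
# Release trains listed as affected in full (22.1.x and 30.1.x).
AFFECTED_TRAINS = {(22, 1), (30, 1)}

# Individual releases listed as affected.
AFFECTED_RELEASES = {(30, 2, 1), (30, 2, 2), (30, 2, 3), (31, 1, 1)}


def parse_version(text):
    """Turn a dotted version such as '30.2.3' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_listed_as_affected(version):
    """True if the version appears in this article's affected list."""
    parsed = parse_version(version)
    return parsed[:2] in AFFECTED_TRAINS or parsed in AFFECTED_RELEASES
```

For example, `is_listed_as_affected("30.2.3")` returns `True` and `is_listed_as_affected("30.2.4")` returns `False`.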

Cause

This issue is caused when the SE receives DNS over TCP data and the client acknowledges only part of the response (a partial ACK), which leads the SE to clear its send buffer without adjusting for a suspected last-mile packet drop.

The crash occurs only when all of the following conditions are met:

1. DNS over TCP
2. A very large response from a backend (in the observed crash, the response was at least 16 KB)
3. The TCP connection between the client and the SE is in the congestion recovery state, meaning packets were dropped on the network and the connection is currently recovering from those drops.

Note: This issue does not apply to non-DNS L4 virtual services or L7 virtual services.
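The failure mode described in this section can be pictured with a toy model. This is a hedged sketch, not the SE's code: `ToySendBuffer`, `BeyondSendBuffer`, and `simulate` are hypothetical names, and the acknowledged byte count is invented for illustration. Only the offset (12330) and length (1370) come from the sample stack trace. The sketch assumes a send offset computed before the buffer was trimmed is reused afterwards.

```python
class BeyondSendBuffer(Exception):
    """Stands in for the "beyond sb" panic seen in the sample stack trace."""


class ToySendBuffer:
    """Hypothetical byte-oriented send buffer; not the SE's real data structure."""

    def __init__(self, payload):
        self.data = bytearray(payload)

    def drop_acked(self, count):
        """Discard acknowledged bytes from the front of the buffer."""
        del self.data[:count]

    def read_at(self, off, length):
        """Return `length` bytes at `off`, or raise if the range is not buffered."""
        if off + length > len(self.data):
            raise BeyondSendBuffer(
                f"off={off} len={length} buffered={len(self.data)}"
            )
        return bytes(self.data[off:off + length])


def simulate(acked_bytes, off=12330, length=1370, payload_size=16 * 1024):
    """Reuse a stale offset after a partial ACK and report what happens."""
    buf = ToySendBuffer(b"\x00" * payload_size)
    buf.read_at(off, length)      # valid while the full payload is buffered
    buf.drop_acked(acked_bytes)   # partial ACK trims the front of the buffer
    try:
        buf.read_at(off, length)  # offset was not adjusted for the trim
    except BeyondSendBuffer:
        return "beyond sb"
    return "ok"
```

With a 16 KB payload, `simulate(4096)` returns `"beyond sb"` because only 12288 bytes remain buffered while the request reaches byte 13700; `simulate(0)` returns `"ok"`.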

Resolution

Upgrade the system to one of the fix versions listed below.
AV-185290: Defensive fix to prevent SE from crashing on DNS over TCP when payload is large in a lossy network environment 
Fix Version(s): 30.2.4, 31.1.2 & 31.2.1
 
Workaround:

Change the application profile of the DNS virtual service to System-L4-Application.
Caveat: With this workaround, you will lose DNS-related logs.