Avi Service Engine crash with segmentation fault at lj_debug**

Article ID: 396425

Products

VMware Avi Load Balancer

Issue/Introduction

Service Engines may crash when a virtual service is configured with a data script that passes a variable to the avi.pool.select() function.

Example Data script:

avi.vs.log("example data script")
example_path = avi.http.get_path_tokens(1,10)
if example_path then
  avi.vs.log("example script:" .. example_path)
  avi.pool.select(example_path)
end

Virtual Service Client log error: The connection will fail with an HTTP 500 response code with significance "Datascript failed to execute"

Client Log View All Headers: The data script traceback message will contain "First argument resolves to a pool/poolgroup named <object name> pool not found"

The crash stack trace will include one or more of the following functions in the initial (#0) frame:

lj_debug_frame
lj_debug_shortname
lj_debug_getinfo

Sample StackTrace(s):

To investigate further, you can review the latest stack traces from the Controller or SE by accessing the following path:

CLI:

Log in to the Controller via SSH and run the following command, replacing <se_dp.timestamp> with the name of the se_dp stack trace file.

root@<Controller ip>:#  cat /opt/avi/archive/stack_traces/<se_dp.timestamp>.stack_trace
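
To check a stack trace for the functions listed above without reading the whole file, the file can be filtered with grep. This is a minimal sketch: the helper name find_lj_debug_frames is hypothetical (it is not part of the product), and the file name in the example is still a placeholder that must be replaced.

```shell
# Hypothetical helper: list the lj_debug_* frames named in this article
# that appear in a stack trace file, prefixed with their line numbers.
# Usage: find_lj_debug_frames <path-to-stack_trace-file>
find_lj_debug_frames() {
    # -n prints the line number of each match; -E enables the alternation.
    grep -nE 'lj_debug_(frame|shortname|getinfo)' "$1"
}

# Example (replace the placeholder with the real file name):
# find_lj_debug_frames /opt/avi/archive/stack_traces/<se_dp.timestamp>.stack_trace
```

No output means none of the three functions appear in that file, which suggests the crash has a different signature.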
 
UI:
Navigate to Administration > Support > Crash Reports > Expand the latest crash file.

Environment

Affects Version(s):

22.1.1 - 22.1.1-2p6

22.1.2 - 22.1.2-2p7

22.1.3 - 22.1.3-2p14

22.1.4 - 22.1.4-2p7

22.1.5 - 22.1.5-2p5

22.1.6 - 22.1.6-2p8

30.1.1

30.1.2 - 30.1.2-2p2

30.2.1
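
The ranges above can be encoded as a small lookup for scripted checks. This is only a sketch: the function name in_affected_list is hypothetical, and it assumes version strings have the form "<base>" or "<base>-2p<N>" as written in the list. It mirrors the published list and nothing more, so a non-match means "not listed here", not "confirmed fixed".

```shell
# Hypothetical helper: succeed (exit 0) if VERSION appears in the
# affected list above, fail (exit 1) otherwise.
# Assumes versions look like "22.1.5" or "22.1.5-2p5".
in_affected_list() {
    base=${1%%-*}        # part before the first "-", e.g. 22.1.5
    patch=0              # 0 stands for the unpatched base release
    case $1 in
        *-2p*) patch=${1##*-2p} ;;   # number after "-2p", e.g. 5
    esac
    # Reject anything that is not a plain number.
    case $patch in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Highest listed patch level per base release (0 = base only).
    case $base in
        22.1.1) max=6 ;;
        22.1.2) max=7 ;;
        22.1.3) max=14 ;;
        22.1.4) max=7 ;;
        22.1.5) max=5 ;;
        22.1.6) max=8 ;;
        30.1.1) max=0 ;;
        30.1.2) max=2 ;;
        30.2.1) max=0 ;;
        *) return 1 ;;
    esac
    [ "$patch" -le "$max" ]
}

# Example:
# in_affected_list 22.1.5-2p5 && echo "listed as affected"
```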

Cause

This issue is caused by a product defect in the avi.pool.select() function. When the function is called with a variable while the virtual service configuration is being updated and a connection is open, the pool lookup may fail.

Conditions/Trigger(s) for the crash:
  • When the VS TLS key rotation update occurs while existing connections are open, any data scripts calling avi.pool.select() may fail to fetch the pool references, leading to an SE crash.

    Example Key rotation update to the virtual service:

    /var/lib/avi/log/jobmanager.INFO

    0425 05:00:04.970083    I  7745      jobmanager/jobmanager.go:2529    Detected ControllerProperties change for job: JOB_TYPE_VS_ROTATE_KEYS
    0425 05:00:04.974318    I  19554      jobmanager/jobmanager.go:2588    Forcing the following job types because ControllerProperties changed: [JOB_TYPE_VS_ROTATE_KEYS]
    0425 05:00:05.199921    I  24541      jobmanager/jobmanager.go:368    Virtual service: Calling RotateKeys RPC    {"virtualServiceName": "EXAMPLE_VS", "virtualServiceUuid": "virtualservice-UUID"}
    0425 05:00:05.421886    I  7738      jobmanager/jobmanager.go:379    Virtual service: RotateKeys RPC Completed    {"virtualServiceName": "EXAMPLE_VS", "virtualServiceUuid": "virtualservice-UUID"}


  • Any other configuration update to a Virtual Service object whose data scripts contain avi.pool.select() may cause a crash while the VS has open connections.
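
To see whether a key rotation ran close to the time of a crash, the jobmanager log shown in the example above can be filtered for the rotation entries. This is a minimal sketch: the helper name list_key_rotations is hypothetical, and the search strings are taken only from the sample log lines in this article, so other releases may word these messages differently.

```shell
# Hypothetical helper: print the key-rotation entries from a jobmanager log.
# Usage: list_key_rotations <path-to-jobmanager.INFO>
list_key_rotations() {
    # Match the job type and the RotateKeys RPC start/completion messages.
    grep -E 'JOB_TYPE_VS_ROTATE_KEYS|RotateKeys RPC' "$1"
}

# Example, using the log path given above:
# list_key_rotations /var/lib/avi/log/jobmanager.INFO
```

Comparing the timestamps of the matching lines with the timestamp in the se_dp stack trace file name can indicate whether the two events coincide.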

Resolution

Please upgrade or patch the system to the fix version.

AV-206581: Using a variable in avi.pool.select() may fail to identify the pool during a virtual service update.
 
Fix Version(s): 22.1.5-2p6, 22.1.7, 30.2.1-2p1, 30.2.2, 31.1.1
 

Workaround(s):

  • Disable VS TLS key rotation for the time being to reduce the number of config updates to the virtual service(s).

    CLI Commands:

    > configure controller properties
    > vs_key_rotate_period 0
    > save


    Default value is set to 360 minutes (6 hours).

    **Note** 
    - This is a global change; it affects all virtual services in all tenants.
    - This change introduces a security risk because the TLS encryption key is no longer rotated.
    - Applying this change may itself have a data impact: it forces a VS key rotation, which can trigger the crash.
    - The rotation timer resets to the new value when the setting is changed.

  • Prevent any other configuration updates on the virtual service(s) whose referenced data scripts contain avi.pool.select().