Avi controller events not loading for virtual services and pools

Article ID: 389734

Updated On:

Products

VMware Avi Load Balancer

Issue/Introduction

Virtual service and pool events fail to load for any display time range, and the request times out.  The API endpoint that fetches client logs and system events fails with a 500 Internal Server Error.

The HTTP 500 response code can be seen in the web browser's developer tools: open Inspect and switch to the Network tab.


Environment

Affects all versions in which the Avi controller runs as a Docker container.

Cause

This issue is caused by a low container memory threshold, which makes the controller continuously restart services on the Avi controller.  The restarts generate a large number of event files, which in turn causes the event requests in the GUI to time out.

***NOTE***: The log snippets below are examples and may vary from environment to environment.

/var/lib/avi/log/stats_collection.log

Container Memory: Avi registers high memory consumption even though the host has far more memory available.

└─$ grep 'docker mode mem' stats_collection.log | sort | tail -n 4
[2025-01-16 19:52:55,934] INFO [stat_collector._get_memory_stats_docker:1833] docker mode memory: svmem(Total: 60.0G, Used: 48.5G, Available: 11.5G Percent: 80)
[2025-01-16 19:53:21,032] INFO [stat_collector._get_memory_stats_docker:1833] docker mode memory: svmem(Total: 60.0G, Used: 48.5G, Available: 11.5G Percent: 80)
[2025-01-16 19:53:46,125] INFO [stat_collector._get_memory_stats_docker:1833] docker mode memory: svmem(Total: 60.0G, Used: 48.6G, Available: 11.4G Percent: 80)
[2025-01-16 19:54:11,236] INFO [stat_collector._get_memory_stats_docker:1833] docker mode memory: svmem(Total: 60.0G, Used: 48.6G, Available: 11.4G Percent: 80)

Host Memory:

└─$ grep 'host mem' stats_collection.log | sort | tail -n 4
[2025-01-16 19:52:55,730] INFO [stat_collector.pprint_ntuple:216] host memory:  Total : 125.9G, Available : 86.6G, Percent : 31.2%, Used : 31.8G, Free : 24.6G, Active : 48.3G, Inactive : 26.7G, Buffers : 2.0G, Cached : 67.5G, Shared : 5.0G, Slab : 23.6G
[2025-01-16 19:53:20,829] INFO [stat_collector.pprint_ntuple:216] host memory:  Total : 125.9G, Available : 86.6G, Percent : 31.2%, Used : 31.8G, Free : 24.6G, Active : 48.3G, Inactive : 26.7G, Buffers : 2.0G, Cached : 67.5G, Shared : 5.0G, Slab : 23.6G
[2025-01-16 19:53:45,914] INFO [stat_collector.pprint_ntuple:216] host memory:  Total : 125.9G, Available : 86.6G, Percent : 31.2%, Used : 31.8G, Free : 24.6G, Active : 48.4G, Inactive : 26.6G, Buffers : 2.0G, Cached : 67.5G, Shared : 5.0G, Slab : 23.6G
[2025-01-16 19:54:11,016] INFO [stat_collector.pprint_ntuple:216] host memory:  Total : 125.9G, Available : 86.6G, Percent : 31.2%, Used : 31.8G, Free : 24.5G, Active : 50.5G, Inactive : 24.6G, Buffers : 2.0G, Cached : 67.6G, Shared : 5.0G, Slab : 23.6G
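
To compare the two readings side by side, the latest percentage can be pulled out of each set of lines. The following is a minimal sketch, assuming the log lines keep the "Percent" field in the form shown in the snippets above; the function name latest_mem_percent is hypothetical and not part of Avi.

```shell
#!/bin/sh
# Print the most recent "Percent" value from the matching lines of a stats log.
# $1 = log file, $2 = grep pattern ('docker mode mem' or 'host mem')
latest_mem_percent() {
    grep "$2" "$1" | sort | tail -n 1 \
        | sed -E 's/.*Percent ?: ?([0-9.]+).*/\1/'
}

# Usage (run from /var/lib/avi/log):
#   echo "container: $(latest_mem_percent stats_collection.log 'docker mode mem')%"
#   echo "host:      $(latest_mem_percent stats_collection.log 'host mem')%"
```

With the sample lines above, the container reading is 80 while the host reading is 31.2. A gap of this kind is what the workaround for the host cache is meant to reduce.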

High number of statecachemgr daemon restarts:

/var/lib/avi/log/cluster_manager*INFO*

└─$ zgrep 'Restarting avi-statecachemgr' cluster_manager*INFO* | wc -l
1373
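
A per-day breakdown can show whether the restarts are ongoing or were a one-time burst. This is only a sketch: it assumes the cluster_manager log lines begin with a "[YYYY-MM-DD " timestamp like the stats_collection.log lines shown earlier, which may not hold in every environment. The function name restarts_per_day is hypothetical.

```shell
#!/bin/sh
# Read matching log lines on stdin and print "<date> <count>" for each day.
# Assumption: every line starts with "[YYYY-MM-DD "; lines that do not are skipped.
restarts_per_day() {
    sed -E -n 's/^\[([0-9]{4}-[0-9]{2}-[0-9]{2}).*/\1/p' \
        | sort | uniq -c | awk '{print $2, $1}'
}

# Usage (run from /var/lib/avi/log; -h suppresses the file name prefix):
#   zgrep -h 'Restarting avi-statecachemgr' cluster_manager*INFO* | restarts_per_day
```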

High number of event files: the count can be checked with the find command in step 2d of the workaround below.

Resolution

The fix raises the memory threshold for controllers running as Docker containers so that system daemons are no longer restarted under these conditions.  See the fix versions below and check the product release notes.

Bug ID: AV-227721 - Events not available for some Virtual Services and Pools, due to high memory consumption.

Fix Version(s): 22.1.7-2p6, 30.2.2-2p4, 30.2.3

Release Notes for VMware Avi Load Balancer Version 30.2.3

Workaround(s):

  1. Change the cache drop parameters on the Avi controller hosts.  This reduces cache memory utilization on the host, which lowers the memory utilization tracked by Avi.

    Run the following commands on the controller hosts (all three hosts in a controller cluster):

    sudo sh -c 'echo 150 > /proc/sys/vm/vfs_cache_pressure'

    --- default value is set to 50

    sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

    ---- default value is set to 0

  2. Clean up the high amount of event files.

    a. SSH to all three controller nodes and enter the controller container on each:

    sudo docker exec -it avicontroller bash


    b. Stop process supervisor on the follower nodes first, then on the leader:

    sudo systemctl stop process-supervisor


    c. Change directory to /var/lib/avi/logs/ALL-EVENTS

    d. Count the statecachemgr event files with the following command:

    find . -type f -name '*statecachemgr*' | wc -l


    e. Remove the files with the following command:

    find . -type f -name '*statecachemgr*' -exec rm -rf {} \;


    f. Start process supervisor on the leader node first, then on the follower nodes:

    sudo systemctl start process-supervisor
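
Steps 2d and 2e can be combined into one helper so that the count is always checked before anything is deleted. This is a minimal sketch, not an Avi-provided tool; the function name statecachemgr_events is hypothetical, and it should only be run while process supervisor is stopped, as described in step 2b.

```shell
#!/bin/sh
# Count (default) or delete statecachemgr event files under a directory.
# $1 = events directory, $2 = "count" or "delete"
statecachemgr_events() {
    dir="$1"
    mode="${2:-count}"
    case "$mode" in
        count)
            # tr strips the padding that some wc implementations add
            find "$dir" -type f -name '*statecachemgr*' | wc -l | tr -d ' '
            ;;
        delete)
            # "-exec ... +" passes many files to each rm call instead of one
            find "$dir" -type f -name '*statecachemgr*' -exec rm -f {} +
            ;;
        *)
            echo "usage: statecachemgr_events DIR [count|delete]" >&2
            return 2
            ;;
    esac
}

# Usage inside the controller container:
#   statecachemgr_events /var/lib/avi/logs/ALL-EVENTS           # count only
#   statecachemgr_events /var/lib/avi/logs/ALL-EVENTS delete    # remove the files
```

Defaulting to the count mode means a mistyped or missing second argument never deletes anything.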

**NOTE**

  • Step 2: This step is still required to recover the system, even after applying the fix version.  The fix addresses the burst of statecachemgr files that the system creates when its daemons are restarted, but existing files still have to be removed.
  • Step 1: The cache drop parameters can be reverted to the default values after applying the Avi fix version.
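
Because the step 1 parameters may be reverted later, it helps to record the value that was in place before each change. The following is a sketch under stated assumptions: the function name set_vm_param and the VM_DIR variable are hypothetical, and on a real host the function has to run as root.

```shell
#!/bin/sh
# VM_DIR can be pointed at a scratch directory to try the function safely.
VM_DIR="${VM_DIR:-/proc/sys/vm}"

# Print "<name>: <old> -> <new>", then write the new value.
# $1 = parameter name, $2 = new value
set_vm_param() {
    name="$1"
    value="$2"
    old=$(cat "$VM_DIR/$name") || return 1
    echo "$name: $old -> $value"
    echo "$value" > "$VM_DIR/$name"
}

# Usage, as root on each controller host:
#   set_vm_param vfs_cache_pressure 150   # apply the workaround and note the old value
#   set_vm_param vfs_cache_pressure 50    # later: revert to the default given in this article
```

Keeping the printed output gives a record of the value to restore if a host was not running with the default given above.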