AVI Controller memory utilization increasing over time due to remote task manager

Article ID: 380459

Updated On:

Products

VMware NSX Advanced Load Balancer

Issue/Introduction

The memory utilization of the controller can increase steadily over time.

 

From the CLI you can check what is consuming the memory with: top -o %MEM

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND

1577705 root      20   0   17.0g  14.2g  23632 S   0.0  30.1  27303:53 remote_task_man
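To confirm that the growth is steady rather than a one-off spike, the resident memory of the process can be sampled at intervals. The following is a minimal sketch, assuming a Linux controller with /proc available and assuming that the kernel process name equals the truncated COMMAND value shown by top above (remote_task_man). The function names rss_kib and sample_rss are illustrative and not part of any AVI tooling.

```shell
# Print the resident set size (VmRSS, in KiB) of one PID, read from /proc.
# Prints nothing if the process has exited or has no VmRSS entry.
rss_kib() {
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Print one timestamped RSS sample for every process whose name equals $1.
# The default name is the 15-character COMMAND value from the top output
# (an assumption; pass a different name if your system reports another one).
sample_rss() {
  name="${1:-remote_task_man}"
  for dir in /proc/[0-9]*; do
    [ "$(cat "$dir/comm" 2>/dev/null)" = "$name" ] || continue
    pid="${dir#/proc/}"
    printf '%s pid=%s rss_kib=%s\n' \
      "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$pid" "$(rss_kib "$pid")"
  done
}
```

For example, running `while sleep 300; do sample_rss; done >> /tmp/rtm_rss.log` (the output path is hypothetical) records a sample every five minutes. A value that only ever climbs is consistent with the behaviour described in this article.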

 

 

If the remote task manager is consuming most of the memory, you can double-check its logs:

zgrep /var/lib/avi/log/remote_task_manager* | wc -l

One node of the cluster will have most of the queries.

 

This can happen when clients are making API calls to the controller for the following endpoint:

/api/version/se

You can check for these calls in the nginx access logs, /var/log/nginx/portal.access.log*, by running the following command from /var/log/nginx:

zgrep "/api/version/se" portal.access.log* | wc -l

The output is the number of such queries recorded in the logs.
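A total count does not show how fast the queries arrive. As a rough sketch, the same search can be grouped by calendar day, assuming every entry carries a timestamp of the form [03/Oct/2024:05:36:30 +0000] as in the sample log entry in this article. The function name requests_per_day and its optional directory argument are illustrative.

```shell
# Count /api/version/se requests per day across current and rotated logs.
# $1: directory holding portal.access.log* (defaults to /var/log/nginx).
requests_per_day() {
  log_dir="${1:-/var/log/nginx}"
  # -h drops the "filename:" prefix that zgrep adds when several files match.
  zgrep -h "/api/version/se" "$log_dir"/portal.access.log* \
    | sed -n 's/.*\[\([0-9][0-9]\/[A-Za-z]*\/[0-9]*\):.*/\1/p' \
    | sort | uniq -c
}
```

Each output line is a count followed by a date such as 03/Oct/2024. The days are sorted as plain text, not chronologically.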

 

To see which clients are actually making those queries, you can use:

zgrep "/api/version/se" portal.access.log* | more

 

You will see queries like this:

portal.access.log:<Client IP goes here> [cache:-] 127.0.0.1:6000 [-] - - [03/Oct/2024:05:36:30 +0000] [-] [-] "GET /api/version/se HTTP/1.1"
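Paging through every match can be slow on a busy controller. As a sketch, the entries can instead be summarized per client, assuming the client IP is the first whitespace-separated field of each entry as in the sample above. The function name clients_by_request_count and its optional directory argument are illustrative.

```shell
# List client IPs by number of /api/version/se requests, busiest first.
# $1: directory holding portal.access.log* (defaults to /var/log/nginx).
clients_by_request_count() {
  log_dir="${1:-/var/log/nginx}"
  # -h drops the "filename:" prefix so the client IP is the first field.
  zgrep -h "/api/version/se" "$log_dir"/portal.access.log* \
    | awk '{ print $1 }' \
    | sort | uniq -c | sort -rn
}
```

The first line of the output is the client responsible for the most queries, which is usually the one to stop first.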

Cause

A GET request to /api/version/se translates into 3 remote tasks, sent through the remote task manager (RTM) to all SEs, to retrieve their version, patch_version and fips_mode. The RTM server implementation has a goroutine leak, so as remote tasks are executed over time, the number of leaked goroutines keeps increasing, and the memory they hold is never released.

Resolution

Workaround: Stop the client from making those API calls to AVI, then reboot the controller to release the consumed memory.

A permanent fix will be delivered in 22.1.8 and 30.2.2 under ID AV-219698.