Avi Load Balancer Service Engines Upgrade failed due to Insufficient Disk Space

Article ID: 407776


Products

VMware Avi Load Balancer

Issue/Introduction

During a Service Engine (SE) upgrade to version 30.2.3 or 30.2.4, the upgrade may fail and leave the SE in a suspended state. The primary cause is insufficient disk space, which results from a change in how application traffic logs are transferred to the new partition during the upgrade process.

Environment

  • Avi Load Balancer Service Engines being upgraded to version 30.2.3 or 30.2.4 with a large volume of application traffic logs in the /var/lib/avi/log/traffic directory.

Cause

  • In Avi Load Balancer versions 30.2.3 and 30.2.4, the upgrade process was modified to copy application traffic logs from the current partition to the new partition instead of moving them, as earlier releases (22.x, 30.1.x, 30.2.1, 30.2.2) did.
  • This change was meant to preserve the logs and clean them up later, but the copy can exhaust disk space during the upgrade, especially in scaled environments where application traffic logs are large.
  • The copy operation temporarily requires twice the disk space that the logs occupy, which can halt the upgrade and cause the SE to fail.
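
The dry-run text of the se-cleanup.sh script, shown later in this article, states that the required space is 2 x the current traffic log size plus a 5GB buffer. The following is a minimal sketch of that arithmetic for a quick manual check on an SE. The function names required_mb and has_enough_space are hypothetical, GNU du and df are assumed, and the "prev directory usage" that the script adds to the available space is not included, so this is an estimate and not the script's own logic.

```shell
#!/bin/bash
# Hypothetical helpers, not part of se-cleanup.sh.
# Formula taken from the script's dry-run text:
#   required = 2 x traffic log size + 5 GB (5120 MB) buffer

# required_mb LOG_MB -> prints the estimated required space in MB
required_mb() {
    echo $(( 2 * $1 + 5120 ))
}

# has_enough_space LOG_MB AVAIL_MB -> exit status 0 if AVAIL_MB covers the estimate
has_enough_space() {
    [ "$2" -ge "$(required_mb "$1")" ]
}

# Only measure when the traffic log directory exists (that is, on an SE).
LOG_DIR="${LOG_DIR:-/var/lib/avi/log/traffic}"
if [ -d "$LOG_DIR" ]; then
    log_mb=$(du -sm "$LOG_DIR" | cut -f1)
    avail_mb=$(df -m --output=avail "$LOG_DIR" | tail -n 1 | tr -d ' ')
    echo "Estimated required: $(required_mb "$log_mb")MB, free: ${avail_mb}MB"
    if ! has_enough_space "$log_mb" "$avail_mb"; then
        echo "Clean up traffic logs or increase the disk size before upgrading."
    fi
fi
```

For example, a hypothetical 10240MB log directory gives an estimate of 2 x 10240 + 5120 = 25600MB.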

Resolution

To prevent Service Engine upgrade failures, you can either increase the disk size or clean up existing logs before upgrading to versions 30.2.3 or 30.2.4.

Solution 1: Increase Service Engine Disk Size
Increase the disk size of the Service Engine.
For detailed instructions, refer to the following Broadcom knowledge base article: https://knowledge.broadcom.com/external/article/400695/

Solution 2: Delete Old Application Traffic Logs

  • Use the se-cleanup.sh script to clean up application traffic logs from the /var/lib/avi/log/traffic/ directory on the SE.
  • The script tries to reduce the application traffic logs in this directory on the current partition to 40% of their current size, so that copying them from the current partition to the new partition during the SE upgrade does not cause the upgrade to fail.
  • It deletes the oldest files first, starting with files older than a month. If required, the loop continues toward newer files until it reaches logs created one day earlier.
  • This also speeds up the upgrade-filecopy step and reduces the chance of disk space issues during the SE upgrade.
  • To run the script, download se-cleanup.sh from the attachment provided with this article, then follow the steps below:
    1. Copy the script to the /tmp/ directory on the Avi Controller Leader Node.

    2. Execute the following command on the Controller Leader Node to run the script on all connected SEs:

      $ sudo -i
      # cd /tmp/
      
      >> Dry-Run:
      # sudo -u postgres psql -d avi -p 5000 -c "select ip from api_securechannelmapping;" | sed '1,2d;$d' | sed '$d' | tr -d ' ' | xargs -I {} sh -c "echo {};cat se-cleanup.sh | ssh -o ConnectTimeout=5 -q -i /etc/ssh/id_se aviseuser@{} /bin/bash -s list"
      
      >> Delete:
      # sudo -u postgres psql -d avi -p 5000 -c "select ip from api_securechannelmapping;" | sed '1,2d;$d' | sed '$d' | tr -d ' ' | xargs -I {} sh -c "echo {};cat se-cleanup.sh | ssh -o ConnectTimeout=5 -q -i /etc/ssh/id_se aviseuser@{} /bin/bash -s y"
      Note: This command connects to each Service Engine via SSH and executes the script to clean up application traffic logs.
    3. Sample script runs look like the following:
      Dry Run:
      
      root@Controller-IP:/tmp# sudo -u postgres psql -d avi -p 5000 -c "select ip from api_securechannelmapping;" | sed '1,2d;$d' | sed '$d' | tr -d ' ' | xargs -I {} sh -c "echo {};cat se-cleanup.sh | ssh -o ConnectTimeout=5 -q -i /etc/ssh/id_se aviseuser@{} /bin/bash -s list"
      
      <Service Engine IP Address>
      Enough disk space for a successful upgrade. Exiting.
      Required: 5120MB, Available: 14834MB
      <Service Engine IP Address>
      Enough disk space for a successful upgrade. Exiting.
      Required: 5120MB, Available: 5272MB
      <Service Engine IP Address>
      Enough disk space for a successful upgrade. Exiting.
      Required: 5120MB, Available: 5483MB
      <Service Engine IP Address>
      Initial: 11G, Target: 4096MB
      ---------------------------------------------------------
      Dry Run Mode Selected
      Performing a dry run
      This script will perform the following steps if executed with 'y':
      1. Check available disk space on the partition.
         - Required space = 2 x current traffic log size + 5GB buffer.
         - Available space = Free partition space + prev directory usage (if present).
      2. If enough disk space is available, script exits without deleting.
      3. If not enough space, it will start deleting files in /var/lib/avi/log/traffic/
         - Deletion starts from files older than 30 days and moves down day by day.
         - The goal is to reduce log directory size to 40% of current usage.
      4. During cleanup, it prints how many files would be deleted per day threshold.
      5. At the end, a summary will be shown:
         - Initial log directory size.
         - Target size (40% of initial).
         - Estimated freed space.
         - Estimated final size.
      ---------------------------------------------------------
      Now listing the files that *would* be deleted:
      /var/lib/avi/log/traffic/virtualservice-xxxx/log_app_adf_vs.test.123456
      Would delete 1 files older than 30 days.
      <Service Engine IP Address>
      Enough disk space for a successful upgrade. Exiting.
      Required: 5120MB, Available: 5254MB
      
      Deleting:
      
      root@Controller-IP:/tmp# sudo -u postgres psql -d avi -p 5000 -c "select ip from api_securechannelmapping;" | sed '1,2d;$d' | sed '$d' | tr -d ' ' | xargs -I {} sh -c "echo {};cat se-cleanup.sh | ssh -o ConnectTimeout=5 -q -i /etc/ssh/id_se aviseuser@{} /bin/bash -s y"
      
      <Service Engine IP Address>
      Enough disk space for a successful upgrade. Exiting.
      Required: 5120MB, Available: 14834MB
      <Service Engine IP Address>
      Enough disk space for a successful upgrade. Exiting.
      Required: 5120MB, Available: 5271MB
      <Service Engine IP Address>
      Enough disk space for a successful upgrade. Exiting.
      Required: 5120MB, Available: 5483MB
      <Service Engine IP Address>
      Initial: 14G, Target: 5325MB
      Deleted 1 files older than 30 days.
      Final: 2.1M, Freed: 13312MB
      <Service Engine IP Address>
      Enough disk space for a successful upgrade. Exiting.
      Required: 5120MB, Available: 5269MB
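
The age-based trimming described in the dry-run text above (start with files older than 30 days, move down day by day, stop at 40% of the initial size) can be sketched as follows. This is an illustration under stated assumptions and not the contents of se-cleanup.sh: the function name trim_logs is hypothetical, GNU find and du are assumed, sizes are measured as apparent file sizes in KB, and the script's dry-run mode and disk space pre-check are not reproduced.

```shell
#!/bin/bash
# Hypothetical sketch of age-based log trimming; NOT the shipped se-cleanup.sh.
# trim_logs DIR -> deletes old files in DIR until DIR is at most 40% of its
# initial size, or until the 1-day threshold has been processed.
trim_logs() {
    local dir="$1"
    local initial_kb target_kb current_kb days

    initial_kb=$(du -sk --apparent-size "$dir" | cut -f1)
    target_kb=$(( initial_kb * 40 / 100 ))   # 40% of the initial size

    days=30
    while [ "$days" -ge 1 ]; do
        current_kb=$(du -sk --apparent-size "$dir" | cut -f1)
        # Stop as soon as the directory is at or below the target.
        if [ "$current_kb" -le "$target_kb" ]; then
            break
        fi
        # -mtime +N matches files whose age in whole days is greater than N.
        find "$dir" -type f -mtime +"$days" -print -delete
        days=$(( days - 1 ))
    done

    current_kb=$(du -sk --apparent-size "$dir" | cut -f1)
    echo "Initial: ${initial_kb}KB, Target: ${target_kb}KB, Final: ${current_kb}KB"
}
```

Because the function deletes files, try it only on a scratch directory, for example trim_logs /tmp/traffic-copy. Lowering the age threshold one day at a time means the newest logs are removed only when deleting the older ones did not free enough space.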

 

Additional Information

  • The Avi Load Balancer UI Analytics for Virtual Service (VS) application logs displays data for only one month. This behavior was introduced in version 22.1.3: to optimize performance, the options to generate logs based on time frame (Past Year, Past Quarter, and All Time) were removed from the Avi Load Balancer VS Analytics UI.
  • The se-cleanup.sh script is designed to run on all Service Engines connected to the Controller.
  • The provided script aligns with the one-month limit of the UI by trimming logs older than one month first, which is sufficient for most use cases and makes for a smoother upgrade.
  • It's good practice to verify the type of storage used by the Service Engine. SSH into the SE and execute the following command:
    # lsblk -o NAME,ROTA
    A ROTA value of 0 indicates an SSD, while a value of 1 indicates an HDD.
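
The Controller-side command in Solution 2 builds the list of SE addresses by filtering the psql output through sed and tr. The sketch below applies the same filters to an assumed sample of psql's default aligned output to show what each stage removes. The function names extract_ips and sample_output are hypothetical, and the addresses are illustrative documentation addresses, not output from a real Controller.

```shell
#!/bin/bash
# Same filters as the article's command, wrapped in a function for clarity.
extract_ips() {
    sed '1,2d;$d' |   # drop the column header, the dashed separator, and the trailing blank line
    sed '$d'      |   # drop the "(N rows)" footer, which is now the last line
    tr -d ' '         # strip the padding psql adds around each value
}

# Assumed sample of psql's default aligned output (addresses are illustrative).
sample_output() {
    printf '%s\n' \
        '     ip      ' \
        '-------------' \
        ' 192.0.2.10' \
        ' 192.0.2.11' \
        '(2 rows)' \
        ''
}

# Prints one address per line, ready for xargs -I {}.
sample_output | extract_ips
```

psql also provides the -A (unaligned) and -t (tuples only) options, which print bare values without the header and footer. Whether they behave as expected on the Controller has not been verified here, so the filters from the article are kept in the sketch.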

Attachments

se-cleanup.sh