Avi VIP slowness in Azure Cloud Service Engines

Article ID: 392715

Updated On:

Products

VMware Avi Load Balancer

Issue/Introduction

  • Users observe web pages taking a long time to render or load, and sometimes timing out. In the virtual service client logs, requests that normally complete in milliseconds now take seconds, or in some cases more than an hour, to complete.
  • Very high data transfer time and total time values in the client traffic logs are the indicators of this issue.

Example Virtual Service Client logs:

Environment

Affects All Avi Versions with Azure Clouds

Cause

  • Service Engine (SE) instances are deployed on an Azure host with Mellanox ConnectX-3 NICs, which do not support DPDK. This causes the SE to boot in PCAP mode, which greatly reduces performance.
  • Microsoft Azure is still using Mellanox ConnectX-3 NICs on its hosts.
  • Azure does not allow you to restrict placement to hosts with ConnectX-4 or ConnectX-5 NICs, which support DPDK (accelerated networking), and does not provide information about which hosts use ConnectX-3 NICs.

    How Accelerated Networking works in Linux and FreeBSD VMs

Resolution

  • Placement of the VM on an Azure host is controlled by the Azure infrastructure, and Azure does not provide a way to identify the NICs on its hosts. Avi is therefore not able to provide a solution for this issue at this time.

Workaround(s):

Use the following steps to identify whether the Service Engine hosting the affected Virtual Services is running in PCAP mode, and to identify the NIC type.

  1. SSH to the Controller leader node as the admin user.

  2. Launch the CLI (shell) by entering "shell", then log in as the admin user.

  3. Run the command: show serviceengine <SE_NAME> interface | grep -i mode

    a. If the Service Engine is running in PCAP mode, se_dpdk_mode is set to "False" and pcap_tx_mode is set to "PCAP_TX_RING".



    b. If the Service Engine is running in DPDK mode, se_dpdk_mode is set to "True" and pcap_tx_mode is not present.



  4. You can further confirm the NIC type by attaching to the service engine with command "attach serviceengine <SE_NAME>" then running command "sudo lspci | grep -i eth" 

    a. If the output of "sudo lspci | grep -i eth" shows "ConnectX-3", the Service Engine instance is running on an Azure host whose NICs do not support DPDK.



    b. If the output of "sudo lspci | grep -i eth" shows "ConnectX-4", the Service Engine instance is running on an Azure host whose NICs support DPDK.



  5. If the Service Engine is running in PCAP mode with "ConnectX-3" NICs, delete the Service Engine and allow the Controller to create a new one.

    NOTE: Redeploying the SE may have to be done multiple times due to the Azure host limitation.

  6. In some scenarios the SE shows se_dpdk_mode as "False" even though the lspci command shows a "ConnectX-4" NIC. This can occur if the SE instance is rebooted from the Azure console. If this occurs, restart the SE with the following steps:

    a. Attach to the service engine with command: "attach serviceengine <SE_NAME>"
    b. Restart the SE with command: sudo /opt/avi/scripts/restart_se.sh
    c. After the restart completes and the SE shows UP/connected in the GUI, run "show serviceengine <SE_NAME> interface | grep -i mode" to verify that se_dpdk_mode is set to "True".
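
The decision logic in steps 3 through 6 can be sketched as a small Python helper. This is an illustrative sketch only, not an Avi tool: the function names and the returned action strings are hypothetical, and the exact text layout of the CLI and lspci output is an assumption. The parser therefore only looks for the field name se_dpdk_mode followed by True or False, and for a "ConnectX-<n>" string, both of which come from the steps above.

```python
import re
from typing import Optional


def parse_dpdk_mode(mode_output: str) -> Optional[bool]:
    """Return the se_dpdk_mode value, or None if the field is not found.

    Assumes the field name is followed by separators (spaces, '|', ':')
    and then True or False; the real output layout may differ.
    """
    match = re.search(r"se_dpdk_mode\W+(true|false)\b", mode_output, re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "true"


def detect_nic(lspci_output: str) -> Optional[str]:
    """Return the first ConnectX generation named in lspci output, if any."""
    match = re.search(r"connectx-(\d+)", lspci_output, re.IGNORECASE)
    return f"ConnectX-{match.group(1)}" if match else None


def recommend_action(mode_output: str, lspci_output: str) -> str:
    """Map the two command outputs to a (hypothetical) action label."""
    dpdk = parse_dpdk_mode(mode_output)
    nic = detect_nic(lspci_output)
    if dpdk is True:
        return "ok"            # SE is already running in DPDK mode
    if dpdk is False and nic == "ConnectX-3":
        return "redeploy_se"   # step 5: delete the SE and let it be recreated
    if dpdk is False and nic == "ConnectX-4":
        return "restart_se"    # step 6: restart the SE, then re-check the mode
    return "investigate"       # missing field or a NIC not covered by the steps


if __name__ == "__main__":
    # Illustrative input strings, not captured from a real Service Engine.
    sample_mode = "| se_dpdk_mode | False |\n| pcap_tx_mode | PCAP_TX_RING |"
    sample_lspci = "Ethernet controller: Mellanox Technologies [ConnectX-3 Virtual Function]"
    print(recommend_action(sample_mode, sample_lspci))
```

The helper expects the text of the two commands to be pasted or piped in from a saved session. It does not connect to the Controller or the Service Engine itself.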