Service Engine Rebooting Due to vNIC Stall Event

Article ID: 391420

Products

VMware Avi Load Balancer

Issue/Introduction

The Service Engine reboots at regular intervals because of a vNIC stall event. vNIC stalls can occur when there are issues in the underlying infrastructure or when the NIC is running an incompatible driver or firmware version.

To recover from a vNIC stall event, the Service Engine automatically reboots.

Environment

  • Linux Server Cloud (LSC)
  • Avi Load Balancer version: 22.1.x

Cause

On the Service Engine, vNIC stalls can occur when the NIC hardware version (driver and firmware) is incompatible with the Data Plane Development Kit (DPDK) version used by the Avi Load Balancer.

NIC version verification:

To check the NIC driver and firmware versions, follow these steps:

  • SSH into the host where the Service Engine is deployed.
  • Run the following command:
     ethtool -i <interface>

Replace <interface> with the actual network interface name to retrieve version details.

[test@linux-host ~]$ ethtool -i em2
driver: ixgbe
version: 5.1.0-k-rh7.7
firmware-version: 0x800013cf, 20.0.16 #
expansion-rom-version:
bus-info: 0000:02:03.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
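
When the check has to be repeated across several hosts or interfaces, the `ethtool -i` output can be turned into a small dictionary so that the driver, driver version, and firmware version are easy to compare. The following is a minimal sketch, not part of the Avi tooling: the function names `parse_ethtool_info` and `read_nic_info` are hypothetical, and it assumes `ethtool` is installed on the host and prints one `key: value` pair per line as in the sample above.

```python
import subprocess


def parse_ethtool_info(text):
    """Parse `ethtool -i` output into a dict of field name -> value."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only: values such as bus-info
        # (0000:02:03.0) contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def read_nic_info(interface):
    """Run `ethtool -i <interface>` on the local host and parse the result.

    Hypothetical helper: requires ethtool and raises CalledProcessError
    if the interface does not exist.
    """
    result = subprocess.run(
        ["ethtool", "-i", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_ethtool_info(result.stdout)


if __name__ == "__main__":
    # Parse a pasted sample instead of calling ethtool, so the sketch
    # also runs on machines without the tool.
    sample = (
        "driver: ixgbe\n"
        "version: 5.1.0-k-rh7.7\n"
        "firmware-version: 0x800013cf, 20.0.16\n"
        "bus-info: 0000:02:03.0\n"
    )
    nic = parse_ethtool_info(sample)
    print(nic["driver"], nic["version"], nic["firmware-version"])
```

On the host itself, `read_nic_info("em2")` would return the same fields for the interface used in the example above.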

Version compatibility check:

Ensure that the NIC driver and firmware versions are compatible with the DPDK version used by the Avi Load Balancer release:

  • Avi Load Balancer 22.1.x uses DPDK 20.05.
  • Avi Load Balancer 30.2.1 and later use DPDK 22.11.

You can verify NIC hardware compatibility against the official DPDK release notes:

  • DPDK 20.05 compatible NIC hardware versions: Release Notes for DPDK 20.05
  • DPDK 22.11 compatible NIC hardware versions: Release Notes for DPDK 22.11
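
The release-to-DPDK mapping above can be encoded as a small lookup, which helps pick the right release notes when several controllers run different versions. This is a sketch under stated assumptions: `dpdk_version_for_avi` is a hypothetical name, and only the two ranges given in this article are encoded. Any other release returns `None` rather than a guess.

```python
import re


def dpdk_version_for_avi(avi_version):
    """Return the DPDK version for an Avi release string.

    Only the mappings stated in this article are covered:
      22.1.x           -> DPDK 20.05
      30.2.1 and later -> DPDK 22.11
    Returns None for releases the article does not mention.
    """
    # Keep the first three numeric components, e.g. "22.1.6" -> (22, 1, 6).
    numbers = [int(n) for n in re.findall(r"\d+", avi_version)[:3]]
    # Pad short strings such as "22.1" to three components.
    parts = tuple(numbers + [0] * (3 - len(numbers)))
    if parts[:2] == (22, 1):
        return "20.05"
    if parts >= (30, 2, 1):
        return "22.11"
    return None


if __name__ == "__main__":
    for release in ("22.1.6", "30.2.1", "30.1.1"):
        print(release, "->", dpdk_version_for_avi(release))
```

Returning `None` for unlisted releases is deliberate: the DPDK version for those should be confirmed from the product documentation instead of being inferred.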

Log verification:

The following Service Engine log entries show the Service Engine rebooting to recover from consecutive TX stalls:

C01 22:05:40.817000 [ipstk_vnic_queue_stall_cb:6737] Restarting SE to recover from consecutive TX stalls
C01 22:05:40.817000 [system_ret:108] Invoking system command [avictl start  se_reboot.service SE_REBOOT_ARG1=se_power_off &]
C01 22:05:40.817000 [system_ret:113] Completed system command [avictl start  se_reboot.service SE_REBOOT_ARG1=se_power_off &]

The following log entries indicate driver compatibility issues on the Service Engine:

Dec 01 21:37:42 lb-## se[1531]: Core:0 12/01/24 20:37:42.657801 UTC i40e_dev_alarm_handler(): ICR0: malicious programming detected
Dec 01 21:37:42 lb-## se[1531]: Core:0 12/01/24 20:37:42.657876 UTC i40e_handle_mdd_event(): Malicious Driver Detection event 0x02 on TX queue 1 PF number 0x01 VF number 0x00 device 0000:e2:00.1#012
Dec 01 21:37:42 lb-## se[1531]: Core:0 12/01/24 20:37:42.657887 UTC i40e_handle_mdd_event(): TX driver issue detected on PF#012
Dec 01 22:04:39 lb-## se[1531]: Core:0 12/01/24 21:04:39.520653 UTC i40e_dev_alarm_handler(): ICR0: malicious programming detected
Dec 01 22:04:39 lb-## se[1531]: Core:0 12/01/24 21:04:39.520731 UTC i40e_handle_mdd_event(): Malicious Driver Detection event 0x02 on TX queue 1 PF number 0x00 VF number 0x00 device 0000:e2:00.0#012
Dec 01 22:04:39 lb-## se[1531]: Core:0 12/01/24 21:04:39.520741 UTC i40e_handle_mdd_event(): TX driver issue detected on PF#012
Dec 01 22:04:39 lb-## se[1534]: Core:1 12/01/24 21:04:39.783907 UTC EAL: [1738098266][298021] vnic tx queue stall timer start - vnic 0 - queue 0
Dec 01 17:58:32 lb-## kernel: [4258554.099267] i40e 0000:e2:00.0: The driver for the device detected a newer version of the NVM image v1.15 than expected v1.9. Please install the most recent version of the network driver.
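
To search collected logs for the same signatures, the message strings from the excerpts above can be used as patterns. The sketch below is illustrative only: the names `SIGNATURES` and `scan_log` are hypothetical, the patterns are copied from the sample lines in this article, and other driver or release versions may word these messages differently. No log file location is assumed; the caller passes in the lines to scan.

```python
import re

# Patterns taken from the log excerpts in this article (assumption:
# the wording is the same in the logs being scanned).
SIGNATURES = {
    "tx_stall_reboot": re.compile(
        r"Restarting SE to recover from consecutive TX stalls"),
    "tx_stall_timer": re.compile(r"vnic tx queue stall timer start"),
    "mdd_event": re.compile(r"Malicious Driver Detection event"),
    # Captures the detected and expected NVM image versions.
    "nvm_mismatch": re.compile(
        r"newer version of the NVM image v(\d+(?:\.\d+)*) "
        r"than expected v(\d+(?:\.\d+)*)"),
}


def scan_log(lines):
    """Group log lines by the signature they match.

    Returns a dict mapping each signature name to the list of
    matching lines (empty list when nothing matched).
    """
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    sample = [
        "C01 22:05:40.817000 [ipstk_vnic_queue_stall_cb:6737] "
        "Restarting SE to recover from consecutive TX stalls",
        "i40e 0000:e2:00.0: The driver for the device detected a newer "
        "version of the NVM image v1.15 than expected v1.9. Please "
        "install the most recent version of the network driver.",
    ]
    for name, matched in scan_log(sample).items():
        print(name, len(matched))
```

A reboot signature together with Malicious Driver Detection or NVM mismatch entries on the same host points toward the driver and firmware compatibility cause described in this article, although the matches alone do not prove it.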

 

Resolution

Ensure that the NIC hardware version (driver and firmware) on the underlying host is compatible with the DPDK version used by the deployed Avi Load Balancer release.