Linux virtual machines with PCI passthru devices may crash on AMD servers due to heavy IOs

Article ID: 440442

Products

VMware vSphere ESXi

Issue/Introduction

  • Linux virtual machines with PCI passthru devices running on AMD servers may experience soft lockups and guest crashes under heavy I/O workloads.
  • The following message appears on the virtual machine's terminal or in the /var/log/messages file:
    kernel:watchdog: BUG: soft lockup - CPU#Y stuck for Xs!
    where Y is the affected CPU core and X is the time in seconds.
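
To confirm whether a guest has hit this symptom, the log can be searched for the message quoted above. The following is a minimal Python sketch, not an official diagnostic tool. It assumes the message text matches the format shown in this article; the function names find_soft_lockups and scan_log are hypothetical and introduced only for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Pattern for the message quoted in this article, for example:
#   kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s!
# Assumption: the CPU number and the duration are plain decimal integers.
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def find_soft_lockups(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return one (cpu, seconds) pair per soft lockup message found."""
    hits = []
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


def scan_log(path: str = "/var/log/messages") -> List[Tuple[int, int]]:
    """Scan a log file (hypothetical helper; reading it may require root)."""
    with open(path, errors="replace") as log:
        return find_soft_lockups(log)
```

Calling scan_log() inside the guest returns an empty list when no soft lockup has been logged. A non-empty result shows which CPU cores were reported as stuck and for how long, but it does not by itself prove that the cause described in this article is responsible.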

Environment

  • ESXi 7.x
  • ESXi 8.x
  • ESX 9.x

Cause

On virtual machines with PCI passthru devices on AMD servers, due to certain limitations, virtual interrupts are not always delivered to the correct vCPU. As a result, some vCPUs become busy routing interrupts, which eventually leads to a guest soft lockup.

Resolution

For I/O-intensive workloads with PCI passthru devices on AMD servers, there is currently no viable workaround. Broadcom is working on potential solutions for this issue, but there is no fixed timeline for a resolution yet.