Loading/unloading vmw_pvrdma can cause a guest OS crash in ESXi 7.0GA Server.

Article ID: 311892

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

This article describes how to avoid a guest OS crash when hot-adding or hot-removing a PVRDMA adapter on an ESXi 7.0 GA server.

The following Linux guest OS is affected:
  • Oracle Linux 8.8 with UEK kernel uek-5.15.0-101.103.2.1


Symptoms:
Hot-adding or hot-removing a PVRDMA adapter fails because the operation causes a kernel crash in the guest OS, after which the guest OS reboots.


Environment

VMware vSphere 7.0.x

Cause

When the guest driver is unloaded, the PVRDMA backend triggers VRDMA_EVENT_PORT_ERR. Because of the guest driver flow, this generates an ib_dispatch_event on an inactive device, which leads to an access to an unmapped register. Triggering VRDMA_EVENT_PORT_ACTIVE on GID registration and VRDMA_EVENT_PORT_ERR on GID unregistration is not actually needed, and the HCA driver code does not do it either. Furthermore, a port can be active without a GID associated with it.
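
The sequence described above can be pictured with a small toy model. This is only an illustrative sketch in Python, not driver or ESXi code: `ToyGuestDevice`, `UnmappedRegisterAccess`, and `unload_driver` are hypothetical names, and the only assumption taken from this article is that handling the port event on an unloaded device touches a register that is no longer mapped.

```python
class UnmappedRegisterAccess(Exception):
    """Stands in for the guest kernel fault; not a real kernel error type."""


class ToyGuestDevice:
    """Hypothetical stand-in for the guest-side PVRDMA device state."""

    def __init__(self):
        self.active = True
        self.registers = {"status": 0}  # mapped while the driver is loaded

    def unload(self):
        # Driver teardown: the device goes inactive and its registers are unmapped.
        self.active = False
        self.registers = None

    def dispatch_event(self, event):
        # Models the flow in which handling the event touches a device register.
        if self.registers is None:
            raise UnmappedRegisterAccess(event)
        return self.registers["status"]


def unload_driver(device, backend_raises_port_err):
    """Return "crash" or "ok" for one simulated driver unload."""
    device.unload()
    try:
        if backend_raises_port_err:
            # The backend reacts to the GID unregister with VRDMA_EVENT_PORT_ERR.
            device.dispatch_event("VRDMA_EVENT_PORT_ERR")
    except UnmappedRegisterAccess:
        return "crash"
    return "ok"


print(unload_driver(ToyGuestDevice(), backend_raises_port_err=True))   # crash
print(unload_driver(ToyGuestDevice(), backend_raises_port_err=False))  # ok
```

In this model the crash only appears when the event arrives after teardown, which matches the description that the event itself is unnecessary.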

Resolution

Upgrade ESXi 7.0 GA to a later ESXi version.

Workaround:
Do not trigger VRDMA_EVENT_PORT_ACTIVE/VRDMA_EVENT_PORT_ERR on GID register/unregister.
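
As an illustration only, the workaround can be pictured as backend GID handlers that update the GID table without raising port events, with port state tracked separately from GIDs. The names below (`ToyBackendPort`, `PortState`, `register_gid`, `unregister_gid`) are hypothetical and do not correspond to actual ESXi or PVRDMA backend code; the sketch assumes only what this article states, namely that a port can be active without an associated GID.

```python
from enum import Enum


class PortState(Enum):
    DOWN = 0
    ACTIVE = 1


class ToyBackendPort:
    """Hypothetical model of a backend port whose state is independent of GIDs."""

    def __init__(self):
        self.state = PortState.ACTIVE
        self.gids = set()
        self.events = []  # events that would be delivered to the guest

    def register_gid(self, gid):
        # Update the GID table only; no VRDMA_EVENT_PORT_ACTIVE is raised.
        self.gids.add(gid)

    def unregister_gid(self, gid):
        # Update the GID table only; no VRDMA_EVENT_PORT_ERR is raised.
        self.gids.discard(gid)


port = ToyBackendPort()
port.register_gid("fe80::1")
port.unregister_gid("fe80::1")
print(port.state.name, len(port.gids), len(port.events))
```

After the GID is registered and unregistered, the port in this model is still active, has no GIDs, and has queued no events for the guest.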