Loading/unloading vmw_pvrdma can cause a guest OS crash in ESXi 7.0GA Server.
Article ID: 311892
Products
VMware vSphere ESXi
Issue/Introduction
This article describes how to avoid a guest OS crash when hot-adding or hot-removing a PVRDMA adapter on ESXi 7.0 GA.
The following Linux guest OS is affected:
Oracle Linux 8.8 with UEK kernel uek-5.15.0-101.103.2.1
Symptoms: Hot-adding or hot-removing PVRDMA fails in the guest OS because the action causes a kernel crash, after which the guest OS reboots.
Environment
VMware vSphere 7.0.x
Cause
When the guest driver is unloaded, the PVRDMA backend triggers VRDMA_EVENT_PORT_ERR. Because of the guest driver flow, this generates an ib_dispatch_event on an inactive device, which leads to an access to an unmapped register. Triggering VRDMA_EVENT_PORT_ACTIVE on GID register and VRDMA_EVENT_PORT_ERR on GID unregister is not actually needed, and the HCA driver code does not do it either. Furthermore, a port can be active without a GID associated with it.
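The event flow described above can be illustrated with a small toy model. This is only a sketch of the reasoning, not the actual vmw_pvrdma driver or ESXi backend code: the class names, the `active` flag, the `events_on_gid_change` switch, and the assumption that the device is marked inactive before its GID is unregistered are all hypothetical simplifications.

```python
from dataclasses import dataclass, field

PORT_ACTIVE = "VRDMA_EVENT_PORT_ACTIVE"
PORT_ERR = "VRDMA_EVENT_PORT_ERR"


class UnmappedRegisterAccess(RuntimeError):
    """Stands in for the guest kernel crash (hypothetical)."""


@dataclass
class ToyGuestDevice:
    # False once the (toy) driver has started unloading.
    active: bool = True
    events: list = field(default_factory=list)

    def dispatch_event(self, event):
        # Toy stand-in for ib_dispatch_event: dispatching on an inactive
        # device models the access to an unmapped register.
        if not self.active:
            raise UnmappedRegisterAccess(event)
        self.events.append(event)


class ToyBackend:
    def __init__(self, device, events_on_gid_change):
        self.device = device
        # True models the problematic behavior; False models a backend
        # that does not raise port events on GID register/unregister.
        self.events_on_gid_change = events_on_gid_change

    def register_gid(self):
        if self.events_on_gid_change:
            self.device.dispatch_event(PORT_ACTIVE)

    def unregister_gid(self):
        if self.events_on_gid_change:
            self.device.dispatch_event(PORT_ERR)


def unload_driver(backend):
    """Return True if the toy unload completes, False if it 'crashes'."""
    backend.device.active = False  # assumed order: device torn down first
    try:
        backend.unregister_gid()   # GID removed during unload
    except UnmappedRegisterAccess:
        return False
    return True
```

In this model, `unload_driver(ToyBackend(ToyGuestDevice(), events_on_gid_change=True))` hits the simulated crash, while the same call with `events_on_gid_change=False` completes, which mirrors why dropping the port events on GID register/unregister avoids the problem.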
Resolution
Upgrade ESXi 7.0 GA to a later ESXi version.
Workaround: Do not trigger VRDMA_EVENT_PORT_ACTIVE/VRDMA_EVENT_PORT_ERR on GID register/unregister.