PCIe hotplug: ESXi host may crash when PCIe NVMe device(s) are surprise hot-removed and reinserted quickly (< 1 minute)
Article ID: 312022
Products
VMware vSphere ESXi
Issue/Introduction
Symptoms: In certain surprise hot-removal scenarios, VMware native NVMe hot plug might cause a purple screen of death (PSOD) if an NVMe drive is pulled out and reinserted within one minute. This applies to both vSphere and vSAN deployments, whether the reinserted drive is new or the existing one.
After an NVMe drive is physically removed from the server, ESXi takes 1 minute to clean up the resources allocated for the drive. During that interval, ESXi may still try to access the removed drive, which can trigger a non-maskable interrupt (NMI) from the server and lead to a PSOD in ESXi.
Resolution
Currently there is no resolution.
Workaround: To work around this issue, wait 1 minute or longer after removing the drive before reinserting (hot plugging) the new or existing NVMe drive into the PCIe slot.
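The waiting rule in the workaround can be sketched as a small timing helper for an operator's checklist script. This is only an illustration: the function names and the `CLEANUP_WINDOW_SECONDS` constant are hypothetical, the code does not talk to ESXi or detect drive removal, and the 60-second value is taken from the 1-minute cleanup time stated in this article.

```python
import time

# Assumption: 60 s, taken from the 1-minute cleanup time stated in this article.
CLEANUP_WINDOW_SECONDS = 60.0


def seconds_until_safe_reinsert(removed_at, now=None, window=CLEANUP_WINDOW_SECONDS):
    """Return the seconds still to wait before reinserting the drive.

    removed_at: time.monotonic() reading recorded when the drive was pulled.
    now:        optional time.monotonic() reading; defaults to the current one.
    """
    if now is None:
        now = time.monotonic()
    elapsed = now - removed_at
    # Never return a negative wait once the window has already passed.
    return max(0.0, window - elapsed)


def wait_until_safe_reinsert(removed_at, sleep=time.sleep):
    """Block until the cleanup window has elapsed; return the time slept."""
    remaining = seconds_until_safe_reinsert(removed_at)
    if remaining > 0:
        sleep(remaining)
    return remaining
```

A possible use is to record `removed_at = time.monotonic()` at the moment the drive is pulled, then call `wait_until_safe_reinsert(removed_at)` before reinserting it. Because the article recommends 1 minute or longer, passing a larger `window` to `seconds_until_safe_reinsert` adds a safety margin.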