Symptoms:
The hostd process crashes repeatedly after Nvidia VIBs are installed.
/var/run/log/hostd.log on the ESXi host contains the following entries:
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga0*svga.basecapslevel" values 7 and 9 differ
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga0*svga.maxpointsize" values 63 and 189 differ
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga0*svga.maxtexturesize" values 8192 and 32768 differ
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga0*svga.maxvolumeextent" values 2048 and 16384 differ
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga16*svga.basecapslevel" values 7 and 9 differ
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga16*svga.maxpointsize" values 63 and 189 differ
YYYY-MM-DDTHH:mm:ss In(166) Hostd[20460760]: - time the service was last started YYYY-MM-DDTHH:mm:ss, Section for VMware ESX, pid=20460760, version=8.0.1, build=22088125, option=Release
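To check whether a host's log shows this signature, the "Feature capability ... differ" entries above can be matched with a short script. The following is a minimal Python sketch, not a VMware-provided tool. The function names find_capability_mismatches and summarize are hypothetical, and the pattern is based only on the log format shown above. It is intended to be run against a copy of hostd.log.

```python
import re
from collections import defaultdict

# Matches hostd entries of the form shown above, for example:
#   ... Feature capability "svga0*svga.maxpointsize" values 63 and 189 differ
MISMATCH_RE = re.compile(
    r'Feature capability "(?P<device>[^*"]+)\*(?P<capability>[^"]+)" '
    r'values (?P<first>\d+) and (?P<second>\d+) differ'
)


def find_capability_mismatches(lines):
    """Group mismatched capability values by device prefix (e.g. 'svga0')."""
    mismatches = defaultdict(dict)
    for line in lines:
        match = MISMATCH_RE.search(line)
        if match:
            mismatches[match["device"]][match["capability"]] = (
                int(match["first"]),
                int(match["second"]),
            )
    return dict(mismatches)


def summarize(path):
    """Print one line per mismatched capability found in the log at `path`."""
    # `path` is assumed to point at a copy of /var/run/log/hostd.log.
    with open(path, errors="replace") as log:
        for device, caps in find_capability_mismatches(log).items():
            for capability, (first, second) in sorted(caps.items()):
                print(f"{device}: {capability} {first} != {second}")
```

Calling summarize() with the path to a copied hostd.log prints each mismatched capability per device. If nothing is printed, the log does not contain the entries described in this article.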
Environment:
VMware vSphere ESXi 8.0
VMware vSphere ESXi 8.0.1
Cause:
This can happen when a host in an EVC cluster has a graphics device in "SHARED" mode (vSGA). The graphics EVC mode is not applied correctly because of a bug introduced in ESXi 8.0 U1.
Resolution:
This issue will be resolved in a future ESXi 8.0 U2 release.
Workaround:
In some cases, the device should be in "SHARED_PASSTHRU" mode (vGPU) rather than "SHARED" mode. If so, you can work around this issue by placing the device in "SHARED_PASSTHRU" mode. To do this, remove the "SHARED" config on the Nvidia devices by executing the following commands in an SSH session on all affected ESXi hosts:
This should place the device(s) in vGPU mode and work around the problem.