Hostd crashes repeatedly after installing Nvidia VIBs in ESXi 8.0



Article ID: 320581


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
The hostd process crashes repeatedly after Nvidia VIBs are installed.

/var/run/log/hostd.log on the ESXi host contains the following entries:
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga0*svga.basecapslevel" values 7 and 9 differ
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga0*svga.maxpointsize" values 63 and 189 differ
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga0*svga.maxtexturesize" values 8192 and 32768 differ
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga0*svga.maxvolumeextent" values 2048 and 16384 differ
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga16*svga.basecapslevel" values 7 and 9 differ
YYYY-MM-DDTHH:mm:ss Er(163) Hostd[20459428]: [Originator@6876 sub=Hostsvc] Feature capability "svga16*svga.maxpointsize" values 63 and 189 differ
YYYY-MM-DDTHH:mm:ss In(166) Hostd[20460760]: - time the service was last started YYYY-MM-DDTHH:mm:ss, Section for VMware ESX, pid=20460760, version=8.0.1, build=22088125, option=Release
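
To check whether a host shows this signature, the log entries above can be counted with a short script. This is a minimal sketch, not an official diagnostic: the function names and the `LOG` variable are illustrative, and the patterns are taken only from the sample entries quoted in this article.

```shell
#!/bin/sh
# Count the log entries quoted in the Symptoms section.
# LOG defaults to the hostd log path named in this article.
LOG="${LOG:-/var/run/log/hostd.log}"

# Number of 'Feature capability "svga..." values X and Y differ' errors.
count_svga_mismatches() {
    # grep -c prints the number of matching lines (0 if none).
    grep -c 'Feature capability "svga.*differ' "$1"
}

# Number of hostd start banners, a rough indicator of repeated restarts.
count_hostd_restarts() {
    grep -c 'time the service was last started' "$1"
}

# Only report when the log is readable on this machine.
if [ -r "$LOG" ]; then
    echo "svga capability mismatches: $(count_svga_mismatches "$LOG")"
    echo "hostd restarts logged:      $(count_hostd_restarts "$LOG")"
fi
```

Many mismatch errors together with several start banners would be consistent with the behavior described here, but the counts alone do not confirm the cause.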


Environment

VMware vSphere ESXi 8.0
VMware vSphere ESXi 8.0.1

Cause

This can happen when a host in an EVC cluster has a graphics device in "SHARED" mode (vSGA). The graphics EVC mode is not applied correctly because of a bug introduced in ESXi 8.0 U1.

Resolution

This issue will be resolved in ESXi 8.0 U2.

Workaround:
In some cases, the device should be in "SHARED_PASSTHRU" mode (vGPU) rather than "SHARED" mode. If so, work around this issue by placing the device in "SHARED_PASSTHRU" mode: remove the "SHARED" configuration from the Nvidia devices by running the following commands in an SSH session on every affected ESXi host:

  1. configstorecli config current get -c esx -g graphics -k devices
  2. configstorecli config current delete -c esx -g graphics -k devices --all
  3. localcli graphics host set --default-type SharedPassthru
  4. reboot

This should place the device(s) in vGPU mode and work around the problem.
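
The workaround commands can also be wrapped in a small dry-run script, so the sequence can be reviewed before anything is changed. This is a sketch under stated assumptions, not a supported tool. The `APPLY` and `BACKUP` variables and the `run` and `workaround` helpers are hypothetical names. `localcli graphics host get` is assumed to be the read counterpart of the `set` command above. The reboot is deliberately left as a manual step.

```shell
#!/bin/sh
# Dry-run wrapper around the workaround commands from this article.
# Prints each command by default; set APPLY=1 to execute them on the host.
APPLY="${APPLY:-0}"
# Hypothetical backup file; choose a location that survives a reboot.
BACKUP="${BACKUP:-graphics-devices-before.txt}"

# Execute the command when APPLY=1, otherwise only print it.
run() {
    if [ "$APPLY" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

workaround() {
    # Step 1: record the current graphics device config before deleting it.
    if [ "$APPLY" = "1" ]; then
        configstorecli config current get -c esx -g graphics -k devices > "$BACKUP"
    else
        echo "would run: configstorecli config current get -c esx -g graphics -k devices > $BACKUP"
    fi
    # Step 2: remove the per-device "SHARED" configuration.
    run configstorecli config current delete -c esx -g graphics -k devices --all
    # Step 3: make SharedPassthru (vGPU) the host default graphics type.
    run localcli graphics host set --default-type SharedPassthru
    # Step 4: the reboot is not automated here.
    echo "next: reboot the host, then check with: localcli graphics host get (assumed command)"
}

workaround
```

Run without arguments, the script only prints the commands. Running it with `APPLY=1` on an affected host executes steps 1 to 3, after which the host still has to be rebooted by hand.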

Additional Information

Impact/Risks:
Repeated hostd crashes cause the host to disconnect from vCenter, and users cannot perform basic tasks such as creating new VMs or migrating existing VMs with vMotion.