Configuring NVIDIA GPU card shows error in ESX CLI and host logs

Article ID: 403637

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

When you try to enable SR-IOV for the GPU card, you see the following error in the ESX CLI:

[root@ESXi:~] nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

 

In the host logs, you see the following entries.

All logs are located in /var/run/log

hostd.log


YYYY-MM-DDTHH:MM:SS.SSSSZ In(166) Hostd[2100012] [Originator@6876 sub=Libs] NvmlUser: nvmlInit error code: 28
YYYY-MM-DDTHH:MM:SS.SSSSZ In(166) Hostd[2100012] [Originator@6876 sub=Libs] NvidiaVgpuInfo: Failed to open nvidia library
YYYY-MM-DDTHH:MM:SS.SSSSZ Wa(164) Hostd[2100012] [Originator@6876 sub=Libs] NvidiaDeviceGroupInfo: vgpuInfo not available.

vmkwarning.log

YYYY-MM-DDTHH:MM:SS.SSSSZ Db(15) esxupdate[2107898] Output: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

In syslog, you see the following:

YYYY-MM-DDTHH:MM:SS.SSSSZ Er(11) nvidia-vgpud[2100477] error: failed to allocate client: 59
YYYY-MM-DDTHH:MM:SS.SSSSZ Er(11) nvidia-vgpud[2100477] error: failed to read pGPU information: 9
YYYY-MM-DDTHH:MM:SS.SSSSZ Er(11) nvidia-vgpud[2100477] error: failed to send vGPU configuration info to RM: 9
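The log signatures above can be checked in one pass. The following is a minimal sketch, assuming the messages appear in files directly under /var/run/log as stated in this article. The function name scan_nvidia_errors is hypothetical, and the search strings are copied from the excerpts above.

```shell
# scan_nvidia_errors is a hypothetical helper, not part of ESXi or the NVIDIA tools.
# It searches every file in a log directory for the messages quoted in this article
# and prints each matching line prefixed with the name of the file it came from.
scan_nvidia_errors() {
    log_dir="${1:-/var/run/log}"   # default log location given in this article
    grep -H -E \
        'nvmlInit error code|Failed to open nvidia library|vgpuInfo not available|NVIDIA-SMI has failed|nvidia-vgpud.*error: failed to' \
        "$log_dir"/* 2>/dev/null
}
```

Running scan_nvidia_errors with no argument searches /var/run/log. Passing a directory, for example an extracted log bundle, searches that location instead.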

 

Environment

vSphere 8.x

Cause

This behavior is caused by an incorrect NVIDIA driver version.
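To see which NVIDIA driver package the host currently has, a check along the following lines may help. It is a sketch only: it assumes the driver was installed as a VIB whose name or vendor contains "NVD" or "NVIDIA", which can vary by release. The function names are hypothetical wrappers around the standard ESXi commands esxcli software vib list and vmkload_mod -l.

```shell
# Hypothetical wrappers; only esxcli and vmkload_mod are real ESXi commands.

# List installed VIBs whose name or vendor looks like an NVIDIA package.
# Assumption: the package is labelled "NVD" or "NVIDIA" (case-insensitive).
list_nvidia_vibs() {
    esxcli software vib list | grep -i -E 'nvd|nvidia'
}

# Show whether a kernel module with "nvidia" in its name is currently loaded.
list_loaded_nvidia_modules() {
    vmkload_mod -l | grep -i 'nvidia'
}
```

If list_nvidia_vibs prints a package but list_loaded_nvidia_modules prints nothing, the driver is installed but not loaded, which is consistent with the nvidia-smi error shown earlier. The version reported by the first command can be passed on to NVIDIA when confirming which driver is correct for the GPU.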

 

Resolution

To obtain the correct drivers for your GPU, contact NVIDIA.