NVIDIA GPU not correctly detected after installing NVIDIA driver

Article ID: 402713

Products

VMware vSphere ESXi
VMware vSphere ESX 7.x
VMware vSphere ESX 8.x

Issue/Introduction

  • The ESXi host has an NVIDIA A100 GPU installed.
  • After the NVIDIA driver is installed, the nvidia-smi command returns the following error:

    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

  • /var/run/log/vmkwarning.log contains the following error:

    ALERT: NVIDIA: module load failed during VIB install/upgrade.

  • In the vSphere Client, the GPU is listed under "Graphics Devices", but its memory is shown as 0 B.
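
The log symptom above can be checked from the ESXi shell. The following is a minimal sketch, not an official diagnostic: the function name nvidia_module_load_failed is hypothetical, and the default log path is the one given in this article.

```shell
#!/bin/sh
# Sketch: report whether a log file contains the NVIDIA module-load alert.
# The default path is the one named in this article; pass another path to override it.
nvidia_module_load_failed() {
    log_file="${1:-/var/run/log/vmkwarning.log}"
    # grep -q prints nothing and returns 0 only when the alert text is present.
    grep -q "NVIDIA: module load failed" "$log_file" 2>/dev/null
}

# Usage on the host (uncomment to run):
# nvidia_module_load_failed && echo "NVIDIA kernel module failed to load"
# esxcli software vib list | grep -i nvidia   # shows which NVIDIA VIB is installed
```

Listing the installed VIBs alongside the log check helps confirm which NVIDIA driver package is currently installed before changing anything.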

Environment

VMware vSphere ESX 7.x
VMware vSphere ESX 8.x

Cause

The NVIDIA A100 requires the NVIDIA AI Enterprise (NVIDIA-AIE) driver to be installed instead of the GRID driver.

This can be confirmed in the compatibility guide entry for the device: NVIDIA A100 80GB PCIe

Resolution

Remove the currently installed driver, then install the NVIDIA-AIE driver, which can be downloaded from the NVIDIA website.
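
The driver swap can be sketched as a sequence of esxcli commands. The helper below only prints the commands for review and runs nothing itself. The function name print_driver_swap_steps and both arguments are hypothetical placeholders: the first is the name of the installed VIB as reported by `esxcli software vib list`, the second is the absolute path to the NVIDIA-AIE offline bundle. The sketch assumes the host has been evacuated and that the bundle can be installed with `esxcli software vib install -d`. An additional reboot between removal and installation may be required, so follow the installation article referenced in this section for the authoritative procedure.

```shell
#!/bin/sh
# Sketch: print (do not run) the esxcli steps for replacing the NVIDIA host driver.
#   $1 = name of the currently installed NVIDIA VIB (placeholder)
#   $2 = absolute path to the NVIDIA-AIE offline bundle (placeholder)
print_driver_swap_steps() {
    old_vib="$1"
    new_bundle="$2"
    echo "esxcli system maintenanceMode set --enable true"
    echo "esxcli software vib remove -n $old_vib"
    echo "esxcli software vib install -d $new_bundle"
    echo "reboot"
    echo "esxcli system maintenanceMode set --enable false"
    echo "nvidia-smi"
}

# Example (placeholder values, uncomment to print the steps):
# print_driver_swap_steps "<installed-vib-name>" "/vmfs/volumes/<datastore>/<bundle>.zip"
```

Once the host is back up, nvidia-smi should list the A100 instead of returning the communication error.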

For more information about installing and removing the NVIDIA GPU VIB, refer to: Installing and configuring the NVIDIA VIB on ESXi