Post PAIF RAG Deployment "rag-application-multiturn-chatbot", "nemollm-inference-microservice" and "nemo-retriever-embedding-microservice" containers are in exited state
Article ID: 399711

Updated On:

Products

VMware Private AI Foundation
VMware vRealize Automation 8.x

Issue/Introduction

  • Post PAIF RAG Deployment "rag-application-multiturn-chatbot", "nemollm-inference-microservice" and "nemo-retriever-embedding-microservice" containers are in exited state.

(base) vmware@rcvrag:~$ docker container ps -a
CONTAINER ID   IMAGE                                                                COMMAND                  CREATED        STATUS                      PORTS                                       NAMES
3##########4   nvcr.io/nvidia/aiworkflows/rag-playground:24.08                      "python3.10 -m front…"   12 hours ago   Up 12 hours                 0.0.0.0:3001->3001/tcp, :::3001->3001/tcp   rag-playground
d##########1   nvcr.io/nvidia/aiworkflows/rag-application-multiturn-chatbot:24.08   "uvicorn RAG.src.cha…"   12 hours ago   Exited (3) 12 hours ago                                                 rag-application-multiturn-chatbot
c##########7   nvcr.io/nim/nvidia/nv-embedqa-e5-v5:1.0.0                            "/opt/nvidia/nvidia_…"   12 hours ago   Exited (139) 12 hours ago                                               nemo-retriever-embedding-microservice
4##########c   nvcr.io/nim/meta/llama3-8b-instruct:1.0.0                            "/opt/nvidia/nvidia_…"   12 hours ago   Exited (1) 12 hours ago                                                 nemollm-inference-microservice
7##########d   pgvector/pgvector:pg16                                               "docker-entrypoint.s…"   12 hours ago   Up 12 hours                 0.0.0.0:5432->5432/tcp, :::5432->5432/tcp   pgvector
3##########d   nvcr.io/nvidia/k8s/dcgm-exporter:3.2.5-3.1.8-ubuntu22.04             "/usr/local/dcgm/dcg…"   12 hours ago   Up 12 hours                 0.0.0.0:9400->9400/tcp, :::9400->9400/tcp   romantic_keller

  • The "nemollm-inference-microservice" ("nim") container may fail for one of two reasons:
    • (1) The UVM parameter is not set on the VM Class used for the DLVM

      • From the ESXi host, navigate to the RAG VM's directory and check the .vmx file for the parameter below
      • pciPassthru0.cfg.enable_uvm = "1"
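The check above can be scripted. A minimal sketch (the real file lives under /vmfs/volumes/<datastore>/<vm-name>/ on the ESXi host; a sample .vmx is created here so the check can be demonstrated anywhere):

```shell
# Sketch: verify the UVM flag in a VM's .vmx file.
# On a real ESXi host, point the check at /vmfs/volumes/<datastore>/<vm-name>/<vm-name>.vmx
# instead of the sample file created below.
VMX=$(mktemp)
printf 'pciPassthru0.present = "TRUE"\npciPassthru0.cfg.enable_uvm = "1"\n' > "$VMX"

if grep -q '^pciPassthru0.cfg.enable_uvm = "1"$' "$VMX"; then
  RESULT="UVM enabled"
else
  RESULT="UVM parameter missing"
fi
echo "$RESULT"
rm -f "$VMX"
```

If the parameter is missing, see "Update VM Class with UVM parameter" in the Resolution section below.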

    • (2) An incorrect NVIDIA vGPU driver is installed on the ESXi host
       
      • [root@ESX##:~] nvidia-smi vgpu -c
        GPU 00000000:C1:00.0
        No vGPUs found on this device
        [root@ESX##:~]
         
      • [root@ESX##:~] nvidia-smi vgpu -s
        GPU 00000000:C1:00.0
            NVIDIA L40S-1B
            NVIDIA L40S-2B
            NVIDIA L40S-1Q
            NVIDIA L40S-2Q
            NVIDIA L40S-3Q
            NVIDIA L40S-4Q
            NVIDIA L40S-6Q
            NVIDIA L40S-8Q
            NVIDIA L40S-12Q
            NVIDIA L40S-16Q
            NVIDIA L40S-24Q
            NVIDIA L40S-48Q
            NVIDIA L40S-1A
            NVIDIA L40S-2A
            NVIDIA L40S-3A
            NVIDIA L40S-4A
            NVIDIA L40S-6A
            NVIDIA L40S-8A
            NVIDIA L40S-12A
            NVIDIA L40S-16A
            NVIDIA L40S-24A
            NVIDIA L40S-48A

    • If the correct NVIDIA driver is installed, the output should look similar to the one below (note the additional C-series profiles)

      • [root@ESX##:~] nvidia-smi vgpu -c
        GPU 00000000:C1:00.0
            NVIDIA L40S-1B
            NVIDIA L40S-2B
            NVIDIA L40S-1Q
            NVIDIA L40S-2Q
            NVIDIA L40S-3Q
            NVIDIA L40S-4Q
            NVIDIA L40S-6Q
            NVIDIA L40S-8Q
            NVIDIA L40S-12Q
            NVIDIA L40S-16Q
            NVIDIA L40S-24Q
            NVIDIA L40S-48Q
            NVIDIA L40S-1A
            NVIDIA L40S-2A
            NVIDIA L40S-3A
            NVIDIA L40S-4A
            NVIDIA L40S-6A
            NVIDIA L40S-8A
            NVIDIA L40S-12A
            NVIDIA L40S-16A
            NVIDIA L40S-24A
            NVIDIA L40S-48A
            NVIDIA L40S-4C
            NVIDIA L40S-6C
            NVIDIA L40S-8C
            NVIDIA L40S-12C
            NVIDIA L40S-16C
            NVIDIA L40S-24C
            NVIDIA L40S-48C

 

Environment

vRA 8.18.1 release (Affected version)

Cause

This issue is seen only in vRA 8.18.1. The nemollm-inference-microservice container fails to start because an incorrect vGPU driver is installed on the ESXi host.

Resolution

To resolve the issue, update the blueprint for the RAG catalog item in vRA so that the containers pick up the latest images.

  • Log in to vRA and locate the RAG catalog item
    • In Assembler > Design, edit the template
    • Search for "softwareCloudInit"
    • Within the "CloudInit" code, search for the line below
      • .services."nemollm-inference".deploy.resources.reservations.devices[0].device_ids = ["${LLM_MS_GPU_ID:-0}"] |
    • Add the new line below to the code so the deployment picks up the latest image
      • .services."nemollm-embedding".image = "nvcr.io/nim/nvidia/nv-embedqa-e5-v5:latest" |
    • Click "Version" at the bottom of the page and check the box "Release this version to the catalog"
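The added blueprint line uses yq-style path syntax. Assuming the cloud-init applies it to docker-compose-nim-ms.yaml (an assumption based on the syntax, not confirmed in this article), its effect is equivalent to the following rewrite, simulated here with sed on a minimal compose fragment:

```shell
# Sketch: simulate the effect of the added blueprint line on a minimal
# fragment of docker-compose-nim-ms.yaml - the image tag is replaced
# with "latest" for the embedding service.
UPDATED=$(cat <<'EOF' | sed 's|\(nv-embedqa-e5-v5\):[0-9.]*|\1:latest|'
services:
  nemollm-embedding:
    image: nvcr.io/nim/nvidia/nv-embedqa-e5-v5:1.0.0
EOF
)
printf '%s\n' "$UPDATED"
```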


Workaround:
For containers to pick up the latest image

    1. Within the deployed RAG VM, update the YAML file
    2. Navigate to and locate the YAML file /opt/data/ai-chatbot-docker-workflow_v24.08/docker-compose-nim-ms.yaml
    3. Run the commands below
      • cp /opt/data/ai-chatbot-docker-workflow_v24.08/docker-compose-nim-ms.yaml /var/tmp/docker-compose-nim-ms.yaml.orig
      • vi /opt/data/ai-chatbot-docker-workflow_v24.08/docker-compose-nim-ms.yaml
      • Change the image tag from '1.0.0' to 'latest' on the line below
      • From
        • image: nvcr.io/nim/nvidia/nv-embedqa-e5-v5:1.0.0
      • To
        • image: nvcr.io/nim/nvidia/nv-embedqa-e5-v5:latest
    4. Run the commands below to stop and start the containers
      • To remove the containers:
      • docker compose -f /opt/data/ai-chatbot-docker-workflow_v24.08/rag-app-multiturn-chatbot/docker-compose.yaml --profile local-nim --profile pgvector down
      • To start the containers:
      • /opt/dlvm/dl_app.sh
    5. Run the docker ps -a command to validate that all the containers started, with the embedding microservice now using the latest image.
      • (base) vmware@airagworkstation6:/opt/data/ai-chatbot-docker-workflow_v24.08$ docker ps -a
        CONTAINER ID   IMAGE                                                                COMMAND                  CREATED              STATUS                        PORTS                                       NAMES
        0##########0   nvcr.io/nvidia/aiworkflows/rag-playground:24.08                      "python3.10 -m front…"   About a minute ago   Up About a minute             0.0.0.0:3001->3001/tcp, :::3001->3001/tcp   rag-playground
        c##########a   nvcr.io/nvidia/aiworkflows/rag-application-multiturn-chatbot:24.08   "uvicorn RAG.src.cha…"   About a minute ago   Up About a minute             0.0.0.0:8081->8081/tcp, :::8081->8081/tcp   rag-application-multiturn-chatbot
        5##########5   nvcr.io/nim/nvidia/nv-embedqa-e5-v5:latest                           "/opt/nim/start_serv…"   About a minute ago   Up About a minute (healthy)   0.0.0.0:9080->8000/tcp, :::9080->8000/tcp   nemo-retriever-embedding-microservice
        4##########b   nvcr.io/nim/meta/llama3-8b-instruct:1.0.0                            "/opt/nvidia/nvidia_…"   About a minute ago   Up About a minute (healthy)   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp   nemollm-inference-microservice
        9##########8   pgvector/pgvector:pg16                                               "docker-entrypoint.s…"   About a minute ago   Up About a minute             0.0.0.0:5432->5432/tcp, :::5432->5432/tcp   pgvector
        e##########3   nvcr.io/nvidia/k8s/dcgm-exporter:3.2.5-3.1.8-ubuntu22.04             "/usr/local/dcgm/dcg…"   About an hour ago    Up About an hour              0.0.0.0:9400->9400/tcp, :::9400->9400/tcp   mystifying_montalcini
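The validation in step 5 can be scripted. A sketch that checks each expected container's status, run here against captured names and statuses (on the VM, `docker ps --format '{{.Names}} {{.Status}}'` produces output of the same shape):

```shell
# Sketch: verify all expected RAG containers are Up.
# Captured sample output is used here; on the VM use:
#   STATUSES=$(docker ps --format '{{.Names}} {{.Status}}')
STATUSES='rag-playground Up 2 minutes
rag-application-multiturn-chatbot Up 2 minutes
nemo-retriever-embedding-microservice Up 2 minutes (healthy)
nemollm-inference-microservice Up 2 minutes (healthy)
pgvector Up 2 minutes'

FAILED=0
for c in rag-playground rag-application-multiturn-chatbot \
         nemo-retriever-embedding-microservice \
         nemollm-inference-microservice pgvector; do
  if printf '%s\n' "$STATUSES" | grep -q "^$c Up"; then
    echo "$c: Up"
  else
    echo "$c: NOT running"
    FAILED=1
  fi
done
[ "$FAILED" -eq 0 ] && echo "all containers running"
```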

Update VM Class with UVM parameter:

  • If the UVM parameter is missing, navigate to Workload Management > Services > VM Classes, select the appropriate VM Class, and click Edit VM Class

  • Click "Advanced Parameters" and add the new attribute pciPassthru0.cfg.enable_uvm with the value "1".

  • When a new RAG VM is deployed using this VM Class, the advanced parameter will be applied to the VM.

Additional Information