vllm-worker fails to start due to error "RuntimeError: Failed to infer device type"

Article ID: 416614

Products

VMware Tanzu Application Service

Issue/Introduction

During a GenAI deployment, the vllm-worker job fails in the post-start phase:

Task 12345 | 03:11:00 | L starting jobs: ###### (0) (canary)
Task 12345 | 03:11:05 | L executing post-start: ###### (0) (canary) (00:33:22)
                       L Error: Action Failed get_task: Task "#### result: 1 of 2 post-start scripts failed. Failed Jobs: vllm-worker. Successful Jobs: bosh-dns.
Task 12345 | 03:15:07 | Error: Action Failed get_task: Task "#### result: 1 of 2 post-start scripts failed. Failed Jobs: vllm-worker. Successful Jobs: bosh-dns.

According to vllm-worker.stderr.log, the process fails to start with the following error:

RuntimeError: Failed to infer device type, please set the environment variable `VLLM_LOGGING_LEVEL=DEBUG` to turn on verbose logging to help debug the issue.

At the same time, the post-start.stdout.log of vllm-worker reports "No platform detected":

INFO 10-25 03:12:35 [__init__.py:220] No platform detected, vLLM is running on UnspecifiedPlatform
WARNING 10-25 03:12:35 [_custom_ops.py:20] Failed to import from vllm._C with ImportError('libcuda.so.1: cannot open shared object file: No such file or directory')
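
When triaging a collected log bundle, the three messages quoted above can be searched for together. The following is a minimal sketch, not part of the GenAI tile or vLLM: the helper name `find_signatures` and the signature labels are hypothetical, and the patterns are copied from the log excerpts in this article.

```python
import re

# Patterns copied from the log excerpts in this article.
# The dictionary keys are hypothetical labels chosen for illustration.
SIGNATURES = {
    "device_type": re.compile(r"RuntimeError: Failed to infer device type"),
    "no_platform": re.compile(r"No platform detected"),
    "libcuda_missing": re.compile(
        r"libcuda\.so\.1: cannot open shared object file"
    ),
}


def find_signatures(log_text: str) -> set[str]:
    """Return the labels of all known signatures found in log_text."""
    return {
        label
        for label, pattern in SIGNATURES.items()
        if pattern.search(log_text)
    }
```

Passing the combined contents of vllm-worker.stderr.log and post-start.stdout.log to this helper returns the labels that matched. Seeing all three together suggests the issue described in this article.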

Environment

GenAI on Tanzu Platform

Cause

CPU-based inference is not supported in the Tanzu GenAI tile; this is documented in Comparative Analysis: vLLM versus Ollama for GenAI Inference. If a vLLM model is configured with the CPU processing technology under "Model Config", vllm-worker cannot start and raises the errors shown above. The libcuda.so.1 import failure in the log is consistent with a VM that has no NVIDIA driver library available.
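
To confirm that the CUDA driver library is absent on a worker VM, one option is to try loading it directly. This is a hedged sketch that assumes a Python interpreter is available on the VM; the helper name `can_load_shared_library` is hypothetical and is not part of vLLM or the tile.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library cannot be found or opened.
        return False
    return True


if __name__ == "__main__":
    # libcuda.so.1 is the library named in the ImportError above.
    print("libcuda.so.1 loadable:", can_load_shared_library("libcuda.so.1"))
```

A result of `False` matches the ImportError reported in post-start.stdout.log.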

Resolution

To resolve the problem, configure vLLM models with a non-CPU processing technology.
