Missing end of stream in streaming inference responses (VMware Tanzu AI Services)
search cancel

Missing end of stream in streaming inference responses (VMware Tanzu AI Services)

book

Article ID: 441802

calendar_today

Updated On:

Products

VMware Tanzu Platform

Issue/Introduction

When performing model inference using the Tanzu AI Services 10.4 tile with the "stream": true option enabled, the response does not return the end-of-stream signal or the finish_reason by default. This behavior prevents clients from identifying the completion of the AI-generated response, leading to incomplete or failed processing in downstream applications.

Cause

A regression introduced in Tanzu AI Services 10.4 affects how default streaming options are handled, resulting in the omission of completion markers from the inference output.

Resolution

Fixed in Tanzu AI Services release 10.4.1 and higher. See Download Broadcom products and software for steps to download this release.

Additional Information

Workaround

If the upgrade to 10.4.1 is not immediately possible, modify the inference request payload to include the following option:

json

{
  "stream": true,
  "stream_options": {
    "include_usage": false
  }
}

 

Including this parameter forces the response to include the end-of-stream data and finish_reason markers.

Environment Health Check

Ensure the Postgres database associated with the AI tile has sufficient storage space. Exhaustion of disk space on the database (e.g., PSQLException: No space left on device) can cause the ai-server and vllm-controller virtual machines to fail. Increasing the Postgres DB disk size to 20GB and disabling debug tracing helps maintain service stability.