When performing model inference using the Tanzu AI Services 10.4 tile with the "stream": true option enabled, the response does not return the end-of-stream signal or the finish_reason by default. This behavior prevents clients from identifying the completion of the AI-generated response, leading to incomplete or failed processing in downstream applications.
A regression introduced in Tanzu AI Services 10.4 affects how default streaming options are handled, resulting in the omission of completion markers from the inference output.
Fixed in Tanzu AI Services release 10.4.1 and higher. See Download Broadcom products and software for steps to download this release.
Workaround
If the upgrade to 10.4.1 is not immediately possible, modify the inference request payload to include the following option:
json
{
"stream": true,
"stream_options": {
"include_usage": false
}
}
Including this parameter forces the response to include the end-of-stream data and finish_reason markers.
Environment Health Check
Ensure the Postgres database associated with the AI tile has sufficient storage space. Exhaustion of disk space on the database (e.g., PSQLException: No space left on device) can cause the ai-server and vllm-controller virtual machines to fail. Increasing the Postgres DB disk size to 20GB and disabling debug tracing helps maintain service stability.