Deployment Fails with "Timed out sending 'get_state'" Error When Configuring Ollama in Tanzu Platform Evaluation Appliance

Article ID: 441583

Products

VMware Tanzu Platform Core

Issue/Introduction

Deployment fails when configuring the Tanzu Platform Evaluation Appliance to use a self-hosted Ollama model in the AI Services tile. The deployment process hangs and eventually returns an error indicating the BOSH Director cannot communicate with the controller instance.

The following error appears in the deployment logs:

Error: controller/####: Timed out sending 'get_state' to instance: 'controller/####', agent-id: '####' after 45 seconds
 

The appliance is also unstable overall: internal services stop working or are terminated at random because of resource constraints.

Environment

  • Tanzu Platform Evaluation Appliance 10.4

  • AI Services Tile

  • Self-hosted Ollama Model integration

  • vSphere / vCenter infrastructure

Cause

The issue stems from severe resource exhaustion. The controller virtual machine hosts both the Ollama engine and the loaded Large Language Model (LLM). Running large models on CPU alone, without dedicated GPUs, requires significant RAM and CPU capacity.

When resources are insufficient, the Linux Out-Of-Memory (OOM) killer terminates processes, or the system freezes entirely. The BOSH Agent running on the controller instance then becomes unresponsive, which triggers the 45-second get_state timeout.
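
One way to check whether the OOM killer is involved is to scan the controller VM's kernel log for kill records. The following is a minimal sketch, assuming the log text has already been captured (for example from `dmesg` output on the controller VM) and that the kernel uses its usual "Killed process <pid> (<name>)" wording. The function name `find_oom_kills` and the sample log lines are illustrative and do not come from the appliance.

```python
import re

# Matches the kernel's OOM-kill record, e.g.
# "Out of memory: Killed process 4312 (ollama) total-vm:..."
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return (pid, process_name) for every OOM kill found in the log text."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Hypothetical sample; real input would come from the controller VM's kernel log.
sample_log = """\
[1234.5] Out of memory: Killed process 4312 (ollama) total-vm:9000000kB
[1240.1] eth0: link up
[1301.9] Out of memory: Killed process 877 (bosh-agent) total-vm:120000kB
"""

print(find_oom_kills(sample_log))
```

An empty result does not rule out resource exhaustion, since a fully frozen VM may never write a kill record.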

Furthermore, the Tanzu Platform Evaluation Appliance is a highly compressed, all-in-one environment designed for lightweight feature testing. Failing to meet the strict baseline hardware requirements for the base OVA causes cascading service failures across the entire appliance.

Resolution

Address the resource bottleneck by scaling the underlying infrastructure, adjusting internal allocations, or reducing the model size.

Phase 1: Increase Base Appliance Resources (vSphere)

Ensure the base virtual machine meets or exceeds the strict minimum requirements for AI Services: 24 vCPUs, 72 GB RAM, and 500 GB disk.
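
The comparison against these minimums can be sketched as a small check. This is an illustration only: the `VmSize` type, the `shortfalls` helper, and the "current" sizing values are hypothetical, and the actual figures for a given appliance would be read from the VM's settings in vCenter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSize:
    vcpus: int
    ram_gb: int
    disk_gb: int


# Minimums for AI Services as stated in this article.
AI_SERVICES_MINIMUM = VmSize(vcpus=24, ram_gb=72, disk_gb=500)


def shortfalls(actual, minimum=AI_SERVICES_MINIMUM):
    """Return {resource: amount still missing} for each resource below minimum."""
    missing = {}
    for field in ("vcpus", "ram_gb", "disk_gb"):
        gap = getattr(minimum, field) - getattr(actual, field)
        if gap > 0:
            missing[field] = gap
    return missing


# Hypothetical current sizing of the appliance VM.
current = VmSize(vcpus=16, ram_gb=64, disk_gb=500)
print(shortfalls(current))
```

An empty dictionary means the VM meets every stated minimum; any remaining entry names a resource to increase before redeploying.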

 

Phase 2: Adjust Internal Tile Resources (Ops Manager)

If base resources are sufficient but the service still fails, allocate more resources to the controller job in the Resource Config section by selecting a larger instance profile from the VM Type drop-down menu (for example, change from cpu-medium to cpu-large or cpu-xlarge).

 

Phase 3: Recover the Unresponsive Instance (BOSH CLI)

If the deployment is currently stuck in a failed state, use BOSH Cloud Check to repair the unresponsive VM.
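
As a rough sketch, Cloud Check is invoked through the BOSH CLI `cloud-check` command (alias `cck`) against a named deployment. The wrapper below only builds and optionally runs that command. The deployment name `my-deployment` is a placeholder, since this article does not give the AI Services deployment name, and the helper names are hypothetical. It also assumes the `bosh` CLI is installed and already targeted at the appliance's BOSH Director.

```python
import subprocess


def cloud_check_command(deployment):
    """Build the BOSH CLI invocation that scans a deployment for problems."""
    if not deployment:
        raise ValueError("deployment name is required")
    return ["bosh", "-d", deployment, "cloud-check"]


def run_cloud_check(deployment):
    """Run cloud-check interactively; BOSH prompts for a resolution per problem."""
    return subprocess.run(cloud_check_command(deployment), check=False).returncode


if __name__ == "__main__":
    # "my-deployment" is a placeholder; list real names with `bosh deployments`.
    print(" ".join(cloud_check_command("my-deployment")))
```

Cloud Check reports each problem it finds, such as an unresponsive agent, and asks which resolution to apply. Recreating the VM only helps in the long term if the resource shortfall has also been addressed.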