Automation 9.0 installation fails at Stage 7 with "no route to host" errors in VCF 9
search cancel

Automation 9.0 installation fails at Stage 7 with "no route to host" errors in VCF 9

book

Article ID: 428361

calendar_today

Updated On:

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

When you perform a greenfield installation of Automation 9.0 within a VMware Cloud Foundation (VCF) 9 stack, the deployment fails at Stage 7 (Deploy VCF Automation).

The lifecycle manager reports "no route to host" errors.

Symptoms observed on the Automation machine during this failure include:

  • Antrea CNI agents entering a crash-loop state, typically shutting down within 2 minutes with "resource temporarily unavailable" errors.

  • Application pods such as vcfapostgres-0 and rabbitmq-ha-0 remaining in "Unknown" or "Error" states without assigned IP addresses.

  • Significant latency in kube-apiserver and etcd logs, with disk sync durations exceeding 1000ms.

Environment

VCF Automation 9.0

Cause

The issue is caused by insufficient storage performance (IOPS) and high latency on the underlying datastore. This results in etcd timeouts and control plane instability, preventing the Antrea CNI from initializing correctly. Without a functional CNI, pods cannot obtain network connectivity, leading to the "no route to host" error.

Resolution

To resolve this issue, you must improve the storage performance for the management components.

Example steps:

  1. Identify the ESXi hosts within your cluster that have the lowest resource utilization.

  2. Migrate the Fleet Ops VM and the VCFA VM to the least utilized ESXi host to stabilize etcd commit latency.

  3. Ensure the underlying storage meets the minimum requirement of 250 IOPS (500 IOPS recommended) for VCF Automation clusters to prevent control plane timeouts.

  4. Retry the deployment once storage latency is consistently below the 100ms threshold.