Node Pool stuck in pending state following CNF reconfiguration due to Calico IPAM failure

Article ID: 429587

Products

VMware Telco Cloud Automation

Issue/Introduction

After CNF reconfiguration, the Node Pool remains stuck in a pending state.
The caas_spoke pod on the TCA-CP appliance fails to start.

Environment

3.2

Cause

A Calico IP Address Management (IPAM) failure or state lock on the TCA-CP appliance prevents the allocation of pod IP addresses. This stalls initialization of the caas_spoke pod and halts Node Pool state reconciliation. Restarting the Calico controllers, deleting the affected pod, or restarting the containerd and kubelet services individually does not clear the IPAM lock.

Resolution

Perform a full reboot of the affected TCA-CP appliance to reinitialize the network stack and the Kubernetes core services and to clear the Calico IPAM fault.
Monitor the appliance startup and verify that core services and the caas_spoke pod return to a normal, running state.
Navigate to the TCA user interface and locate the affected Node Pool.
Perform a dummy edit on the Node Pool (e.g., modify a description or tag, then revert the change without altering operational parameters) and save the configuration. This forces a state synchronization within the TCA orchestrator.
Monitor the Node Pool as it recovers from the processing state and successfully transitions to the provisioned state.
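
The post-reboot check in the second step can be sketched as a small Python helper. This is a minimal illustration, not a VMware-provided tool: it assumes the pod listing was captured on the appliance with `kubectl get pods -A --no-headers`, and the function names, the `PodStatus` type, and the name fragment passed in are all hypothetical. The exact pod name and namespace on a given TCA-CP appliance may differ, so the fragment is a parameter.

```python
from typing import List, NamedTuple


class PodStatus(NamedTuple):
    """First four columns of `kubectl get pods -A --no-headers` output."""
    namespace: str
    name: str
    ready: str   # for example "1/1"
    status: str  # for example "Running"


def parse_pods(kubectl_output: str) -> List[PodStatus]:
    """Parse captured kubectl text into PodStatus records."""
    pods = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        pods.append(PodStatus(*fields[:4]))
    return pods


def is_ready(pod: PodStatus) -> bool:
    """Healthy means status Running and every container reported ready."""
    ready, _, total = pod.ready.partition("/")
    return pod.status == "Running" and total != "" and ready == total


def unhealthy_matching(pods: List[PodStatus], name_fragment: str) -> List[PodStatus]:
    """Return pods whose name contains name_fragment and that are not healthy.

    name_fragment is an assumption supplied by the operator; the real pod
    name on the appliance is not stated in this article.
    """
    return [p for p in pods if name_fragment in p.name and not is_ready(p)]
```

An empty result from `unhealthy_matching(parse_pods(output), fragment)` suggests the matching pods have returned to a running state; a non-empty result lists the pods that still need attention before the Node Pool edit is attempted.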

Additional Information

To determine the origin of the initial Calico IPAM lock or of any preceding Kafka or DNS connection anomalies, collect the TCA-CP support bundles, along with the containerd and kubelet journal logs, before rebooting the appliance.
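
One possible way to save the journal logs before the reboot is sketched below. It assumes that containerd and kubelet run as systemd units with those names, that `journalctl` is available on the appliance, and that the caller has permission to read the journal; the helper names and the output file names are hypothetical. It does not replace the TCA-CP support bundle, which is collected separately.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed systemd unit names for the two services named in this article.
UNITS = ("containerd", "kubelet")


def journal_command(unit: str, since: str = "") -> List[str]:
    """Build a journalctl command line for one unit.

    since is optional and is passed through to journalctl's --since option,
    for example "2 hours ago".
    """
    cmd = ["journalctl", "-u", unit, "--no-pager"]
    if since:
        cmd += ["--since", since]
    return cmd


def collect_journals(out_dir: str, since: str = "") -> List[Path]:
    """Write each unit's journal to <out_dir>/<unit>-journal.log."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for unit in UNITS:
        path = target / f"{unit}-journal.log"
        with path.open("wb") as fh:
            # check=True raises if journalctl exits with an error.
            subprocess.run(journal_command(unit, since), stdout=fh, check=True)
        saved.append(path)
    return saved
```

Calling `collect_journals("/tmp/tca-cp-logs", since="2 hours ago")` would write two files into that directory; the path and time window here are only examples. Copy the files off the appliance before rebooting so they can be attached to a support request.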
