TCA pre-upgrade task stuck in "Running" state with endpoint health "Unknown"

Article ID: 421557

Updated On:

Products

VMware Telco Cloud Automation

Issue/Introduction

  • The pre-upgrade check or task remains stuck in a "Running" state indefinitely.
  • In the TCA UI, the Endpoint Status for clusters (Management or Workload) displays as "Connected".
  • The Health Status for these endpoints displays as "Unknown".

Environment

VMware Telco Cloud Automation 3.2

Cause

  • The issue is caused by a transient state mismatch or a hang in the TCA Control Plane services that monitor endpoint health.
  • As a result, the pre-upgrade validation workflow never receives the health signals it needs to proceed.

Resolution

If the Diagnosis plugin fails to run or stalls, check whether the tca-diagnosis-caas-plugin pod has exhausted its file descriptors.

  1. SSH to the TCA Control Plane (TCA-CP) node where the Management Cluster is deployed.
  2. Identify the Diagnosis Plugin pod name:
    kubectl get pods -n tca-cp-cn | grep tca-diagnosis-caas-plugin
  3. Check the pod logs, replacing <pod_name> with the name retrieved in the previous step:
    kubectl logs -f -n tca-cp-cn <pod_name>

    Scan the output for the error string: too many open files.

  4. If the error is present, delete the pod. The ReplicaSet will automatically create a replacement pod.
    kubectl delete pod -n tca-cp-cn <pod_name>

Wait for the new pod to reach Running status, then retry the Diagnosis request in the TCA UI.
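
Steps 2 through 4 can be combined into a small script. The following is a minimal sketch, not a supported tool. The namespace and pod name prefix come from the steps above; the function names has_fd_exhaustion and restart_plugin_if_exhausted are hypothetical. It assumes a single matching pod and a working kubectl context on the TCA-CP node. It reads the current logs once (without -f) so that the command returns instead of streaming.

```shell
#!/usr/bin/env bash
# Sketch only: automates steps 2-4 above.
# NS and PLUGIN are taken from the article; the function names are hypothetical.

NS="tca-cp-cn"
PLUGIN="tca-diagnosis-caas-plugin"

# Succeeds (exit 0) if the log text on stdin contains the
# file descriptor exhaustion error.
has_fd_exhaustion() {
  grep -qi "too many open files"
}

# Finds the first pod whose name starts with $PLUGIN, checks its logs,
# and deletes the pod only if the error string is present.
restart_plugin_if_exhausted() {
  local pod
  pod="$(kubectl get pods -n "$NS" --no-headers \
    | awk -v p="$PLUGIN" 'index($1, p) == 1 {print $1; exit}')"

  if [ -z "$pod" ]; then
    echo "no $PLUGIN pod found in namespace $NS" >&2
    return 1
  fi

  if kubectl logs -n "$NS" "$pod" | has_fd_exhaustion; then
    # The ReplicaSet creates a replacement pod after deletion.
    kubectl delete pod -n "$NS" "$pod"
  else
    echo "no 'too many open files' error in logs of $pod"
  fi
}
```

After sourcing the script, run restart_plugin_if_exhausted, then watch for the replacement pod with kubectl get pods -n tca-cp-cn -w before retrying the Diagnosis request.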