VCF Automation upgrade fails with vco-app pods stuck in CrashLoopBackoff

Article ID: 425145

Updated On:

Products

VCF Automation

Issue/Introduction

When upgrading VMware Cloud Foundation Automation (VCFA) from version 9.0.0 to 9.0.1, the upgrade may fail with a timeout after two hours. The failure occurs because the vco-app pods in the Kubernetes cluster are stuck in a CrashLoopBackOff status: the install-rpms init container never completes. To verify this, check the logs of the failing container, which show a "no space left on device" error during RPM installation.
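
One way to confirm the symptom from a machine with kubectl access is sketched below. The prelude namespace and the install-rpms container name come from this article. The helper name has_no_space_error is hypothetical, and the log text is assumed to contain the standard "No space left on device" wording.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the text on stdin contains the
# standard "No space left on device" message, matched case-insensitively.
has_no_space_error() {
    grep -qi 'no space left on device'
}

# Example usage (needs kubectl access to the VCFA cluster):
#   kubectl get pods -n prelude | grep vco-app
#   kubectl logs vco-app-0 -n prelude -c install-rpms | has_no_space_error \
#       && echo "vco-app-0: volume is full"
```

The check reads from stdin, so it also works on a saved log file.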

Environment

  • Product: VMware Cloud Foundation Automation (VCFA)
  • Versions: 9.0.0 upgrading to 9.0.1
  • Component: VMware Aria Operations Orchestrator (vco-app)

Cause

The persistent volume backing the vco-app pods, bound through a PersistentVolumeClaim (PVC), runs out of available space. As a result, the install-rpms init container cannot install the RPM updates required for version 9.0.1.
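
To confirm that the volume is full, the usage of its mount point can be read on the node with df. This is a minimal sketch: the helper name usage_percent is hypothetical, and it assumes the POSIX output format of df -P, where the fifth column is the capacity in percent.

```shell
#!/bin/sh
# Hypothetical helper: reads `df -P <path>` output on stdin and prints the
# Capacity column of the first filesystem line as a bare number.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example usage on the node (placeholders as in the Resolution steps):
#   df -P /var/lib/kubelet/pods/<pod_uid>/volumes/kubernetes.io~csi/<pvc_id>/mount | usage_percent
```

A value at or near 100 is consistent with the cause described here.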

Resolution

To resolve this issue, you must clean the persistent volumes for each of the three vco-app pods.

  1. Identify the Pod UID: Run the following command to find the unique identifier (UID) of the pod:
     kubectl get pods vco-app-0 -n prelude -oyaml | yq '.metadata.uid'
  2. Locate the Node: Identify which node the pod is running on, then find that node's External IP:
     kubectl get pods -n prelude -owide | grep vco-app-0
     kubectl get nodes -owide
  3. Access the Node: SSH into the identified node and switch to the root user:
     ssh vmware-system-user@<node_ip_address>
     sudo su -
  4. Find the Mount Point: Locate the volume path using the UID found in Step 1:
     mount | grep <pod_uid> | grep pvc
  5. Clear the Directory: Navigate to the mount path and delete the contents of the usr/lib/vco directory:
     cd /var/lib/kubelet/pods/<pod_uid>/volumes/kubernetes.io~csi/<pvc_id>/mount
     cd usr/lib/vco
     rm -Rf *
  6. Repeat Steps 1 through 5 for vco-app-1 and vco-app-2.
  7. Retry the upgrade from the Fleet LCM interface once all vco-app pods are in a Running state.
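
The mount lookup and the cleanup can also be wrapped in two small shell helpers, which reduces the risk of running rm in the wrong directory. This is a sketch, not part of the official procedure. The function names pvc_mount_for_uid and clean_vco_dir are hypothetical, and the parsing assumes the usual Linux mount output format of "<device> on <mount_point> type <fs> (<options>)".

```shell
#!/bin/sh
# Hypothetical helper: reads `mount` output on stdin and prints the mount
# point of the PVC-backed volume that belongs to the given pod UID.
# Assumes the third field of each line is the mount point.
pvc_mount_for_uid() {
    uid="$1"
    awk -v uid="$uid" 'index($3, uid) && index($3, "pvc") { print $3 }'
}

# Hypothetical helper: deletes the contents of usr/lib/vco under the given
# mount path. Refuses an empty argument so that an unset variable cannot
# turn the target into /usr/lib/vco on the node itself.
clean_vco_dir() {
    mount_path="$1"
    [ -n "$mount_path" ] || { echo "mount path is required" >&2; return 1; }
    target="$mount_path/usr/lib/vco"
    [ -d "$target" ] || { echo "not a directory: $target" >&2; return 1; }
    # Same effect as "cd usr/lib/vco" followed by "rm -Rf *", run in a
    # subshell so the caller's working directory is left unchanged.
    ( cd "$target" && rm -Rf -- ./* )
}

# Example usage on the node, as root:
#   path=$(mount | pvc_mount_for_uid "<pod_uid>")
#   echo "$path"          # review the path before deleting anything
#   clean_vco_dir "$path"
```

Printing the path before calling clean_vco_dir keeps a manual review step in place, which is worthwhile because the deletion cannot be undone.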