Unable to open the cluster terminal of TCA CP from TCA GUI


Article ID: 403424



Products

VMware Telco Cloud Automation

Issue/Introduction

When using "Open Terminal" on some of the clusters listed under Virtual Infrastructure in the TCA GUI, the cluster terminal fails to open. The issue is seen for both vSphere and Kubernetes clusters.

When the cluster terminal of a TCA-CP is launched, pods are created in the backend on the respective TCA-CP, but they remain stuck in the Pending state:

kubectl get pods -n tca-cp-cn | grep -i pending

tca-cp-cn 1234####-####-####-####-c7e5ba8f#### 0/1 Pending 0 17m
tca-cp-cn 5678####-####-####-####-6dc4015d#### 0/1 Pending 0 9h
tca-cp-cn 1234####-####-####-####-662bc696#### 0/1 Pending 0 6h
tca-cp-cn 12ab####-####-####-####-dc53d2a5#### 0/1 Pending 0 8h
tca-cp-cn 34cd####-####-####-####-d7b2a0f4#### 0/1 Pending 0 11m
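The `grep -i pending` filter above matches the word anywhere on a line. As a minimal sketch, the same check can be restricted to the STATUS column; the function name `pending_pods` is hypothetical and not part of TCA. It reads `kubectl get pods` output (with its header line) on stdin and works with or without a leading NAMESPACE column.

```shell
# pending_pods: print the names of pods whose STATUS column is exactly "Pending".
# Input: the table printed by "kubectl get pods", including the header row.
pending_pods() {
  awk '
    # Locate the STATUS column from the header row, then skip the header.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "STATUS") col = i; next }
    # NAME sits two columns before STATUS (NAME READY STATUS ...).
    col && $col == "Pending" { print $(col - 2) }
  '
}

# Example use on the appliance (not executed here):
# kubectl get pods -n tca-cp-cn | pending_pods
```

kubectl can also filter on the server side with `--field-selector=status.phase=Pending`, which avoids parsing the table at all.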

Environment

TCA 3.x

Cause

The pods remain in the Pending state because the Kubernetes scheduler cannot communicate with the other control plane components. The components are still using expired certificates, even though certificate rotation took place long ago.

The war-machine-agent service within the TCA appliance is responsible for automatically renewing the Kubernetes control plane component certificates when they are within 60 days of expiry.

Due to a race condition between two threads during the certificate rotation process, the control plane components are not restarted, so they continue to use the old, expired certificates.
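To illustrate the 60-day renewal window described above, the following sketch checks whether a certificate file expires within a given number of days using `openssl x509 -checkend`. The function name `cert_expires_within_days` is hypothetical, and the certificate path in the usage comment is an assumption based on the usual kubeadm layout, not something this article states. It inspects the file on disk only; a running component may still hold an older certificate in memory, which is the situation described in this section.

```shell
# cert_expires_within_days FILE DAYS
# Returns 0 if the certificate in FILE expires within DAYS days,
# 1 if it remains valid beyond that window, 2 if FILE is not readable.
cert_expires_within_days() {
  [ -r "$1" ] || return 2
  # "openssl x509 -checkend N" exits non-zero when the certificate
  # expires within the next N seconds, so the result is inverted here.
  ! openssl x509 -checkend "$(( $2 * 86400 ))" -noout -in "$1" >/dev/null 2>&1
}

# Example use (path is an assumption, adjust for the appliance):
# cert_expires_within_days /etc/kubernetes/pki/apiserver.crt 60 && echo "inside renewal window"
```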

Resolution

Renew the certificates using the steps below:

Switch to the root user and export the kubeconfig path:
export KUBECONFIG=/home/admin/.kube/config

1. Clean up all pending pods using the following commands.

tcaNamespace=$(kubectl get namespace tca-mgr >/dev/null 2>&1 && echo "tca-mgr" || echo "tca-cp-cn")
kubectl get job -n "$tcaNamespace" | grep platform-mgr-tmp-cleanup-cronjob | awk '{print $1}' | xargs -I {} kubectl -n "$tcaNamespace" delete job {}

2. Restart the control plane components so that the rotated certificates take effect.

mkdir -p /home/admin/manifests-bk/
mv /etc/kubernetes/manifests/* /home/admin/manifests-bk/
# Wait up to 30 seconds for kubelet to remove the control plane pod containers.
# Confirm by checking that "kubectl get pods -A" fails with "connection refused".
mv /home/admin/manifests-bk/* /etc/kubernetes/manifests/
# Wait up to 20 seconds for the control plane pod containers to come up.
# The command below should return "ok" once they are ready.
kubectl get --raw=/readyz --kubeconfig=/home/admin/.kube/config
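The manual wait in step 2 can be scripted as a polling loop. This is a minimal sketch, not part of the documented procedure: the helper name `wait_until_ok` is hypothetical, and it polls a given command once per second until the command prints exactly "ok" or the timeout elapses.

```shell
# wait_until_ok TIMEOUT_SECONDS COMMAND [ARGS...]
# Runs COMMAND once per second until it prints exactly "ok".
# Returns 0 on success, 1 if TIMEOUT_SECONDS pass without an "ok".
wait_until_ok() {
  timeout_s=$1
  shift
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # Errors such as "connection refused" are expected while the API server restarts.
    if [ "$("$@" 2>/dev/null)" = "ok" ]; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Example use on the appliance (not executed here):
# wait_until_ok 20 kubectl get --raw=/readyz --kubeconfig=/home/admin/.kube/config
```

If the helper returns non-zero, the control plane did not report ready within the window and should be investigated before continuing to step 3.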

3. Update the kubeconfig secret with the rotated kubeconfig file, which contains the new certificates.

KUBECONFIG_B64=$(base64 -w 0 /etc/kubernetes/admin.conf)
kubectl apply --kubeconfig /etc/kubernetes/admin.conf -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: kubeconfig-secret
  namespace: ${tcaNamespace}
type: Opaque
data:
  kubeconfig: ${KUBECONFIG_B64}
EOF
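After step 3, it may be useful to confirm that the secret now holds the same content as the rotated kubeconfig. The sketch below is an optional check, not part of the documented procedure; the helper name `b64_matches_file` is hypothetical. It decodes base64 text from stdin and compares it byte for byte with a file.

```shell
# b64_matches_file FILE
# Reads base64 text on stdin and succeeds only if it decodes to the exact contents of FILE.
b64_matches_file() {
  base64 -d | cmp -s - "$1"
}

# Example use on the appliance (not executed here):
# kubectl get secret kubeconfig-secret -n "$tcaNamespace" \
#   --kubeconfig /etc/kubernetes/admin.conf \
#   -o jsonpath='{.data.kubeconfig}' \
#   | b64_matches_file /etc/kubernetes/admin.conf && echo "secret matches admin.conf"
```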
