Any combination of the following symptoms may indicate that this KB applies:
1. 'Open Terminal' feature does not function correctly.
When a user opens a terminal on a Cluster or CNF, the terminal opens but does not function correctly and times out after a while.
2. Pods in Pending state on TCA appliances
Related to the symptom above, any new pods created on the TCA appliance remain in the Pending state.
3. The kube-scheduler logs contain scheduling errors and a large number of "unauthorized" messages.
Run the following command to see the kube-scheduler logs:
#kubectl logs kube-scheduler-photon -n kube-system
4. "/logs" partition within the TCA appliance is full or is filling up fast
Typically the /logs/retained-logs/kubelet.service folder ends up consuming almost all the space.
#df -h
5. The kube-apiserver logs clearly state that a certificate has expired.
Run the following command to see the kube-apiserver logs:
#kubectl logs kube-apiserver-photon -n kube-system
The following message is logged repeatedly:
"Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time ...
6. Many platform-mgr-tmp-cleanup-cronjob pods are stuck or in an Error state.
#kubectl get pods -A | grep platform-mgr-tmp-cleanup-cronjob
#kubectl get jobs -A | grep platform-mgr-tmp-cleanup-cronjob
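The checks in symptoms 2, 4 and 5 can be condensed into a few helper functions. This is a minimal sketch and not part of the official procedure: the function names count_pending_pods, count_x509_errors and largest_dirs are hypothetical, and the functions only filter the output of the commands already listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative only.

# Count pods whose STATUS column is "Pending".
# Expects `kubectl get pods -A` output on stdin (STATUS is the 4th column).
count_pending_pods() {
    awk '$4 == "Pending" { n++ } END { print n + 0 }'
}

# Count log lines reporting an expired (or not yet valid) x509 certificate.
# grep -c exits non-zero when the count is 0, so the status is masked.
count_x509_errors() {
    grep -c 'x509: certificate has expired or is not yet valid' || true
}

# List the N largest entries (sizes in KiB) directly under a directory.
largest_dirs() {
    local dir="$1" n="${2:-5}"
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}

# Usage on the appliance:
#   kubectl get pods -A | count_pending_pods
#   kubectl logs kube-apiserver-photon -n kube-system | count_x509_errors
#   largest_dirs /logs/retained-logs 5
```

A non-zero count from the first two helpers, together with a large kubelet.service entry from the third, matches the symptom pattern described in this KB.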
Note: To check the kubelet certificate, run #kubeadm certs check-expiration
Note: The same symptoms can also occur if the certificate rotation itself failed.
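The expiry check can also be scripted. The following is a minimal sketch that converts the output of the openssl -enddate command used later in this KB into a number of days remaining. The function name cert_days_left is hypothetical, and the sketch assumes GNU date is available on the appliance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: days remaining before a certificate expires.
# $1: a line such as "notAfter=Jan  1 00:00:00 2030 GMT" (openssl -enddate output)
# $2: optional "now" in epoch seconds (defaults to the current time)
# Assumption: GNU date, which can parse the openssl timestamp with -d.
cert_days_left() {
    local enddate="${1#notAfter=}"
    local now="${2:-$(date +%s)}"
    local end
    end=$(date -d "$enddate" +%s) || return 1
    echo $(( (end - now) / 86400 ))
}

# Usage on the appliance (path taken from this KB):
#   cert_days_left "$(openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate)"
```

A zero or negative result means the certificate has already expired or expires within a day.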
Workaround:
If the certificates have been renewed correctly, follow the steps below:
#su
#kubeadm certs check-expiration
#openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate
#tcaNamespace=$(kubectl get namespace tca-mgr >/dev/null 2>&1 && echo "tca-mgr" || echo "tca-cp-cn")
#kubectl get job -n $tcaNamespace | grep platform-mgr-tmp-cleanup-cronjob | awk '{print $1}' | xargs -I {} kubectl -n $tcaNamespace delete job {}
#mkdir -p /home/admin/manifests-bk/
mv /etc/kubernetes/manifests/* /home/admin/manifests-bk/
# Wait up to 30 seconds for kubelet to remove the control plane pod containers.
# To confirm, check that the kubectl get pods -A command fails with "connection refused".
mv /home/admin/manifests-bk/* /etc/kubernetes/manifests/
# Wait up to 20 seconds for the control plane pod containers to come back up.
# To confirm, run the command below; it should return "ok" once they are up.
kubectl get --raw=/readyz --kubeconfig=/home/admin/.kube/config
KUBECONFIG_B64=$(base64 -w 0 /etc/kubernetes/admin.conf)
kubectl apply --kubeconfig /etc/kubernetes/admin.conf -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: kubeconfig-secret
  namespace: ${tcaNamespace}
type: Opaque
data:
  kubeconfig: ${KUBECONFIG_B64}
EOF
#tca-cp-cn 1234####-####-####-####-a4c1eb2c#### 0/1 Pending 0 22m
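The two wait steps in the procedure above (up to 30 seconds for the control plane to stop, up to 20 seconds for it to come back) can be polled instead of timed by hand. This is a minimal sketch under stated assumptions: wait_until, api_down and api_ready are hypothetical helper names, and the kubectl commands and kubeconfig path are the ones already used in this KB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) elapses. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout="$1"; shift
    local waited=0
    while ! "$@" >/dev/null 2>&1; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Succeeds once the API server no longer answers (control plane stopped).
api_down() {
    ! kubectl get pods -A >/dev/null 2>&1
}

# Succeeds once /readyz returns exactly "ok".
api_ready() {
    [ "$(kubectl get --raw=/readyz --kubeconfig=/home/admin/.kube/config 2>/dev/null)" = "ok" ]
}

# Usage:
#   wait_until 30 api_down  || echo "control plane still running after 30s"
#   wait_until 20 api_ready || echo "control plane not ready after 20s"
```

Polling avoids moving the manifests back too early and makes a timeout visible instead of silently continuing.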