kubectl get nodes
kubectl get pods -A
kubectl get pods -A | grep -v Run
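The `grep -v Run` filter drops any line containing "Run" anywhere and keeps the header row. A minimal sketch of a stricter filter follows, assuming the default `kubectl get pods -A` column layout (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). The function name `non_running_pods` is hypothetical and not part of kubectl.

```shell
# Hypothetical helper: print "namespace/pod status" for every pod whose
# STATUS column is not exactly "Running".
# Assumes the default `kubectl get pods -A` layout, where STATUS is column 4.
non_running_pods() {
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 " " $4 }'
}

# Usage (requires cluster access):
#   kubectl get pods -A | non_running_pods
```

The namespace and pod name printed by the helper are the two values needed by the describe command that follows.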
kubectl describe pod -n <ImagePullBackOff pod namespace> <ImagePullBackOff pod name>
waiting:
  message: 'rpc error: code = NotFound desc = failed to pull and unpack image "localhost:5000/vmware.io/antrea/antrea:v#.#.#_vmware.#": failed to resolve reference "localhost:5000/vmware.io/antrea/antrea:v#.#.#_vmware.#": localhost:5000/vmware.io/antrea/antrea:v#.#.#_vmware.#: not found'
  reason: ErrImagePull
---- applying 2 changes [0/6 done] ----
noop apiservice/v1alpha1.data.packaging.carvel.dev (apiregistration.k8s.io/v1) cluster
create namespace/tkg-system (v1) cluster
^ Retryable error: Creating resource namespace/tkg-system (v1) cluster: API server says: Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": failed to call webhook: Post "https://gatekeeper-webhook-service.gatekeeper.svc:443/v1/admitlabel?timeout=3s": dial tcp ###.###.###.###:443: i/o timeout (reason: InternalError)
The webhook named in the error above can be edited to fail open as follows:
kubectl edit validatingwebhookconfiguration gatekeeper-validating-webhook-configuration
Find "failurePolicy: Fail" and change it to "failurePolicy: Ignore".
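As a non-interactive alternative to `kubectl edit`, the same change can be sketched as a JSON Patch. The helper name `failure_policy_patch` is hypothetical, and the index `1` in the usage comment is only an example: the position of `check-ignore-label.gatekeeper.sh` in the `webhooks` list is an assumption that should be confirmed by listing the webhook names first.

```shell
# Hypothetical helper: build a JSON Patch that sets failurePolicy on one webhook entry.
# $1 = zero-based index of the webhook in .webhooks[], $2 = policy (Ignore or Fail).
failure_policy_patch() {
  printf '[{"op":"replace","path":"/webhooks/%s/failurePolicy","value":"%s"}]' "$1" "$2"
}

# Usage (requires cluster access). List the webhook names to find the index of
# check-ignore-label.gatekeeper.sh, then patch that entry:
#   kubectl get validatingwebhookconfiguration gatekeeper-validating-webhook-configuration \
#     -o jsonpath='{range .webhooks[*]}{.name}{"\n"}{end}'
#   kubectl patch validatingwebhookconfiguration gatekeeper-validating-webhook-configuration \
#     --type=json -p "$(failure_policy_patch 1 Ignore)"
```

Calling the helper with `Fail` instead of `Ignore` produces the patch that reverts the change once the upgrade has completed.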
If changing the failure policy does not resolve the error, temporarily remove the webhook from the workload cluster so that the upgrade can complete. Follow the steps below:
kubectl get pods -A |grep gatekeeper
Back up the validatingwebhookconfiguration and mutatingwebhookconfiguration objects to a file:
kubectl get validatingwebhookconfiguration -n <gatekeeper namespace> gatekeeper-validating-webhook-configuration -o yaml > gatekeeper-validating-webhook-backup.yaml
kubectl get mutatingwebhookconfiguration -n <gatekeeper namespace> gatekeeper-mutating-webhook-configuration -o yaml > gatekeeper-mutating-webhook-backup.yaml
During the rolling upgrade, nodes are replaced and recreated. If the backup YAML files are saved on a cluster node, copy them off that node; otherwise they are lost when the node is replaced.
kubectl get deploy -n <gatekeeper namespace>
Note down the current deployment replica values in order to scale them back up properly after the upgrade completes.
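Rather than noting the replica values by hand, they can be saved to a file before scaling down and turned back into scale commands afterwards. This is a minimal sketch: the helper name `scale_commands` and the file name `gatekeeper-replicas.txt` are hypothetical, and the helper only prints commands without running them.

```shell
# Hypothetical helper: read "NAME REPLICAS" lines on stdin and print the
# matching scale command for each deployment in namespace $1.
# Nothing is executed; review the output before running it.
scale_commands() {
  awk -v ns="$1" 'NF >= 2 { printf "kubectl scale deploy -n %s %s --replicas=%s\n", ns, $1, $2 }'
}

# Usage (requires cluster access), run BEFORE scaling the deployments to 0:
#   kubectl get deploy -n <gatekeeper namespace> --no-headers \
#     -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.replicas > gatekeeper-replicas.txt
# After the upgrade, print the restore commands:
#   scale_commands <gatekeeper namespace> < gatekeeper-replicas.txt
```

Like the webhook backups, the saved file should be kept somewhere that survives node replacement.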
kubectl scale deploy -n <gatekeeper namespace> gatekeeper-controller-manager --replicas=0
kubectl scale deploy -n <gatekeeper namespace> gatekeeper-operator-manager --replicas=0
Delete the validatingwebhookconfiguration and mutatingwebhookconfiguration:
kubectl delete validatingwebhookconfiguration -n <gatekeeper namespace> gatekeeper-validating-webhook-configuration
kubectl delete mutatingwebhookconfiguration -n <gatekeeper namespace> gatekeeper-mutating-webhook-configuration
watch kubectl get nodes
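For a scripted check instead of watching the output, one possible sketch tests whether every node reports a status of exactly `Ready`. It assumes the `kubectl get nodes --no-headers` layout, where STATUS is the second column, and the helper name `all_nodes_ready` is hypothetical. A node shown as `Ready,SchedulingDisabled` is treated as not ready.

```shell
# Hypothetical helper: exit 0 only if stdin contains at least one node line
# and every line has "Ready" in the STATUS column (column 2).
all_nodes_ready() {
  awk '$2 != "Ready" { bad = 1 } END { exit (bad || NR == 0) }'
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | all_nodes_ready && echo "all nodes Ready"
```

This only checks node status; whether every node has been replaced with the new version still has to be confirmed from the VERSION column.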
kubectl apply -f gatekeeper-validating-webhook-backup.yaml
kubectl apply -f gatekeeper-mutating-webhook-backup.yaml
kubectl get validatingwebhookconfiguration -n <gatekeeper namespace>
kubectl get mutatingwebhookconfiguration -n <gatekeeper namespace>
kubectl get deploy -n <gatekeeper namespace>
kubectl scale deploy -n <gatekeeper namespace> gatekeeper-controller-manager --replicas=X
kubectl scale deploy -n <gatekeeper namespace> gatekeeper-operator-manager --replicas=X
kubectl get pods -n <gatekeeper namespace>