After replacing the certificate on the SSP-Installer GUI -> "Certificates" page and then scaling out or restarting nodes, pods that try to start on the new nodes fail with ImagePullBackOff errors.
vDefend SSP Installer 5.0
The custom resource "kubeadmconfigtemplates", which the vSphere provider uses to create nodes, includes the certificate as part of its configuration. However, replacing the certificate does not update this template. As a result, new worker nodes created from this template still use the old certificate, and pods on these new nodes cannot connect to the local registry.
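For reference, the certificate is embedded in the template roughly as follows (an abbreviated, illustrative excerpt; the actual resource contains additional fields and the API version may differ):
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
spec:
  template:
    spec:
      preKubeadmCommands:
      - echo "<base64-encoded-certificate>" | base64 --decode > /usr/local/share/ca-certificates/harbor.crt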
We can find the current SSPI certificate by:
1. SSH to SSPI
2. Run "cat /usr/local/share/ca-certificates/domain.crt"
We can find the certificate used for worker nodes by:
1. SSH to SSPI
2. Get the kubeadmconfigtemplates used by the current SSP instance: "kubectl -n <ssp-instance-name> get kubeadmconfigtemplates". There will be only one template here.
3. Check the details of this template: "kubectl -n <ssp-instance-name> get kubeadmconfigtemplates <template-name> -o yaml"
4. The certificate is in the "preKubeadmCommands" section: - echo "<base64-encoded-certificate>" |base64 --decode >/usr/local/share/ca-certificates/harbor.crt
5. Decode the "<base64-encoded-certificate>"
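For example, you can decode and inspect the embedded certificate in one step (the openssl part assumes openssl is available on the appliance):
echo "<base64-encoded-certificate>" | base64 --decode | openssl x509 -noout -subject -enddate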
We can find the certificate used for control plane nodes by:
1. SSH to SSPI
2. Get the "kubeadmcontrolplane" used by the current SSP instance: "kubectl -n <ssp-instance-name> get kubeadmcontrolplane". There will be only one kubeadmcontrolplane here.
3. Check the details of this control plane: "kubectl -n <ssp-instance-name> get kubeadmcontrolplane <control-plane-name> -o yaml"
4. The certificate is in the "preKubeadmCommands" section: - echo "<base64-encoded-certificate>" |base64 --decode >/usr/local/share/ca-certificates/harbor.crt
5. Decode the "<base64-encoded-certificate>"
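As with the worker template, decode the embedded certificate; saving it to a file (the /tmp path below is only an example) makes the comparison in the next step easier:
echo "<base64-encoded-certificate>" | base64 --decode > /tmp/embedded-ca.crt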
The certificates from domain.crt and kubeadmconfigtemplates/kubeadmcontrolplane will be different in this case, because only domain.crt is updated.
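For example, assuming the embedded certificate was saved to /tmp/embedded-ca.crt as shown above, a simple diff confirms the mismatch; any output means the two certificates differ:
diff /usr/local/share/ca-certificates/domain.crt /tmp/embedded-ca.crt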
To resolve the issue, we need to manually update the custom resources "kubeadmconfigtemplates" and "kubeadmcontrolplane".
For kubeadmconfigtemplates
1. SSH to SSPI using root credentials.
2. Get the kubeadmconfigtemplates used by the current SSP instance. There will be only one template here.
kubectl -n <ssp-instance-name> get kubeadmconfigtemplates
3. Check the details of this template
kubectl -n <ssp-instance-name> get kubeadmconfigtemplates <template-name> -o yaml
4. The certificate is in the "preKubeadmCommands" section: - echo "<base64-encoded-certificate>" |base64 --decode >/usr/local/share/ca-certificates/harbor.crt.
5. Encode your current certificate with base64. You can find it at "/usr/local/share/ca-certificates/domain.crt"
base64 -w0 /usr/local/share/ca-certificates/domain.crt > new_certificate_base64.txt
6. Replace the certificate found in step 4 with the new base64-encoded certificate
kubectl -n <ssp-instance-name> edit kubeadmconfigtemplates <template-name>
7. Delete the worker node where the pods cannot pull the images
kubectl delete node <node-name>
8. New worker nodes will be created automatically after the deletion.
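Optionally, you can confirm the template now carries the new certificate and watch the replacement node join:
kubectl -n <ssp-instance-name> get kubeadmconfigtemplates <template-name> -o yaml | grep harbor.crt
kubectl get nodes -w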
For kubeadmcontrolplane
1. SSH to SSPI
2. Get the kubeadmcontrolplane used by the current SSP instance. There will be only one control plane here.
kubectl -n <ssp-instance-name> get kubeadmcontrolplane
3. Check the details of this control plane.
kubectl -n <ssp-instance-name> get kubeadmcontrolplane <control-plane-name> -o yaml
4. The certificate is in the "preKubeadmCommands" section: - echo "<base64-encoded-certificate>" |base64 --decode >/usr/local/share/ca-certificates/harbor.crt.
5. Encode your current certificate with base64. You can find it at "/usr/local/share/ca-certificates/domain.crt"
base64 -w0 /usr/local/share/ca-certificates/domain.crt > new_certificate_base64.txt
6. Replace the certificate found in step 4 with the new base64-encoded certificate.
kubectl -n <ssp-instance-name> edit kubeadmcontrolplane <control-plane-name>
7. After the replacement, all control plane nodes will be restarted automatically.
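You can monitor the rolling restart of the control plane, for example with:
kubectl -n <ssp-instance-name> get kubeadmcontrolplane -w
kubectl get nodes -w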
If the installer also reports a failure such as:
Failed 3/17 tasks: [Initialize clusterctl] Command '/usr/local/bin/clusterctl init --core cluster-api --bootstrap kubeadm --control-plane kubeadm --ipam incluster --infrastructure vsphere --config /config/clusterctl/1/clusterctl-init.yaml --wait-provider-timeout 900 --wait-providers --v 5' execution is failed. Output: Using configuration
restarting CRI-O may help:
systemctl restart crio
Note: This step is only required on 5.0 SSPI.
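You can verify the service is running again with:
systemctl status crio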