If the SSPI Ingress certificate is updated several times in quick succession, pods running on control nodes go into the ImagePullBackOff state and cannot come up.
SSP v5.1.0 and v5.1.1
If the SSPI Ingress certificate is updated in quick succession, the docker CA property in the Kubernetes control plane custom resource can remain stuck at its earlier value because of a race condition with the CAPI controller. As a result, SSP control nodes can no longer pull images from SSPI, because SSPI presents a certificate that the control nodes do not recognize. The symptom can be observed in the pod status:
kube-system antrea-agent-pn5v6 0/2 Init:ImagePullBackOff 0 5m3s
kube-system antrea-agent-vw8ql 0/2 Init:ImagePullBackOff 0 2m41s
kube-system kube-vip-9o5e5dsm-controller-7bb5k 0/1 ImagePullBackOff 0 2m39s
kube-system kube-vip-9o5e5dsm-controller-f6pgf 0/1 ImagePullBackOff 0 5m3s
kube-system vsphere-cpi-8868l 0/1 ImagePullBackOff 0 5m3s
kube-system vsphere-cpi-hj246 0/1 ImagePullBackOff 0 2m41s
vmware-system-csi vsphere-csi-node-g9dtj 0/3 ImagePullBackOff 0 2m41s
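To spot the affected pods in a long listing, the status column can be filtered for image pull failures. The following is a minimal sketch, assuming the listing has the whitespace-separated column layout shown above (namespace, pod name, ready count, status, restarts, age). The function name `pods_failing_image_pull` and the `coredns-example` row are hypothetical and are included only to show that healthy pods are skipped.

```python
# Sample rows copied from the listing above, plus one hypothetical healthy
# pod (coredns-example) to show that non-failing rows are ignored.
SAMPLE = """\
kube-system antrea-agent-pn5v6 0/2 Init:ImagePullBackOff 0 5m3s
kube-system kube-vip-9o5e5dsm-controller-7bb5k 0/1 ImagePullBackOff 0 2m39s
vmware-system-csi vsphere-csi-node-g9dtj 0/3 ImagePullBackOff 0 2m41s
kube-system coredns-example 1/1 Running 0 10m
"""


def pods_failing_image_pull(listing: str) -> list[tuple[str, str, str]]:
    """Return (namespace, pod, status) for rows whose status reports an image pull failure."""
    failing = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and a header row if one is present.
        if len(fields) < 4 or fields[0] == "NAMESPACE":
            continue
        namespace, name, _ready, status = fields[:4]
        # "Init:ImagePullBackOff" (init container) also matches the substring check.
        if "ImagePullBackOff" in status or "ErrImagePull" in status:
            failing.append((namespace, name, status))
    return failing


if __name__ == "__main__":
    for namespace, name, status in pods_failing_image_pull(SAMPLE):
        print(f"{namespace}/{name}: {status}")
```

Matching on a substring rather than the exact status keeps init-container states such as Init:ImagePullBackOff in the result.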
Pod events show that the node cannot trust the certificate presented by the docker registry:
vmware-system-csi 5m24s Warning Failed pod/vsphere-csi-node-xgbr5 Failed to pull image "sspifqdn.com.internal/registry/install/sig-storage/csi-node-driver-registrar:v2.10.1": unable to pull image or OCI artifact: pull image err: initializing source docker://sspifqdn.com.internal/registry/install/sig-storage/csi-node-driver-registrar:v2.10.1: pinging container registry sspifqdn.com.internal: Get "https://sspifqdn.com.internal/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "sspifqdn.com.internal"); artifact err: get manifest: build image source: pinging container registry sspifqdn.com.internal: Get "https://sspifqdn.com.internal/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "sspifqdn.com.internal")
vmware-system-csi 5m24s Warning Failed pod/vsphere-csi-node-xgbr5 Error: ErrImagePull
vmware-system-csi 5m24s Normal Pulling pod/vsphere-csi-node-xgbr5 Pulling image "sspifqdn.com.internal/registry/install/csi-vsphere/driver:v3.3.1"
On SSPI, /var/log/secop/secop.log shows that the update to the kubeadmcontrolplane object failed with an optimistic concurrency error:
2026-02-12T03:05:14.594Z INFO secopapi/secopapi.go:2040 Request received to UpdateCertificate
2026-02-12T03:05:16.370Z INFO certificateservice/service.go:252 Updating K8S cluster's docker CA, /config/clusterctl/1/9o5e5dsm.kubeconfig
2026-02-12T03:05:16.533Z ERROR certificateservice/service.go:291 Failed to update kubeadmcontrolplanes {"name": "9o5e5dsm-controller", "namespace": "9o5e5dsm", "error": "Operation cannot be fulfilled on kubeadmcontrolplanes.controlplane.cluster.x-k8s.io \"9o5e5dsm-controller\": the object has been modified; please apply your changes to the latest version and try again"}
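The "object has been modified" message is the standard Kubernetes response when a write carries a stale resourceVersion. The sketch below simulates that mechanism in memory to show how a single conflicting write can leave the earlier CA value in place. It is an illustration only, not SSPI's or CAPI's code: `FakeStore`, `update_with_retry`, the `dockerCA` field name, and the "other controller" are all hypothetical, and the source does not say whether SSPI retries after a conflict.

```python
class ConflictError(Exception):
    """Raised when a write carries a stale resource version."""


class FakeStore:
    """In-memory stand-in for an API server holding a single object."""

    def __init__(self, spec):
        self.resource_version = 1
        self.spec = dict(spec)

    def get(self):
        # Return a copy so callers cannot mutate the stored object directly.
        return self.resource_version, dict(self.spec)

    def update(self, resource_version, spec):
        # Optimistic concurrency: reject writes based on an outdated read.
        if resource_version != self.resource_version:
            raise ConflictError("the object has been modified")
        self.spec = dict(spec)
        self.resource_version += 1


def update_with_retry(store, mutate, attempts, on_read=None):
    """Read, mutate, write; re-read and try again on conflict, up to `attempts` times."""
    for _ in range(attempts):
        version, spec = store.get()
        if on_read is not None:
            on_read()  # hook used here to simulate a concurrent writer
        mutate(spec)
        try:
            store.update(version, spec)
            return True
        except ConflictError:
            continue
    return False


def run(attempts):
    """Update the (hypothetical) dockerCA field while another writer interferes once."""
    store = FakeStore({"dockerCA": "old-ca", "replicas": 3})
    interfered = []

    def other_controller():
        # Hypothetical concurrent writer: changes the object once, between our read and write.
        if not interfered:
            interfered.append(True)
            version, spec = store.get()
            spec["replicas"] = 5
            store.update(version, spec)

    def set_ca(spec):
        spec["dockerCA"] = "new-ca"

    ok = update_with_retry(store, set_ca, attempts, on_read=other_controller)
    return ok, store.get()[1]


if __name__ == "__main__":
    print(run(attempts=1))  # conflict is not retried: dockerCA stays "old-ca"
    print(run(attempts=3))  # second attempt re-reads and succeeds: dockerCA is "new-ca"
```

In the single-attempt run the write is rejected and the stored CA keeps its earlier value, which mirrors the stuck state described above. Re-reading the object before writing again is the general way such conflicts are resolved.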
When the system falls into this state, perform the SSPI certificate replacement workflow again: generate a new CSR, have it signed by a trusted CA, and upload the new certificate. This forces the docker CA to be refreshed with the latest CA on all control nodes.
For details, refer to: manage-certificates