After SSPI certificates are replaced, the sspi-cert-injector pods remain stuck, and new services in the SSP workload cluster cannot pull images.
To check the sspi-cert-injector status, log in to the SSPI CLI with sysadmin credentials and run the following command:
k get pods -A | grep sspi-cert-injector
If the output shows an sspi-cert-injector pod in the Init:0/1 state for more than 5 minutes, the pod is stuck in the init state.
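The check above can be narrowed to only the stuck pods. The following is a minimal sketch, not an official tool: the function name stuck_injector_pods is hypothetical, and it assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...). It does not measure how long a pod has been in Init:0/1, so run it again after about 5 minutes to confirm that the same pod is still listed.

```shell
# Hypothetical helper: print "namespace/name" for every sspi-cert-injector pod
# whose STATUS column is exactly Init:0/1.
# Input: the output of `kubectl get pods -A --no-headers` on stdin.
stuck_injector_pods() {
  awk '$2 ~ /sspi-cert-injector/ && $4 == "Init:0/1" { print $1 "/" $2 }'
}

# Usage on the SSPI CLI (k is the kubectl alias used in this article):
#   k get pods -A --no-headers | stuck_injector_pods
```

An empty result means no sspi-cert-injector pod is currently reporting Init:0/1.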
SSPI 5.1.1
A race condition occurs in the sspi-cert-injector pod. After the certificate is replaced and CRI-O is restarted:
- The init container is waiting for CRI-O to become "active"
- CRI-O is waiting for the init container to terminate
Neither can proceed, so the certificate update workflow never completes. As a result, the node that hosts the sspi-cert-injector pod does not receive the latest certificate for its local registry, and new pods scheduled on that node cannot pull images from the registry.
Restarting the sspi-cert-injector pod clears the stuck workflow and triggers it again, which resolves the issue.
To restart the sspi-cert-injector pod:
1. Log in to the SSPI CLI with sysadmin credentials and get the name of the sspi-cert-injector pod that is stuck in the init state:
k -n nsxi-platform get pods | grep sspi-cert-injector
2. Restart the pod by deleting it:
k -n nsxi-platform delete pod <pod name from step 1>
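When several nodes are affected, the two steps above can be combined. The following is a rough sketch under stated assumptions, not a supported script: the function name restart_stuck_injectors and the KUBECTL variable are hypothetical, the nsxi-platform namespace is taken from the commands above, and the pod is assumed to be recreated automatically after deletion, as the procedure implies. Shell aliases such as k are not expanded inside functions, so the sketch calls kubectl through a variable.

```shell
# Assumption: KUBECTL names the kubectl binary available on the SSPI CLI.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="nsxi-platform"

# Hypothetical helper: delete every sspi-cert-injector pod in NAMESPACE
# whose STATUS column is exactly Init:0/1.
restart_stuck_injectors() {
  # Without -A the columns are NAME, READY, STATUS, RESTARTS, AGE.
  "$KUBECTL" -n "$NAMESPACE" get pods --no-headers |
    awk '$1 ~ /sspi-cert-injector/ && $3 == "Init:0/1" { print $1 }' |
    while read -r pod; do
      # Redirect stdin so the delete call cannot consume the pod list.
      "$KUBECTL" -n "$NAMESPACE" delete pod "$pod" </dev/null
    done
}

# Usage: confirm first that the pods have been stuck for more than 5 minutes, then run
#   restart_stuck_injectors
```

The function only touches pods that report Init:0/1 at the moment it runs, so healthy sspi-cert-injector pods are left alone.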