Security Services Platform Installer (SSPi) upgrade is failing with error "Rollout of kubeadm control plane did not complete within 3600 secs".

Article ID: 417980

Products

VMware vDefend Firewall with Advanced Threat Prevention
VMware vDefend Firewall

Issue/Introduction

During an SSPi upgrade, creation of the new control plane node stalls in the Provisioning state.

Several Kubernetes system pods are stuck in the CrashLoopBackOff or ImagePullBackOff state, including:

antrea-agent
vsphere-cpi
vsphere-csi-node
kube-proxy
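
As a rough way to confirm this symptom, the pod listing can be filtered for the two failure states named above. This is only a sketch: the helper name `failing_pods` is hypothetical, and it assumes `kubectl` is already pointed at the affected cluster.

```shell
# Hypothetical helper: keep only the lines of a pod listing that mention
# one of the two failure states described in this article.
# Reads `kubectl get pods` style output on stdin.
failing_pods() {
  grep -E 'CrashLoopBackOff|ImagePullBackOff'
}
```

For example, `kubectl get pods -A | failing_pods` would be expected to list pods such as antrea-agent, vsphere-cpi, vsphere-csi-node, and kube-proxy when the issue is present.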

 

Pod logs show CPI errors:

Failed to create govmomi client.
Post "https://<vcenter-fqdn>:443/sdk": host "<vcenter-fqdn>" thumbprint does not match "<old-thumbprint>"
Cannot connect to vCenter
Error getting instance metadata for node addresses: node not found

 

The Antrea agent fails to connect to the Kubernetes API (10.96.0.1:443: i/o timeout).

The control plane bootstrap fails because CPI is unavailable and the node is never registered.

Environment

SSPi 5.0/5.1

Cause

When the vCenter certificate changes and SSPi is reconnected to vCenter, the new thumbprint is saved in the vCenter configuration within SSPi.

However, the /config/clusterctl/1/clusterctl-init.yaml file and the cpi-manifests configmap are not updated during the reconnect.

During the upgrade, this outdated file is reused to generate the CAPI/CPI manifests, so vCenter authentication fails because the old thumbprint is used.

The cloud-config ConfigMap in the kube-system namespace of the workload cluster inherits this stale thumbprint, which breaks CPI initialization.

Resolution

To unblock the SSPi upgrade, go to Instance Management -> vCenter Parameters -> Edit Connection and reconnect to vCenter after the upgrade error, then continue the upgrade from the point of failure.

To prevent the same error from recurring during an upgrade from SSPi 5.0/5.1 to a higher version, follow these steps:

 

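1. Get the latest thumbprint.

a) Get the vCenter FQDN/IP from Security Service Platform Installer -> Instance Management -> vCenter Parameters.

b) SSH to the Security Services Platform Installer (SSPi) and run the following commands. The </dev/null redirect closes standard input so that s_client exits once the certificate has been retrieved:

openssl s_client -connect <vCenter-FQDN/IP>:443 </dev/null > vc.crt
openssl x509 -fingerprint -in vc.crt -noout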
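
The two commands above can also be chained so that only the thumbprint value is printed, which makes it easier to compare in the later steps. The sketch below is an illustration, not part of the official procedure: the function names `thumbprint_only` and `vc_thumbprint` are hypothetical, and it relies on the same default digest that `openssl x509 -fingerprint` uses in the commands above.

```shell
# Hypothetical helper: strip the "<digest> Fingerprint=" label from
# `openssl x509 -fingerprint` output, leaving only the colon-separated hex value.
thumbprint_only() {
  cut -d= -f2
}

# Hypothetical helper: fetch the certificate presented on <host>:443 and
# print its thumbprint. </dev/null closes stdin so s_client exits after
# the handshake; s_client's diagnostic output on stderr is discarded.
vc_thumbprint() {
  openssl s_client -connect "$1:443" </dev/null 2>/dev/null \
    | openssl x509 -fingerprint -noout \
    | thumbprint_only
}
```

Usage would look like `vc_thumbprint <vCenter-FQDN/IP>`, with the placeholder replaced by the value from step 1a.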

 

2. Fix clusterctl-init.yaml.

a) SSH to the Security Services Platform Installer (SSPi) and view the file:

cat /config/clusterctl/1/clusterctl-init.yaml

b) If the thumbprint in the file differs from the one obtained in step 1, replace it with the latest thumbprint:

vim /config/clusterctl/1/clusterctl-init.yaml
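
Rather than comparing the values by eye, a literal search for the new thumbprint can show whether the file is already up to date. The following is a minimal sketch; `file_has_thumbprint` is a hypothetical helper name, and the check makes no assumption about which key in the file holds the thumbprint.

```shell
# Hypothetical helper: succeed (exit 0) if the file in $2 contains the
# thumbprint in $1. The match is literal (-F) and case-insensitive (-i),
# so differences in hex letter case do not cause a false mismatch.
file_has_thumbprint() {
  grep -qiF -- "$1" "$2"
}
```

For example, `file_has_thumbprint "<latest-thumbprint>" /config/clusterctl/1/clusterctl-init.yaml && echo "up to date"`. If an edit is needed, it may be prudent to copy the file first, for instance with `cp -p /config/clusterctl/1/clusterctl-init.yaml /config/clusterctl/1/clusterctl-init.yaml.bak`.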

 

3. Fix the cpi-manifests configmap.

a) SSH to the Security Services Platform Installer (SSPi) and view the configmap:

kubectl -n <ssp-instance-name> get cm cpi-manifests -o yaml

b) If the thumbprint in the configmap differs from the one obtained in step 1, replace it with the latest thumbprint:

kubectl -n <ssp-instance-name> edit cm cpi-manifests
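
Because the configmap output can be long, it may help to pull out only the thumbprint-like strings it contains before and after editing. The sketch below assumes the thumbprint is stored as a colon-separated SHA-1 value (20 hex pairs); that format and the helper name `list_thumbprints` are assumptions, not something this article specifies.

```shell
# Hypothetical helper: print each distinct SHA-1 style thumbprint
# (20 colon-separated hex pairs) found on stdin.
# Assumption: a longer digest such as SHA-256 would be truncated by this pattern.
list_thumbprints() {
  grep -oiE '([0-9a-f]{2}:){19}[0-9a-f]{2}' | sort -u
}
```

For example, `kubectl -n <ssp-instance-name> get cm cpi-manifests -o yaml | list_thumbprints` should print only the latest thumbprint once the configmap has been corrected.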

 

Note 1: <ssp-instance-name> can be found in the SSPi UI under Instance Management -> Security Service Platform Instance -> Instance Name.

Note 2: If you are using SSPi 5.0 and plan to upgrade to SSPi 5.1, apply steps 2 and 3 only if you have not yet started the upgrade process.