SSP-I Upgrade Fails at “Security Services Platform - Workload Cluster Upgrade” Step


Article ID: 417528


Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

During the SSP-I upgrade, the process fails at the step: “Security Services Platform - Workload Cluster Upgrade”

The step fails with the following error message:

[Workload Cluster Upgrade] Error occurred while executing the upgrade step. 
An unexpected exception occurred: TimeoutError: Rollout of kubeadmControlPlane <kubeadmcontrolplane> did not complete within 2700 seconds.
  • Log in to the SSP-Installer with the username "root".

  • List the kubeadmControlPlane objects, then describe the relevant one:
sysadmin@sspir:~$ kubectl get kubeadmcontrolplane -A
NAMESPACE   NAME   CLUSTER   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
ssp         ssp    ssp       true          true                   3          3       3         0             9d    v1.33.3

kubectl describe kubeadmcontrolplane <name_from_above_output> -n <namespace_from_above_output>
  • Observe the "Events" section at the end of the output.

  • The Events section shows warnings similar to the following:
Warning  ControlPlaneUnhealthy  Waiting for control plane to pass preflight checks to continue reconciliation: 
[Machine <control-plane-node> reports EtcdMemberHealthy condition is unknown 
(Failed to connect to etcd: could not establish a connection to etcd members hosted on <other-control-plane-nodes>: 
failed to get etcd status: context deadline exceeded)]
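
  • As an additional, general Cluster API check (not specific to SSP-I), you can also list the Machines backing the control plane; machines stuck outside the Running phase usually correspond to the unhealthy etcd members named in the events:
sysadmin@sspir:~$ kubectl get machines -A

kubectl describe machine <machine_name_from_above_output> -n <namespace_from_above_output>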

Environment

Version: SSP-Installer 5.1

Cause

The upgrade process waits for the KubeadmControlPlane rollout to complete.

The rollout stalled because the control-plane nodes were failing the etcd health preflight checks due to time-synchronization issues among the nodes.

This indicates that the control plane machines were unable to connect to the etcd members because their clocks were out of sync. Kubernetes components such as etcd and kube-apiserver rely on consistent system time for secure TLS communication.

When there is a significant clock drift between nodes, TLS certificates may appear invalid or expired, resulting in failed connectivity and an incomplete control-plane rollout.
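
For illustration, one way to confirm this on a control-plane node is to compare the node's clock with the validity window of the etcd serving certificate. This sketch assumes the standard kubeadm certificate path /etc/kubernetes/pki/etcd/server.crt, which may differ in your deployment:

    ssh capv@<CONTROL-PLANE-IP>
    date
    sudo openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -dates
    exit

If the date reported by the node falls outside the notBefore/notAfter window printed by openssl, TLS connections to that etcd member fail in the manner shown in the events above.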

Resolution

Ensure that the system time is synchronized across all SSP nodes, including the installer, control-plane, and worker nodes.

Steps to Verify Time Synchronization

  1. Check time on the SSP Installer:

    date
    

     

  2. Identify all cluster nodes and their external IPs:

    kubectl get nodes -o wide


    Note the values under the EXTERNAL-IP column.

  3. Log in to each node and check the current system time:

    ssh capv@<EXTERNAL-IP>
    date
    exit
    

    The time on all nodes and the installer must match (within a few seconds). A sketch after these steps shows how to check every node in a single pass.

  4. If time is not in sync, follow the steps in the KB article below to synchronize it with the configured NTP source:

    SSP-I does not sync time from provided NTP source
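
As a convenience, steps 2 and 3 can be combined into one pass from the installer. This is an illustrative sketch, not an official SSP-I procedure; it assumes the capv user can SSH to every node and that kubectl points at the same cluster used in step 2:

    # Print each node's external IP and current system time
    for ip in $(kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}{end}'); do
      echo "--- $ip ---"
      ssh -o StrictHostKeyChecking=no capv@"$ip" date
    done

On systemd-based nodes, timedatectl (where available) additionally reports whether NTP synchronization is active.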

Once the time synchronization is corrected:

  • Re-run the upgrade step.
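
  • To confirm progress, the rollout can be watched from the installer using the same object inspected during diagnosis:

    kubectl get kubeadmcontrolplane -A -w

The rollout is complete when READY and UPDATED equal the REPLICAS count and UNAVAILABLE returns to 0.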