VKS cluster node deployment using clusterclass api v1beta1 with additional trusted CAs fails during cloud-init phase

Article ID: 402780

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • When a cluster is deployed with additional trusted CA certificates, deployment fails at the first node and no further nodes are deployed.

  • On the Supervisor, the cluster object reports a connectivity failure:
        message: 'unable to retrieve kube-proxy daemonset from the guest cluster: failed
          to get API group resources: unable to retrieve the complete list of server APIs:
          apps/v1: Get "https://##.###.###.#:6443/apis/apps/v1?timeout=10s": dial tcp
          ##.###.###.#:6443: connect: connection refused'
    
  • The CAPI controller manager logs contain messages like the following:

    YYYY-MM-DDT18:46:58.222879194Z stderr F E0604 18:46:58.220446 1 controller.go:324] "Reconciler error" err="failed to get client: failed to create cluster accessor: error creating http client and mapper for remote cluster \"namespace/guest-cluster-name\": error creating client for remote cluster \"namespace/guest-cluster-name\": cluster is not reachable: Get \"https://##.###.###.#:6443/?timeout=5s\": dial tcp ##.###.###.#:6443: connect: connection refused" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="namespace/guest-cluster-name-659n9-gt28k" namespace="namespace" name="guest-cluster-name-659n9-gt28k" reconcileID="22d122f4-####-####-####-b8205723f296"
  • Running crictl ps on the node shows no running containers.

  • On the node, /var/log/cloud-init-output.log contains the following entries:

    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] Cloud-init v. 24.4 running 'modules:final' at Mon, 09 Jun 2025 05:40:48 +0000. Up 29.05 seconds.
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] + umount /var/lib/etcd
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] ++ ls -A /var/lib/etcd
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] + '[' '' ']'
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] + mount -t ext4 /dev/sdb1 /var/lib/etcd
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] + rm -rf /var/lib/etcd/lost+found
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] ++ ls -A /var/tmp/_var_lib_etcd
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] ls: cannot access '/var/tmp/_var_lib_etcd': No such file or directory
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] + '[' '' ']'
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] + set -xe
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] + cloud-init single --name write-files --frequency always
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] [YYYY-MM-DD 05:40:48] Cloud-init v. 24.4 running 'single' at Mon, 09 Jun 2025 05:40:48 +0000. Up 29.40 seconds.
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] [YYYY-MM-DD 05:40:48] YYYY-MM-DD 05:40:48,734 - log_util.py[WARNING]: Running module write-files (<module 'cloudinit.config.cc_write_files' from '/usr/lib/python3.11/site-packages/cloudinit/config/cc_write_files.py'>) failed
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] [YYYY-MM-DD 05:40:48] YYYY-MM-DD 05:40:48,736 - main.py[WARNING]: Ran write-files but it failed!
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] YYYY-MM-DD 05:40:48,780 - cc_scripts_user.py[WARNING]: Failed to run module scripts_user (scripts in /var/lib/cloud/instance/scripts)
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] YYYY-MM-DD 05:40:48,780 - log_util.py[WARNING]: Running module scripts_user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.11/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
    YYYY-MM-DDT05:40:48.861557+00:00 localhost cloud-init[922]: [YYYY-MM-DD 05:40:48] Cloud-init v. 24.4 finished at Mon, 09 Jun 2025 05:40:48 +0000. Datasource DataSourceVMware [seed=guestinfo].  Up 29.58 seconds
  • Subsequent cluster deployments in the same namespace with no additional CAs defined may also fail.

  • When new clusters are deployed with expired, single-encoded CA certificates, the deployment fails without any obvious mention of certificate errors.
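
Because the failure does not mention certificates, the quickest confirmation of this symptom is usually the list of cloud-init modules that reported a failure. The following is a minimal sketch, not part of the product tooling: it assumes the contents of /var/log/cloud-init-output.log have already been read into a string, and the function name failed_cloud_init_modules is hypothetical.

```python
import re

# Matches cloud-init warnings of the form
# "Running module write-files (<module ...>) failed".
FAILED_MODULE = re.compile(r"Running module (\S+) \(.*\) failed")


def failed_cloud_init_modules(log_text: str) -> list:
    """Return the names of cloud-init modules reported as failed, in log order."""
    failed = []
    for line in log_text.splitlines():
        match = FAILED_MODULE.search(line)
        # Record each module once, even if the log repeats the warning.
        if match and match.group(1) not in failed:
            failed.append(match.group(1))
    return failed
```

Applied to the log excerpt above, this should report write-files followed by scripts_user, which matches the pattern described in this article. Other causes can fail the same modules, so a match is only a hint.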

 

Environment

VKS 3.3.x

Cause

This happens when the secret used for the cluster contains additional CA certificates that are expired and single-encoded.
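
Both conditions named above can be checked offline once the certificate value has been extracted from the secret. The sketch below is illustrative only and makes several assumptions: the value is available as bytes, the certificate is PEM-formatted, and the expiry date was obtained separately (for example with openssl x509 -noout -enddate -in ca.pem, which prints a line such as notAfter=Jun  9 05:40:48 2025 GMT). The helper names base64_layers and is_expired are hypothetical.

```python
import base64
import ssl
import time
from typing import Optional

PEM_HEADER = b"-----BEGIN CERTIFICATE-----"


def base64_layers(value: bytes, max_layers: int = 3) -> int:
    """Count the base64 decodes needed to reach PEM text.

    Returns 0 if value is already PEM, or -1 if no PEM header is found
    within max_layers decodes.
    """
    data = value.strip()
    for layers in range(max_layers + 1):
        if data.startswith(PEM_HEADER):
            return layers
        try:
            # Drop line wrapping before a strict decode.
            data = base64.b64decode(b"".join(data.split()), validate=True).strip()
        except ValueError:  # binascii.Error is a subclass of ValueError
            return -1
    return -1


def is_expired(not_after: str, now: Optional[float] = None) -> bool:
    """Check a notAfter date in OpenSSL text form, e.g. 'Jun  9 05:40:48 2025 GMT'."""
    if now is None:
        now = time.time()
    return ssl.cert_time_to_seconds(not_after) <= now
```

A result of 1 from base64_layers corresponds to a single-encoded value and 2 to a double-encoded one. Passing the text after notAfter= to is_expired shows whether the certificate has already lapsed.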

Resolution