This problem is confirmed as a bug in vCenter 8 U1x and is resolved in 8 U2, where the same behaviour is no longer observed.
Some additional analysis was completed to confirm the issue.
Verified with the commands below:
With Machine/cluster-update-test-01-f4w75-847jc in Pending state, we could trace from the Cluster back to the KubeadmConfig.
The KubeadmConfig cluster-update-test-01-node-pool-1-bootstrap-8gtgc-fg575 in namespace vxvcftanzu01 had a missing secret reference defined in it, which we added manually (copy-pasting the secret reference from another KubeadmConfig).
This triggered creation of the node and added it to the cluster; the system then continued with the next machine but got stuck on the same step.
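For illustration, the manual edit replaced the unresolved placeholders in the affected files entry of the KubeadmConfig spec; the key and secret name below are hypothetical and stand in for values copied from a healthy sibling KubeadmConfig:
files:
- contentFrom:
    secret:
      key: tls.crt                                  # hypothetical key, copied from a working KubeadmConfig
      name: cluster-update-test-01-extensions-tls   # hypothetical secret name, copied from a working KubeadmConfig
  owner: root:root
  path: /etc/ssl/certs/extensions-tls.crt
  permissions: "0644"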
The output of kubectl get KubeadmConfig -n vxvcftanzu01 cluster-update-test-01-node-pool-1-bootstrap-8gtgc-tzlhj -oyaml is provided below, after the command history.
kubectl get cluster -A
kubectl describe cluster -n vxvcftanzu01 cluster-update-test-01
kubectl get kubeadmcontrolplane -A
kubectl get kubeadmcontrolplane -n vxvcftanzu01 cluster-update-test-01-f4w75
kubectl get kubeadmcontrolplane -n vxvcftanzu01 cluster-update-test-01-f4w75 -oyaml
kubectl get -n vxvcftanzu01 Machine/cluster-update-test-01-f4w75-847jc
kubectl get KubeadmConfig -A
kubectl get KubeadmConfig -n vxvcftanzu01 cluster-update-test-01-node-pool-1-bootstrap-8gtgc-fg575 -oyaml
kubectl get KubeadmConfig -A
kubectl get KubeadmConfig -n vxvcftanzu01 cluster-update-test-01-node-pool-1-bootstrap-vfnpt-22wl6 -oyaml
kubectl edit KubeadmConfig -n vxvcftanzu01 cluster-update-test-01-node-pool-1-bootstrap-8gtgc-fg575
kubectl get KubeadmConfig -n vxvcftanzu01 cluster-update-test-01-node-pool-1-bootstrap-8gtgc-fg575 -oyaml
kubectl get vspheremachines -A
kubectl get vspheremachines,machines -A
kubectl get KubeadmConfig -A
kubectl get KubeadmConfig,machines -A | grep cluster-update-test-01
kubectl get KubeadmConfig -A
kubectl get KubeadmConfig -n vxvcftanzu01 cluster-update-test-01-node-pool-1-bootstrap-8gtgc-tzlhj
kubectl get KubeadmConfig -n vxvcftanzu01 cluster-update-test-01-node-pool-1-bootstrap-8gtgc-tzlhj -oyaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfig
metadata:
annotations:
cluster.x-k8s.io/cloned-from-groupkind: KubeadmConfigTemplate.bootstrap.cluster.x-k8s.io
cluster.x-k8s.io/cloned-from-name: cluster-update-test-01-node-pool-1-bootstrap-8gtgc
run.tanzu.vmware.com/resolve-os-image: os-name=photon
creationTimestamp: "2023-09-29T12:57:08Z"
generation: 2
labels:
cluster.x-k8s.io/cluster-name: cluster-update-test-01
cluster.x-k8s.io/deployment-name: cluster-update-test-01-node-pool-1-v8gnm
cluster.x-k8s.io/set-name: cluster-update-test-01-node-pool-1-v8gnm-5d48844f9d
machine-template-hash: "1804400958"
topology.cluster.x-k8s.io/deployment-name: node-pool-1
topology.cluster.x-k8s.io/owned: ""
name: cluster-update-test-01-node-pool-1-bootstrap-8gtgc-tzlhj
namespace: vxvcftanzu01
ownerReferences:
- apiVersion: cluster.x-k8s.io/v1beta1
blockOwnerDeletion: true
controller: true
kind: Machine
name: cluster-update-test-01-node-pool-1-v8gnm-5d48844f9d-qnvrk
uid: d6200ed5-3547-48bd-86a1-3728159b3b4a
resourceVersion: "324274822"
uid: 3307a9d4-64a2-44bc-957f-770aec58dd62
spec:
diskSetup: {}
files:
- content: |
{{ ds.meta_data.hostname.split('.') | first }}
owner: root:root
path: /etc/hostname
permissions: "0644"
- content: |
::1 ipv6-localhost ipv6-loopback
127.0.0.1 localhost {{ ds.meta_data.hostname.split('.') | first }}
owner: root:root
path: /etc/hosts
permissions: "0644"
- contentFrom:
secret:
key: <no value>
name: <no value>
owner: root:root
path: /etc/ssl/certs/extensions-tls.crt
permissions: "0644"
format: cloud-config
joinConfiguration:
discovery:
bootstrapToken:
apiServerEndpoint: x.x.x.x:6443
caCertHashes:
- sha256:xxxxx
token: vaq720.xige91v55pvrg8fx
nodeRegistration:
ignorePreflightErrors:
- ImagePull
kubeletExtraArgs:
cloud-provider: external
event-qps: "0"
node-labels: run.tanzu.vmware.com/tkr=v1.24.9---vmware.1-tkg.4,run.tanzu.vmware.com/kubernetesDistributionVersion=v1.24.9---vmware.1-tkg.4,
protect-kernel-defaults: "true"
read-only-port: "0"
register-with-taints: ""
resolv-conf: /run/systemd/resolve/resolv.conf
tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
ntp:
enabled: true
servers:
- <no value>
postKubeadmCommands:
- touch /root/kubeadm-complete
- vmware-rpctool 'info-set guestinfo.kubeadm.phase complete'
- vmware-rpctool 'info-set guestinfo.kubeadm.error ---'
preKubeadmCommands:
- set -xe
- cloud-init single --name write-files --frequency always
- cloud-init single --name users-groups --frequency always
- vmware-rpctool 'info-set guestinfo.userdata ---'
- hostname "{{ ds.meta_data.hostname.split('.') | first }}"
- 'sed -i -e "s/^preserve_hostname: .*/preserve_hostname: true/" /etc/cloud/cloud.cfg'
- echo -e 'kernel.panic_on_oops=1\nkernel.panic=10\nvm.overcommit_memory=1' >> /etc/sysctl.d/kubelet.conf
&& sysctl -p /etc/sysctl.d/kubelet.conf
- uname -a | grep photon && /usr/bin/rehash_ca_certificates.sh
- uname -a | grep ubuntu && cp /etc/ssl/certs/extensions-tls.crt /usr/local/share/ca-certificates/
- uname -a | grep ubuntu && /usr/sbin/update-ca-certificates
- systemctl set-property docker.service TasksMax=infinity
- systemctl daemon-reload
- systemctl enable containerd
- systemctl is-enabled --quiet containerd.service && systemctl restart containerd.service
- 'if systemctl is-enabled --quiet containerd.service ; then running=false; for
_ in {1..15}; do crictl ps > /dev/null 2>&1 && running=true && break; sleep 1s;
done; if [[ "${running}" != true ]]; then echo ''WARNING: containerd may not be
running''; exit 1; fi; fi'
- uname -a | grep photon && systemctl start docker.service
- uname -a | grep ubuntu && systemctl enable kubelet
- uname -a | grep ubuntu && systemctl start kubelet
- if [ -f /root/kubeadm-complete ]; then echo "Kubeadm already completed - terminating
early"; exit 0; fi
useExperimentalRetryJoin: true
verbosity: 2
status:
conditions:
- lastTransitionTime: "2023-09-29T12:57:09Z"
message: 'failed to resolve file source: secret not found: vxvcftanzu01/<no value>:
secrets "<no value>" not found'
reason: DataSecretGenerationFailed
severity: Warning
status: "False"
type: Ready
- lastTransitionTime: "2023-09-29T12:57:09Z"
status: "True"
type: CertificatesAvailable
- lastTransitionTime: "2023-09-29T12:57:09Z"
message: 'failed to resolve file source: secret not found: vxvcftanzu01/<no value>:
secrets "<no value>" not found'
reason: DataSecretGenerationFailed
severity: Warning
status: "False"
type: DataSecretAvailable
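As a suggestion, other KubeadmConfigs hit by the same unresolved-variable problem can be found by searching the rendered objects for the literal <no value> placeholder, e.g.:
kubectl get kubeadmconfig -A -oyaml | grep -B 5 '<no value>'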
Symptoms:
A new classy cluster is created following the YAML example from here: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-with-tanzu-tkg/GUID-607BA980-E3E3-4167-ABC8-B9FCDCF44746.html
Only the corresponding name, namespace, storage, etc. are updated to reflect the current deployment.
After creating the cluster with kubectl apply -f ..., we waited until the cluster was completely created.
Then we updated the existing cluster to change the vmClass from guaranteed-small to guaranteed-medium.
The status of the created Cluster object appears to show the machine rollover in its status conditions, but nothing happens.
status:
conditions:
- lastTransitionTime: "2023-09-15T09:37:21Z"
message: Rolling 3 replicas with outdated spec (1 replicas up to date)
reason: RollingUpdateInProgress
severity: Warning
status: "False"
type: Ready
- lastTransitionTime: "2023-09-15T09:29:08Z"
status: "True"
type: ControlPlaneInitialized
- lastTransitionTime: "2023-09-15T09:37:21Z"
message: Rolling 3 replicas with outdated spec (1 replicas up to date)
reason: RollingUpdateInProgress
severity: Warning
status: "False"
type: ControlPlaneReady
- lastTransitionTime: "2023-09-15T09:27:22Z"
status: "True"
type: InfrastructureReady
- lastTransitionTime: "2023-09-15T09:27:18Z"
status: "True"
type: TopologyReconciled
- lastTransitionTime: "2023-09-15T09:27:05Z"
message: '[v1.24.11+vmware.1-fips.1-tkg.1]'
status: "True"
type: UpdatesAvailable
controlPlaneReady: true
failureDomains:
vmware-system-legacy:
controlPlane: true
infrastructureReady: true
observedGeneration: 5
phase: Provisioned
No new machine is created when we check the vCenter inventory.
We checked the events on the Supervisor cluster; the controller appears to be stuck retrying preflight checks on the machines.
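The events below were retrieved with a command along these lines (the exact invocation was not recorded in this case):
kubectl get events -n vxvcftanzu01 --sort-by=.lastTimestamp | grep cluster-update-test-01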
17m Normal TopologyCreate machinehealthcheck/cluster-update-test-01-qhmwr Created "MachineHealthCheck/cluster-update-test-01-qhmwr"
17m Warning ReconcileError machinehealthcheck/cluster-update-test-01-qhmwr failed to create cluster accessor: error fetching REST client config for remote cluster "vxvcftanzu01/cluster-update-test-01": failed to retrieve kubeconfig secret for Cluster vxvcftanzu01/cluster-update-test-01: secrets "cluster-update-test-01-kubeconfig" not found
16m Warning ReconcileError machinehealthcheck/cluster-update-test-01-qhmwr failed to create cluster accessor: error creating dynamic rest mapper for remote cluster "vxvcftanzu01/cluster-update-test-01": Get "https://10.50.0.2:6443/api?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
15m Warning ControlPlaneUnhealthy kubeadmcontrolplane/cluster-update-test-01-qhmwr Waiting for control plane to pass preflight checks to continue reconciliation: [machine cluster-update-test-01-qhmwr-5p4j2 does not have APIServerPodHealthy condition, machine cluster-update-test-01-qhmwr-5p4j2 does not have ControllerManagerPodHealthy condition, machine cluster-update-test-01-qhmwr-5p4j2 does not have SchedulerPodHealthy condition, machine cluster-update-test-01-qhmwr-5p4j2 does not have EtcdPodHealthy condition, machine cluster-update-test-01-qhmwr-5p4j2 does not have EtcdMemberHealthy condition]
13m Warning ControlPlaneUnhealthy kubeadmcontrolplane/cluster-update-test-01-qhmwr Waiting for control plane to pass preflight checks to continue reconciliation: [machine cluster-update-test-01-qhmwr-qgjgj does not have APIServerPodHealthy condition, machine cluster-update-test-01-qhmwr-qgjgj does not have ControllerManagerPodHealthy condition, machine cluster-update-test-01-qhmwr-qgjgj does not have SchedulerPodHealthy condition, machine cluster-update-test-01-qhmwr-qgjgj does not have EtcdPodHealthy condition, machine cluster-update-test-01-qhmwr-qgjgj does not have EtcdMemberHealthy condition]
12m Warning ControlPlaneUnhealthy kubeadmcontrolplane/cluster-update-test-01-qhmwr Waiting for control plane to pass preflight checks to continue reconciliation: machine cluster-update-test-01-qhmwr-qgjgj reports ControllerManagerPodHealthy condition is false (Error, Pod kube-controller-manager-cluster-update-test-01-qhmwr-qgjgj is missing)
2m22s Warning ControlPlaneUnhealthy kubeadmcontrolplane/cluster-update-test-01-qhmwr Waiting for control plane to pass preflight checks to continue reconciliation: [machine cluster-update-test-01-qhmwr-5wv6h does not have APIServerPodHealthy condition, machine cluster-update-test-01-qhmwr-5wv6h does not have ControllerManagerPodHealthy condition, machine cluster-update-test-01-qhmwr-5wv6h does not have SchedulerPodHealthy condition, machine cluster-update-test-01-qhmwr-5wv6h does not have EtcdPodHealthy condition, machine cluster-update-test-01-qhmwr-5wv6h does not have EtcdMemberHealthy condition]
Steps to reproduce:
- create a completely new, empty cluster with testcluster-small.yaml
- update the cluster with testcluster-medium.yaml (a minimal sketch of such a manifest follows below)
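For reference, a minimal sketch of what such a manifest might look like, based on the classy cluster example from the documentation linked above; all names, versions, and values are illustrative, and the only intended difference between testcluster-small.yaml and testcluster-medium.yaml is the vmClass value:
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: cluster-update-test-01
  namespace: vxvcftanzu01
spec:
  # clusterNetwork omitted for brevity
  topology:
    class: tanzukubernetescluster
    version: v1.24.9+vmware.1-tkg.4
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
      - class: node-pool
        name: node-pool-1
        replicas: 3
    variables:
    - name: vmClass
      value: guaranteed-small   # testcluster-medium.yaml changes this to guaranteed-medium
    - name: storageClass
      value: wcpglobal-storage-profile   # illustrative storage policy name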
Environment: VMware vSphere 7.0 with Tanzu
Variables that are not initially defined are dropped when the second apply operation is executed; as a result, those variables are missing from the respective KubeadmConfig.
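One way to confirm this, as a suggestion (resource and namespace names taken from this case), is to compare the topology variables on the Cluster before and after the second apply:
kubectl get cluster -n vxvcftanzu01 cluster-update-test-01 -o jsonpath='{.spec.topology.variables}'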
This violates Kubernetes' #1 design rule, the declarative approach, and it makes an infrastructure-as-code approach completely impossible.
When an object is applied and then re-applied with changes, the API endpoint and the controller behind it have to handle the changes and reconcile, as long as kind, metadata.name, and namespace stay the same.
Things the controller added to the original manifest to keep track of state (.status, various labels, and in this case things like the TKR_DATA block) should not need to be copied back into the original manifest; at the very least, the controller should merge them on a re-apply.
If the controller needs additional information to keep things running, that information should be required at creation time.
The recommendation is to avoid re-applying the same YAML with changes while this bug is present, and to upgrade to vCenter 8 U2.
Workaround:
If a customer is already in this state, they can attempt the following to restore the cluster (equivalent commands are sketched after this list):
- kubectl edit the Cluster: set spec.paused to true and add the annotation run.tanzu.vmware.com/pause: ""
- This signals the controller to repopulate the variables; the customer can confirm by checking the Cluster resource.
- Machine deployments will automatically roll out with properly set variables.
- Some control plane machines may still be stuck in Pending with the old, broken configuration. Deleting those machines will result in new machines being created with the new configuration.
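For reference, a non-interactive sketch of these steps, using the cluster from this case; the stuck machine name is a placeholder:
kubectl patch cluster -n vxvcftanzu01 cluster-update-test-01 --type merge -p '{"spec":{"paused":true}}'
kubectl annotate cluster -n vxvcftanzu01 cluster-update-test-01 run.tanzu.vmware.com/pause=""
# confirm the variables were repopulated
kubectl get cluster -n vxvcftanzu01 cluster-update-test-01 -oyaml
# only for control plane machines still stuck in Pending
kubectl delete machine -n vxvcftanzu01 <stuck-machine-name>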