Upgrade observed slowness in nodepool provisioning

Article ID: 345737


Products

VMware Tanzu Kubernetes Grid, VMware Telco Cloud Automation

Issue/Introduction

  • Sporadically slow machine creation with TKGm, accompanied by an increased number of open, idle vCenter sessions; the problem is exacerbated at scale (a quick way to check for this is sketched after this list).
  • This is a known issue in CAPV 1.0.5.
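
One way to confirm the symptom is to list the active vCenter sessions. The following is a minimal sketch using the govc CLI (assumptions: govc is installed, and the GOVC_* values below are hypothetical placeholders for the affected vCenter):

    # Point govc at the affected vCenter (hypothetical values)
    export GOVC_URL='https://vcenter.example.com'
    export GOVC_USERNAME='administrator@vsphere.local'
    export GOVC_PASSWORD='<password>'

    # List active sessions; a large number of idle sessions matches this symptom
    govc session.ls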

Environment

VMware Tanzu Kubernetes Grid 1.5.2, 1.6.0

Cause

Some vSphere VMs can end up at the back of the reconciliation queue because of transient errors (for example, too many client-side connections to vCenter).

Resolution

Workaround:

To mitigate the issue, enable keep-alive sessions and set the sync period to 5 minutes (the default is 10 minutes):

  • Switch to the management cluster context:
    kubectl config get-contexts
    kubectl config use-context CONTEXT_NAME
  • Find and edit the CAPV controller manager deployment (a non-interactive alternative using kubectl patch is sketched after this list):
    kubectl edit deployment -n capv-system capv-controller-manager
  • Add the following flags to the manager container's args:

    spec:
      containers:
      - args:
        ...
        - --sync-period=5m0s
        - --enable-keep-alive

    Example:
    # Please edit the object below. Lines beginning with a '#' will be ignored,
    # and an empty file will abort the edit. If an error occurs while saving this file will be
    # reopened with the relevant failures.
    #
    apiVersion: v1
    kind: Pod
    metadata:
      creationTimestamp: "2023-04-27T09:13:18Z"
      generateName: capv-controller-manager-####-
      labels:
        cluster.x-k8s.io/provider: infrastructure-vsphere
        control-plane: controller-manager
        pod-template-hash: ####
      name: capv-controller-manager-####-slngk
      namespace: capv-system
      ownerReferences:
      - apiVersion: apps/v1
        blockOwnerDeletion: true
        controller: true
        kind: ReplicaSet
        name: capv-controller-manager-####
        uid: ########-####-####-####-############
      resourceVersion: "374526282"
      uid: ########-####-####-####-############
    spec:
      containers:
      - args:
        - --enable-leader-election
        - --metrics-addr=0.0.0.0:8080
        - --enable-keep-alive
        - --sync-period=5m0s
        - --logtostderr
        - --v=4
        env:
        - name: HTTP_PROXY
        - name: HTTPS_PROXY
        - name: NO_PROXY
        image: projects.registry.vmware.com/tkg/cluster-api/cluster-api-vsphere-controller:v1.0.3_vmware.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
  • Save and exit; this should cause the CAPV controller manager pod to be redeployed.
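
The interactive edit above can also be applied non-interactively. The following is a minimal sketch using kubectl patch, assuming the manager is the first container in the deployment (adjust the index if your deployment differs):

    # Append both flags to the first container's args via a JSON patch
    kubectl -n capv-system patch deployment capv-controller-manager --type=json -p='[
      {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-keep-alive"},
      {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--sync-period=5m0s"}
    ]'

    # Confirm that the controller manager pod is redeployed with the new flags
    kubectl -n capv-system rollout status deployment capv-controller-manager

If --sync-period is already present in the args, edit the existing entry rather than appending a duplicate.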

Additional Information

  • This change is NOT persistent and will be overridden during the next management cluster upgrade (a way to verify that the flags are still in place is sketched after this list).
  • This should ensure a more frequent overall sync by CAPV.
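
To check whether the flags are still in place (for example, after a management cluster upgrade), the deployment's args can be inspected directly. A minimal sketch, assuming the manager is the first container:

    # --enable-keep-alive and --sync-period=5m0s should appear in the output
    kubectl -n capv-system get deployment capv-controller-manager -o jsonpath='{.spec.template.spec.containers[0].args}'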