TCA 2.X - TKGm upgrade observed slowness in nodepool provisioning

Article ID: 345737


Updated On:

Products

VMware Tanzu Kubernetes Grid, VMware Telco Cloud Automation

Issue/Introduction

Symptoms:
Machine creation with TKGm is sporadically slow, with an increased number of sessions left open and idle; the problem is especially pronounced at scale.

Environment

Tanzu Kubernetes Grid 1.6.0
Tanzu Kubernetes Grid 1.5.2

Cause

Some VSphereVM objects can end up at the back of the processing queue because of transient errors (for example, too many client-side connections).

Resolution

VMware is aware of this issue in CAPV 1.0.5 and is working to fix it in a future CAPV 1.5 release.

Workaround:

To mitigate the issue, enable keep-alive sessions and set the sync-period to 5 minutes (the default is 10 minutes).

Perform the following steps:

1. Switch to the management cluster context:

kubectl config get-contexts
kubectl config use-context CONTEXT_NAME
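
For example, if the management cluster were named mgmt (a placeholder name used here for illustration), the admin context created by the Tanzu CLI typically follows the NAME-admin@NAME pattern:

kubectl config use-context mgmt-admin@mgmt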

2. Find and edit the capv controller manager deployment: 

kubectl edit deployment -n capv-system capv-controller-manager

and add the following flags to the container args:

spec:
  containers:
  - args:
    ...
    - --sync-period=5m0s
    - --enable-keep-alive
Example:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2023-04-27T09:13:18Z"
  generateName: capv-controller-manager-9d8499798-
  labels:
    cluster.x-k8s.io/provider: infrastructure-vsphere
    control-plane: controller-manager
    pod-template-hash: 9d8499798
  name: capv-controller-manager-9d8499798-slngk
  namespace: capv-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: capv-controller-manager-9d8499798
    uid: 6a06b2b6-d168-434c-ae54-0696c727e430
  resourceVersion: "374526282"
  uid: 6c777950-08d9-45d1-863f-9643af27a92b
spec:
  containers:
  - args:
    - --enable-leader-election
    - --metrics-addr=0.0.0.0:8080
    - --enable-keep-alive
    - --sync-period=5m0s
    - --logtostderr
    - --v=4
    env:
    - name: HTTP_PROXY
    - name: HTTPS_PROXY
    - name: NO_PROXY
    image: projects.registry.vmware.com/tkg/cluster-api/cluster-api-vsphere-controller:v1.0.3_vmware.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
3. Save and exit, which causes the CAPV controller manager pod to be redeployed.
This change is not persistent and will be overridden during the next management cluster upgrade.
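
To confirm that the change has taken effect, the rollout and the new arguments can be checked with standard kubectl commands, for example:

kubectl rollout status deployment -n capv-system capv-controller-manager
kubectl get deployment -n capv-system capv-controller-manager -o yaml | grep -E 'sync-period|enable-keep-alive'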

This should ensure a more frequent overall sync by CAPV.
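
Because the setting is reverted by a management cluster upgrade, one way to re-apply it non-interactively afterwards is a JSON patch against the deployment. The following is only a sketch: it assumes the manager container is at index 0 in the pod spec and that neither flag is already present in its args:

kubectl patch deployment capv-controller-manager -n capv-system --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-keep-alive"}, {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--sync-period=5m0s"}]'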