Upgrade observed slowness in nodepool provisioning

Article ID: 345737


Products

VMware Tanzu Kubernetes Grid, VMware Telco Cloud Automation

Issue/Introduction

  • Sporadically slow machine creation with TKGm, accompanied by an increased number of open, idle vCenter sessions; the problem is exacerbated at scale (a quick way to check for this is sketched after this list).
  • This is a known issue in CAPV 1.0.5.
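
One way to confirm the symptom is to list the active vCenter sessions. The following is a minimal sketch using the govc CLI (assumptions: govc is installed, and the GOVC_* values below are hypothetical placeholders for the affected vCenter):

    # Point govc at the affected vCenter (hypothetical values)
    export GOVC_URL='https://vcenter.example.com'
    export GOVC_USERNAME='administrator@vsphere.local'
    export GOVC_PASSWORD='<password>'

    # List active sessions; a large number of idle sessions matches this symptom
    govc session.ls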

Environment

VMware Tanzu Kubernetes Grid 1.5.2, 1.6.0

Cause

Some vSphere VMs can end up at the back of the reconciliation queue because of transient errors (for example, too many client-side connections to vCenter).

Resolution

Workaround:

To mitigate the issue, enable keep-alive sessions and set the sync period to 5 minutes (the default is 10 minutes):

  • Switch to the management cluster context:
    kubectl config get-contexts
    kubectl config use-context CONTEXT_NAME
  • Find and edit the CAPV controller manager deployment (a non-interactive alternative using kubectl patch is sketched after this list):
    kubectl edit deployment -n capv-system capv-controller-manager
  • Add the following flags to the manager container's args:

    spec:
      containers:
      - args:
        ...
        - --sync-period=5m0s
        - --enable-keep-alive

    Example:
    # Please edit the object below. Lines beginning with a '#' will be ignored,
    # and an empty file will abort the edit. If an error occurs while saving this file will be
    # reopened with the relevant failures.
    #
    apiVersion: v1
    kind: Pod
    metadata:
      creationTimestamp: "2023-04-27T09:13:18Z"
      generateName: capv-controller-manager-####-
      labels:
        cluster.x-k8s.io/provider: infrastructure-vsphere
        control-plane: controller-manager
        pod-template-hash: ####
      name: capv-controller-manager-####-slngk
      namespace: capv-system
      ownerReferences:
      - apiVersion: apps/v1
        blockOwnerDeletion: true
        controller: true
        kind: ReplicaSet
        name: capv-controller-manager-####
        uid: ########-####-####-####-############
      resourceVersion: "374526282"
      uid: ########-####-####-####-############
    spec:
      containers:
      - args:
        - --enable-leader-election
        - --metrics-addr=0.0.0.0:8080
        - --enable-keep-alive
        - --sync-period=5m0s
        - --logtostderr
        - --v=4
        env:
        - name: HTTP_PROXY
        - name: HTTPS_PROXY
        - name: NO_PROXY
        image: projects.registry.vmware.com/tkg/cluster-api/cluster-api-vsphere-controller:v1.0.3_vmware.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
  • Save and exit; this should cause the CAPV controller manager pod to be redeployed.
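
The interactive edit above can also be applied non-interactively. The following is a minimal sketch using kubectl patch, assuming the manager is the first container in the deployment (adjust the index if your deployment differs):

    # Append both flags to the first container's args via a JSON patch
    kubectl -n capv-system patch deployment capv-controller-manager --type=json -p='[
      {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-keep-alive"},
      {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--sync-period=5m0s"}
    ]'

    # Confirm that the controller manager pod is redeployed with the new flags
    kubectl -n capv-system rollout status deployment capv-controller-manager

If --sync-period is already present in the args, edit the existing entry rather than appending a duplicate.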

Additional Information

  • This change is NOT persistent and will be overridden during the next management cluster upgrade (a way to verify that the flags are still in place is sketched after this list).
  • This should ensure a more frequent overall sync by CAPV.
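
To check whether the flags are still in place (for example, after a management cluster upgrade), the deployment's args can be inspected directly. A minimal sketch, assuming the manager is the first container:

    # --enable-keep-alive and --sync-period=5m0s should appear in the output
    kubectl -n capv-system get deployment capv-controller-manager -o jsonpath='{.spec.template.spec.containers[0].args}'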