TKG Autoscaler does not scale out when the number of worker nodes is below AUTOSCALER_MIN_SIZE


Article ID: 376158


Updated On:

Products

VMware Tanzu Kubernetes Grid Management

Issue/Introduction

Symptoms:

  • Autoscaler does not attempt to scale the number of worker nodes in or out
    • Example: "AUTOSCALER_MIN_SIZE_0" is set to "3", but the md-0 WORKERS count is still only "1" (see the quick check sketched below)
  • Autoscaler does not work after running "tanzu cluster scale"
  • Cluster status switches to "updating" every 10 minutes because a new worker node keeps looping between "Deleting" and "Provisioning"
    • Monitor with "kubectl get machine -A -w"

Environment

This KB applies to:

  • Tanzu Kubernetes Grid 2.3.x
  • Tanzu Kubernetes Grid 2.4.x
  • Tanzu Kubernetes Grid 2.5.1

Cause

Resolution

Operation Overview

  1. Add the --enforce-node-group-min-size flag, which is supported by Cluster Autoscaler v1.26 and later (TKG 2.3.x and later)
  2. (ClusterClass only) Delete the "replicas: X" field from the machineDeployment

 

1. Preparation

# Switch to the Management Cluster context
kubectl config use-context ${mc_context_name}

# Set the target cluster name
kubectl get cluster -A
cluster=w2-cc
namespace=default

# Check the cluster type
kubectl -n ${namespace} get cluster ${cluster} -ojson | jq .spec.topology.class
#> "tkg-vsphere-default-v1.x.x" # --> ClusterClass, use Step 3-A later
#> null                         # --> Legacy Cluster, use Step 3-B later
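
# Optional (a sketch; the cluster_class variable name is only an example):
# capture the same jq result in a shell variable for later reference
cluster_class=$(kubectl -n ${namespace} get cluster ${cluster} -ojson | jq .spec.topology.class)
echo ${cluster_class}
#> "tkg-vsphere-default-v1.x.x" # --> ClusterClass, use Step 3-A later
#> null                         # --> Legacy Cluster, use Step 3-B later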

 

2. Update Deployment - Autoscaler pod

# Check - Autoscaler version is v1.26 or higher
kubectl -n ${namespace} get deployment ${cluster}-cluster-autoscaler -oyaml | grep image:
#>  image: projects.registry.vmware.com/tkg/cluster-autoscaler:v1.28.0_vmware.1

# Check the current parameter
kubectl -n ${namespace} get deployment ${cluster}-cluster-autoscaler -oyaml | grep -A15 args:
#>      - args:
#>        ....
#>        - --scale-down-delay-after-add=10m
#>        - --scale-down-delay-after-delete=10s
#>        - --scale-down-delay-after-failure=3m
#>        - --scale-down-unneeded-time=10m
#>        - --max-node-provision-time=15m
#>        - --max-nodes-total=0

# Backup
kubectl -n ${namespace} get deployment ${cluster}-cluster-autoscaler -oyaml > ${cluster}-cluster-autoscaler-$(date +%Y-%m%d-%H%M).yaml

# Add "--enforce-node-group-min-size" flag
kubectl -n ${namespace} edit deployment ${cluster}-cluster-autoscaler
#>      - args:
#>      ....
#>        - --max-nodes-total=0
#>        - --enforce-node-group-min-size=true # <----------- NEW

# Autoscaler pod will be recreated automatically
kubectl -n ${namespace} get pods | grep -E 'autoscaler|NAME'
#> NAME                                            READY   STATUS    RESTARTS   AGE
#> w2-cc-cluster-autoscaler-f4fb9fc96-tn8wc        1/1     Running   0          12s

# Check the new parameter
kubectl -n ${namespace} get deployment ${cluster}-cluster-autoscaler -oyaml | grep -A15 args:
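
# Optional (a sketch): confirm the new flag was picked up with a targeted grep
kubectl -n ${namespace} get deployment ${cluster}-cluster-autoscaler -oyaml | grep enforce-node-group-min-size
#>        - --enforce-node-group-min-size=true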

 

3-A. (ClusterClass) - Review cluster-api-autoscaler-node-group-min-size

  • This step is for ClusterClass clusters. Do not apply it to a Legacy Cluster
  • Review cluster-api-autoscaler-node-group-max-size and cluster-api-autoscaler-node-group-min-size
    • Do not set them to the same value; cluster-api-autoscaler-node-group-max-size must be greater than cluster-api-autoscaler-node-group-min-size
  • Delete the "replicas: X" field from the target machineDeployment
# Check target node-pool
tanzu cluster node-pool list $cluster
#>  NAME  NAMESPACE  PHASE    REPLICAS  READY  UPDATED  UNAVAILABLE
#>  md-0  default    Running  1         1      1        0

# Check current autoscaler configuration
kubectl -n ${namespace} get cluster ${cluster} -ojsonpath="{.spec.topology.workers.machineDeployments}" | jq .[]
#> {
#>   "class": "tkg-worker",
#>   "metadata": {
#>     "annotations": {
#>       "cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size": "4",
#>       "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size": "3",
#>       "run.tanzu.vmware.com/resolve-os-image": "image-type=ova,os-name=photon"
#>     }
#>   },
#>   "name": "md-0",
#>   "replicas": 1
#> }

# Backup
kubectl -n ${namespace} get cluster ${cluster} -oyaml > ${cluster}-$(date +%Y-%m%d-%H%M).yaml

# Edit ClusterClass
# - Review "cluster-api-autoscaler-node-group-max-size" > "cluster-api-autoscaler-node-group-min-size"
# - Delete "replicas: X"
kubectl -n ${namespace} edit cluster ${cluster}
#>      machineDeployments:
#>      - class: tkg-worker
#>        metadata:
#>          annotations:
#>            cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "4"
#>            cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3" # CHECK
#>            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
#>        name: md-0
#>        replicas: 1  # <--------------- DELETE

# Check the new autoscaler configuration
kubectl -n ${namespace} get cluster ${cluster} -ojsonpath="{.spec.topology.workers.machineDeployments}" | jq .[]
#> {
#>   "class": "tkg-worker",
#>   "metadata": {
#>     "annotations": {
#>       "cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size": "4",
#>       "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size": "3",
#>       "run.tanzu.vmware.com/resolve-os-image": "image-type=ova,os-name=photon"
#>     }
#>   },
#>   "name": "md-0",
#> }

 

3-B. (Legacy Cluster) - Review cluster-api-autoscaler-node-group-min-size

  • This step is for Legacy Clusters. Do not apply it to a ClusterClass cluster
  • Review cluster-api-autoscaler-node-group-max-size and cluster-api-autoscaler-node-group-min-size
    • Do not set them to the same value; cluster-api-autoscaler-node-group-max-size must be greater than cluster-api-autoscaler-node-group-min-size
# Set machineDeployment name
tanzu cluster node-pool list $cluster
nodepool=md-0
md=${cluster}-${nodepool}

# Check the current autoscaler configuration
kubectl -n ${namespace} get md ${md} -oyaml | grep "cluster-api-autoscaler" | grep size:
#>    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "4"
#>    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3"

# Optional: update "cluster-api-autoscaler-node-group-min-size" to match your requirements

# Backup
kubectl -n ${namespace} get md ${md} -oyaml > ${md}-$(date +%Y-%m%d-%H%M).yaml

# Review "cluster-api-autoscaler-node-group-min-size"
kubectl -n ${namespace} edit md ${md}

# Check the new autoscaler configuration
kubectl -n ${namespace} get md ${md} -oyaml | grep "cluster-api-autoscaler" | grep size:
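
The annotation can also be updated non-interactively with "kubectl annotate" (a sketch; the value "3" is only an example, adjust it to your requirements):

# Alternative: set the min-size annotation without an interactive edit
kubectl -n ${namespace} annotate md ${md} \
  cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size="3" --overwrite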

 

4. Check

Autoscaler starts to increase the number of worker nodes (1 --> 3)

  • Scale-out will be triggered within 10 seconds (default)
  • Scale-in will be triggered after 10 minutes (default)
# Check that the target cluster STATUS changes from "updating" to "running"
tanzu cluster list

# Check - the PHASE of all machines should be "Running"
kubectl get ma -A
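
# Optional (a sketch): watch the MachineDeployment replica count converge to the configured min size
kubectl -n ${namespace} get md -w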

Additional Information


Troubleshooting

Check the Autoscaler pod log:

kubectl -n ${namespace} logs ${cluster}-cluster-autoscaler-xxxxx
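
To narrow the output down to scaling decisions, a grep over the Deployment's logs can help (a sketch; the exact log wording differs between Autoscaler versions):

kubectl -n ${namespace} logs deployment/${cluster}-cluster-autoscaler | grep -iE "scale[ _-]?up|min size"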

 

Monitor the machine status to observe the current behavior:

kubectl get ma -A -w

# If you find a machine stuck in "Deleting", delete it forcibly
namespace=default
machine=w2-cc-md-0-npvz4-7n4l8-lmhms
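
# Optional pre-check (a sketch): confirm the machine is actually stuck in deletion
# by checking its deletionTimestamp and remaining finalizers
kubectl -n ${namespace} get ma ${machine} -ojsonpath='{.metadata.deletionTimestamp}{"  "}{.metadata.finalizers}{"\n"}'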

# Action - Remove the finalizers so the stuck machine can be deleted
kubectl -n ${namespace} patch ma ${machine} -p '{"metadata": {"finalizers": null}}' --type=merge

 

If a scale-out or scale-in event for the worker nodes is not triggered, check whether the cluster is paused.

# Check whether the cluster is paused (no output --> not paused)
kubectl -n ${namespace} get cluster ${cluster} -ojsonpath='{.spec.paused}' | jq .

# Action: Unpause - the cluster must be unpaused for node scaling to work
kubectl -n ${namespace} patch cluster ${cluster} --type merge -p '{"spec":{"paused": false}}'

# Action: Pause - if the cluster is being updated too frequently, consider pausing it temporarily
kubectl -n ${namespace} patch cluster ${cluster} --type merge -p '{"spec":{"paused": true}}'