Deleted k8s node is not recreated by Machine Health Check (MHC) in TKGm

Article ID: 421217

Updated On:

Products

Tanzu Kubernetes Runtime

Issue/Introduction

Machine Health Check (MHC) does not recreate a deleted Kubernetes node, even though the node has already been removed from the "kubectl get nodes" output.

# kubectl get nodes
# Only 2 worker nodes exist; the deleted node is no longer listed
NAME                            STATUS   ROLES           AGE   VERSION
test-controlplane-f5zbl-j44jj   Ready    control-plane   45d   v1.33.1+vmware.1
test-md-0-h2d4r-4qhs2-rsf2w     Ready    <none>          45d   v1.33.1+vmware.1
test-md-0-h2d4r-4qhs2-xwwhn     Ready    <none>          45d   v1.33.1+vmware.1

However, the orphaned objects for the deleted node (ma/vspheremachine/vspherevms) unexpectedly remain in the management cluster.

# kubectl get ma -A
NAME                           CLUSTER   NODENAME
test-controlplane-f5zbl-j44jj  test      test-controlplane-f5zbl-j44jj
test-md-0-h2d4r-4qhs2-rsf2w    test      test-md-0-h2d4r-4qhs2-rsf2w
test-md-0-h2d4r-4qhs2-xwwhn    test      test-md-0-h2d4r-4qhs2-xwwhn
test-md-0-h2d4r-4qhs2-z754m    test      test-md-0-h2d4r-4qhs2-z754m # <------- still remains
# kubectl get vspheremachine -A
NAME                           CLUSTER  READY
test-controlplane-f5zbl-j44jj  test     true
test-md-0-h2d4r-4qhs2-rsf2w    test     true
test-md-0-h2d4r-4qhs2-xwwhn    test     true
test-md-0-h2d4r-4qhs2-z754m    test     true # <---------- still remains
# kubectl get vspherevms -A
NAME                           AGE
test-controlplane-f5zbl-j44jj  45d
test-md-0-h2d4r-4qhs2-rsf2w    45d
test-md-0-h2d4r-4qhs2-xwwhn    45d
test-md-0-h2d4r-4qhs2-z754m    45d # <---------- still remains
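
To pinpoint which Machine is orphaned before deleting anything, each Machine's phase and backing node can be compared against the live node list. The following is a minimal sketch using standard Cluster API fields (status.phase and status.nodeRef.name); run it against the management cluster, and note that exact fields may vary by Cluster API version.

# Optional check: list each Machine with its phase and backing node
kubectl -n <NAMESPACE> get ma -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,NODE:.status.nodeRef.name
# A Machine whose NODE column is empty, or whose node is missing from "kubectl get nodes", is the orphan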

Environment

TKGm v2.5.0

Cause

Known Issue - TKGm v2.5.0 - Orphaned vSphereMachine objects remain after a cluster upgrade or scale operation

Resolution

Manually delete the orphaned objects (ma/vspheremachine/vspherevms) that belong to the deleted node.

# 1. Switch the context to the "Management Cluster"
kubectl config use-context <MANAGEMENT_CLUSTER>

# 2. Identify the orphaned objects
kubectl -n <NAMESPACE> get ma,vspheremachine,vspherevms

# 3. Delete the objects manually
kubectl -n <NAMESPACE> delete ma <TARGET_NODE_OBJECT>
kubectl -n <NAMESPACE> delete vspheremachine <TARGET_NODE_OBJECT>
kubectl -n <NAMESPACE> delete vspherevms <TARGET_NODE_OBJECT>

# 4. After about 5 minutes, verify that the objects have been deleted
kubectl -n <NAMESPACE> get ma,vspheremachine,vspherevms

# 5. If deletion hangs, edit each object and remove the entries under metadata.finalizers
kubectl -n <NAMESPACE> edit ma <TARGET_NODE_OBJECT>
kubectl -n <NAMESPACE> edit vspheremachine <TARGET_NODE_OBJECT>
kubectl -n <NAMESPACE> edit vspherevms <TARGET_NODE_OBJECT>
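# Alternative sketch: clear the finalizers non-interactively with a merge patch.
# This is a generic kubectl pattern rather than a TKGm-specific command; apply it
# only to objects confirmed to be orphaned.
kubectl -n <NAMESPACE> patch ma <TARGET_NODE_OBJECT> --type merge -p '{"metadata":{"finalizers":null}}'
kubectl -n <NAMESPACE> patch vspheremachine <TARGET_NODE_OBJECT> --type merge -p '{"metadata":{"finalizers":null}}'
kubectl -n <NAMESPACE> patch vspherevms <TARGET_NODE_OBJECT> --type merge -p '{"metadata":{"finalizers":null}}'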

# 6. A new node will be recreated by MHC
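
Before restarting anything, it can help to confirm that a MachineHealthCheck exists for the cluster and what it currently reports. A minimal check against the management cluster ("mhc" is the short name for machinehealthchecks; <MHC_NAME> is a placeholder, and the reported columns vary by Cluster API version):

kubectl -n <NAMESPACE> get mhc
kubectl -n <NAMESPACE> describe mhc <MHC_NAME>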

If MHC is still not triggered, restart the Cluster API controller pods.

kubectl -n capi-system rollout restart deployment/capi-controller-manager
kubectl -n capv-system rollout restart deployment/capv-controller-manager
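
To confirm the restarts have completed, the rollouts can be watched with standard kubectl:

kubectl -n capi-system rollout status deployment/capi-controller-manager
kubectl -n capv-system rollout status deployment/capv-controller-manager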

If the cluster is in a "paused" state, revert it to the "unpaused" state.

kubectl -n <NAMESPACE> patch cluster <CLUSTER> --type merge -p '{"spec":{"paused": false}}'
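
To check the current value of the flag before or after patching, a standard jsonpath lookup can be used (an empty result means the cluster is not paused):

kubectl -n <NAMESPACE> get cluster <CLUSTER> -o jsonpath='{.spec.paused}'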

Additional Information