Tanzu Mission Control Cluster Inspection failed to run when the MaxPod capacity is reached on cluster nodes

Article ID: 401954


Products

VMware Tanzu Mission Control - SM, VMware Tanzu Mission Control, VMware Tanzu Kubernetes Grid Plus, VMware vSphere Kubernetes Service

Issue/Introduction

  • The cluster Inspection page will show the CIS benchmark "Inspection failed to run" error.



  • The inspection-extension pod in the vmware-system-tmc namespace will show errors similar to the following.

    # kubectl logs -n vmware-system-tmc inspection-extension-#########-#####

    8l4.183d935255bb2b75","namespace":"vmware-system-tmc","time":"2025-05-08T14:53:05Z"}
    {"func":"ReconcileInspect.Reconcile","level":"info","msg":"Reconciling for request: vmware-system-tmc/inspection-d0############c0","time":"2025-05-08T14:53:05Z"}
    {"error":"Inspect.intents.tmc.cloud.vmware.com \"inspection-d0############c0\" not found","func":"ReconcileInspect.Reconcile","level":"error","msg":"r.Get: object not found","time":"2025-05-08T14:53:05Z"}

  • When checking the pods running in the vmware-system-tmc namespace, one will see that several sonobuoy-kube-bench-daemon-set pods are in Pending state.
  • When describing one of the Pending sonobuoy-kube-bench-daemon-set pods:

    # kubectl describe pods -n vmware-system-tmc sonobuoy-kube-bench-daemon-set-#######-####

    "status": {
                    "phase": "Pending",
                    "conditions": [
                        {
                            "type": "PodScheduled",
                            "status": "False",
                            "lastProbeTime": null,
                            "lastTransitionTime": "2025-06-19T19:26:02Z",
                            "reason": "Unschedulable",
                            "message": "0/20 nodes are available: 1 Too many pods. preemption: 0/20 nodes are available: 20 No preemption victims found for incoming pod.."

 

  • When describing the sonobuoy-kube-bench DaemonSet, one can see that several sonobuoy-kube-bench-daemon-set pods are still not available (one way to retrieve this status is sketched after this list).

    "status": {
                    "currentNumberScheduled": 20,
                    "numberMisscheduled": 0,
                    "desiredNumberScheduled": 20,
                    "numberReady": 16,
                    "observedGeneration": 1,
                    "updatedNumberScheduled": 20,
                    "numberAvailable": 16,
                    "numberUnavailable": 4

  • All cluster nodes are in Ready state.
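
Note: The exact command used to collect the DaemonSet status shown above is not included in this article; one way to retrieve a similar status block (assuming the DaemonSet is named sonobuoy-kube-bench-daemon-set, matching the pod names above) is:

    # kubectl get daemonset -n vmware-system-tmc sonobuoy-kube-bench-daemon-set -o jsonpath='{.status}'

The scheduling failures can also be listed directly from the namespace events:

    # kubectl get events -n vmware-system-tmc --field-selector reason=FailedScheduling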

Environment

VMware vSphere Kubernetes Service (VKS)

Cause

  • The CIS benchmark TMC inspection is failing because several sonobuoy-kube-bench-daemon-set pods are not reaching the Running state.
  • These sonobuoy-kube-bench-daemon-set pods are in Pending state because they are being scheduled on nodes that have already reached their maxPods per node capacity (110 pods per node by default).

    Note: the sonobuoy-kube-bench-daemon-set pods are part of a DaemonSet, so a copy of the pod must run on every cluster node.

 

The following are the steps to validate whether some of the cluster nodes have reached the default maxPods per node capacity of 110:

Note: the following commands need to be run from the guest cluster context.

  • Run the following command to check the pod capacity per node.

    # kubectl get nodes -o custom-columns=NAME:.metadata.name,Capacity:.status.capacity.pods,Allocatable:.status.allocatable.pods

    Ex: 

    kubectl get nodes -o custom-columns=NAME:.metadata.name,Capacity:.status.capacity.pods,Allocatable:.status.allocatable.pods

    NAME                                  Capacity   Allocatable
    GuestCluster--worker-####-###jr-pbr2b   110        110
    GuestCluster--worker-####-###r-wnv8g    110        110
    GuestCluster--worker-####-###jr-z5ndw   110        110
    GuestCluster-cp-###h6                   110        110
    GuestCluster-cp-###q6                   110        110
    GuestCluster-cp-###r4                   110        110

  • Run the following command to get the total number of pods running on every node in the cluster (an alternative per-node count is sketched after this list).

    # kubectl describe nodes | grep -E 'HolderIdentity|Non-terminated Pods'

    EX:
    kubectl describe nodes | grep -E 'HolderIdentity|Non-terminated Pods'
     
    HolderIdentity:  GuestCluster--worker-####-###j-pbr2b
    Non-terminated Pods:          (110 in total)
     
    HolderIdentity:  GuestCluster--worker-####-###j-wnv8g
    Non-terminated Pods:          (110 in total)
     
    HolderIdentity:  GuestCluster--worker-####-###j-z5ndw
    Non-terminated Pods:          (80 in total)
     
    HolderIdentity:  GuestCluster-cp-###h6
    Non-terminated Pods:            (40 in total)
     
    HolderIdentity:  GuestCluster-cp-###q6
    Non-terminated Pods:          (50 in total)
     
    HolderIdentity:  GuestCluster-cp-###r4
    Non-terminated Pods:          (30 in total)

    Note: "Non-terminated Pods" = total number of pods scheduled on the node.

  • From the previous output, there are 2 worker nodes that have reached the MaxPod capacity of 110; these will be the nodes that do not have the sonobuoy-kube-bench-daemon-set pods running on them.

    This can be confirmed with the following command:

    # kubectl get pods -n vmware-system-tmc -o wide | grep sonobuoy-kube-bench-daemon-set  | grep -v Running

    Note: The sonobuoy-kube-bench-daemon-set pods only get created while a TMC inspection is running and are deleted once the TMC inspection task is done.
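
As an optional cross-check (not part of the original steps above), the non-terminated pod count per node can also be gathered with a field selector. The loop below is a minimal sketch that assumes a bash-compatible shell and the same guest cluster context:

    # for node in $(kubectl get nodes -o name | cut -d/ -f2); do echo -n "$node: "; kubectl get pods -A --no-headers --field-selector spec.nodeName=$node,status.phase!=Succeeded,status.phase!=Failed | wc -l; done

Nodes whose count equals the Allocatable pod value from the earlier output are the ones that cannot accept a sonobuoy-kube-bench-daemon-set pod.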

Resolution

Currently there is no supported way to modify the MaxPod count in VMware vSphere Kubernetes Service (VKS).

Workaround:

Option 1: Manually balance the number of pods across the nodes:

  1. Run the following command to get the nodes' current pod counts:
    # kubectl describe nodes | grep -E 'HolderIdentity|Non-terminated Pods'

  2. Run the following command to see the nodes with the highest CPU or memory usage:
    # kubectl top node

  3. Based on the output from both previous commands, cordon the node(s) that have reached the max number of pods as well as the ones with high memory/CPU usage.
    # kubectl cordon <node-name>

  4. Then restart (delete) the pods on the nodes that reached the MaxPod capacity. Since those nodes are cordoned, the recreated pods will be scheduled on nodes with lower pod counts and CPU/memory usage (a sketch of listing and deleting pods on a node appears after these steps).

  5. Uncordon the previously cordoned nodes.
    # kubectl uncordon <node-name>

  6. Then run the TMC inspection.
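
The commands below are a minimal sketch of step 4 (the <node-name>, <pod-name>, and <namespace> values are placeholders): list the pods on a node that has hit the pod limit, then delete selected pods so that their controllers recreate them on an uncordoned node. Pods managed by a DaemonSet should be left in place, since they can only run on that node.

    # kubectl get pods -A -o wide --field-selector spec.nodeName=<node-name>

    # kubectl delete pod <pod-name> -n <namespace>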

Option 2: Use Descheduler for Kubernetes 

  • The Descheduler for Kubernetes is not tested, installed, configured, or supported by VMware vSphere Kubernetes Service.
  • The Descheduler for Kubernetes is used to rebalance clusters by evicting pods that can potentially be scheduled on better nodes.
  • Descheduler does not schedule replacement of evicted pods but relies on the default scheduler for that.
  • Please read more about Descheduler for Kubernetes before implementing it in your environment (a hedged install sketch follows this list).
  • See the Kubernetes documentation on node Capacity and Allocatable.
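
For reference only, and not supported on VMware vSphere Kubernetes Service as noted above: the upstream kubernetes-sigs/descheduler project publishes a Helm chart. The commands below are a minimal install sketch based on that project's documentation (the repository URL and chart name are assumptions that should be verified against the current Descheduler documentation before use):

    # helm repo add descheduler https://kubernetes-sigs.github.io/descheduler/

    # helm install descheduler descheduler/descheduler --namespace kube-system

The Descheduler's eviction behavior is controlled by its policy configuration; review and tune that policy before running it against a production cluster.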