KubeadmControlPlanes module and a MachineDeployment module. Each cluster module corresponds to a vAPI response size of 36 bytes on vCenter. In some cases, additional cluster modules may be created on vSphere, even if they are not actively utilised, leading to significantly larger vAPI response sizes. If the response size exceeds 7MB, CAPV will no longer be able to access cluster modules from vCenter and the pod may crash. You will see the below error in CAPV logs when this scenario is encountered:
E0719 08:57:47.271292 1 clustermodule_reconciler.go:93] capv-controller-manager/vspherecluster-controller/rms/test01-tkg "msg"="failed to verify cluster module for object" "error"="GET https://endpoint.test/rest/vcenter/cluster/modules: 500 Internal Server Error" "moduleUUID"="6114a41f-3451-79f2-77b8-24f7031676fl" "name"="test01-tkg-control-plane"
2023-07-19T08:57:47.266Z | ERROR | vAPI-I/O dispatcher-1 | SessionFacade | Unexpected error occurred while executing the call with session [email protected] (internal id 82w43c20-7351-44f1-9974-3740ff89v283, token 9a85t...) for method com.vmware.vcenter.cluster.modules.list with uuid 50e1s528-1019-49gf-8llb-fe4k93bdk935. com.vmware.vapi.endpoint.common.UnacceptableResponseException: Response size 106942232b is greater than allowed 7000000b
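To put the sizes in the log above into perspective, the following rough sketch converts a response size into an approximate module count. It assumes the 36 bytes per module mentioned earlier and ignores any other response overhead; `modules_for_size` is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Rough estimate only: assumes each cluster module adds about 36 bytes
# to the vAPI response and that nothing else contributes to its size.
BYTES_PER_MODULE=36
VAPI_LIMIT_BYTES=7000000   # "allowed 7000000b" from the vCenter log entry

# Print the approximate number of modules for a response size in bytes.
modules_for_size() {
  echo $(( $1 / BYTES_PER_MODULE ))
}

modules_for_size "$VAPI_LIMIT_BYTES"   # modules that fit under the limit
modules_for_size 106942232             # modules implied by the logged size
```

Under these assumptions the limit is reached at roughly 194,000 modules, while the logged response of 106942232 bytes would correspond to nearly 3 million modules.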
To get the number of expected modules, we can run:
kubectl get vspherecluster -A -o json | jq -r '.items[].spec.clusterModules[].moduleUUID' | wc -l
And compare that number against the number of modules existing on vCenter:
govc cluster.module.ls | wc -l
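The comparison of the two counts can be wrapped in a small script. This is a minimal sketch, assuming `kubectl`, `jq` and `govc` are installed and already configured for the management cluster and vCenter; `report_excess` is a hypothetical helper name, not part of govc or CAPV.

```shell
#!/usr/bin/env bash
# report_excess EXPECTED ACTUAL
# Print how many more modules exist on vCenter than CAPV expects.
report_excess() {
  local expected=$1 actual=$2
  local excess=$(( actual - expected ))
  if [ "$excess" -gt 0 ]; then
    echo "excess=$excess"
  else
    echo "excess=0"
  fi
}

# Query the live systems only when all required CLIs are available.
if command -v kubectl >/dev/null && command -v jq >/dev/null && command -v govc >/dev/null; then
  # Modules referenced by VSphereCluster objects in this management cluster.
  expected=$(kubectl get vspherecluster -A -o json \
    | jq -r '.items[].spec.clusterModules[].moduleUUID' | wc -l)
  # Modules that actually exist on vCenter.
  actual=$(govc cluster.module.ls | wc -l)
  report_excess "$expected" "$actual"
else
  echo "kubectl, jq and govc are required for a live comparison" >&2
fi
```

If several management clusters share the same vCenter, the expected count has to be summed across all of them before the difference is meaningful.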
Note: Before deleting any modules, check that no other systems create or use cluster modules on this vCenter. Otherwise, modules created by those systems will be deleted as well.
Note: If we delete a module which was used by CAPV, a new reconciliation will recreate it.
Note: If there are multiple management clusters within the same vCenter, you will need to follow the steps below for each additional management cluster. For each subsequent cluster, replace "govc-modules.txt" with "filtered.txt". Continue this process until the modules in use by every management cluster have been removed from "filtered.txt".
Procedure to remove excess cluster modules
First generate a complete list of cluster modules using the below command:
govc cluster.module.ls > govc-modules.txt
We can get the list of cluster modules in use by the management cluster with the following command (note: this uses the commands `kubectl` and `jq`):
kubectl get vspherecluster -A -o json | jq -r '.items[].spec.clusterModules[].moduleUUID'
Now filter out the modules that are still in use by running the following command. If you are removing modules for a second management cluster, use "filtered.txt" in place of "govc-modules.txt" as the input. Rename that input file first (e.g. to filtered-1.txt), because the command writes its output to "filtered.txt" and would otherwise overwrite it:
(kubectl get vspherecluster -A -o json | jq '.items[].spec.clusterModules[].moduleUUID' -r; cat govc-modules.txt | awk '{print $NF}') | sort | uniq -c | grep -E '^ +1 ' | awk '{print $NF}' > filtered.txt
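Note that the `sort | uniq -c` pipeline above keeps every UUID that appears exactly once, which also includes a UUID known to the management cluster but missing on vCenter. As a hedged alternative sketch, `comm -13` keeps only entries that exist in the vCenter list and not in the in-use list. `orphaned_modules` and the file name `used.txt` are hypothetical names used for illustration; the UUID is assumed to be the last column of `govc-modules.txt`, as in the command above.

```shell
#!/usr/bin/env bash
# Use byte-wise collation so sort and comm agree on the ordering.
export LC_ALL=C

# orphaned_modules USED_FILE GOVC_FILE
#   USED_FILE: UUIDs in use by the management cluster, one per line
#   GOVC_FILE: output of "govc cluster.module.ls", UUID in the last column
# Prints UUIDs that exist on vCenter but are not in use.
orphaned_modules() {
  comm -13 <(sort -u "$1") <(awk '{print $NF}' "$2" | sort -u)
}
```

A possible invocation, assuming the in-use UUIDs were first saved with the `kubectl ... | jq` command into `used.txt`, would be `orphaned_modules used.txt govc-modules.txt > filtered.txt`.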
And lastly we should execute `govc cluster.module.rm` for every entry in that list. Depending on the version of the govc CLI that you have, please use the appropriate method:
For govc version 0.31.0 or higher, please use the below command:
govc cluster.module.rm - < filtered.txt
For govc versions lower than 0.31.0, please use the below command (this could take a long time, depending on the number of excess modules that exist on vCenter):
while read -r ID; do echo "Deleting $ID"; govc cluster.module.rm $ID; done < filtered.txt
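Choosing between the two deletion methods can be scripted with a version comparison. The sketch below assumes a `sort` that supports `-V` (version sort); `version_ge` and `delete_method` are hypothetical helper names, and the labels "stdin" and "loop" simply refer to the two commands shown above.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A is greater than or equal to B.
# Relies on "sort -V", which orders dotted version strings numerically.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# delete_method VERSION: print which deletion method applies.
#   stdin -> govc cluster.module.rm - < filtered.txt
#   loop  -> one govc cluster.module.rm call per module ID
delete_method() {
  local v=${1#v}   # tolerate a leading "v" in the version string
  if version_ge "$v" "0.31.0"; then
    echo "stdin"
  else
    echo "loop"
  fi
}
```

For example, `delete_method "$(govc version | awk '{print $2}')"` would select the method for the installed CLI, assuming `govc version` prints a line of the form `govc 0.30.7`.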
Some time after the cleanup, we should check whether the number of cluster modules is still increasing by comparing it against the expected number.
To get the number of expected modules, we could run:
kubectl get vspherecluster -A -o json | jq -r '.items[].spec.clusterModules[].moduleUUID' | wc -l
And compare that number against the number of modules existing on vCenter:
govc cluster.module.ls | wc -l
E0719 09:02:38.748461 1 controller.go:317] controller/vspherecluster "msg"="Reconciler error" "error"="unexpected error while probing vcenter for infrastructure.cluster.x-k8s.io/v1beta1, Kind=VSphereCluster test/test01-tkg: POST \"/sdk\": 503 Service Unavailable" "name"="test01-tkg" "namespace"="test" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="VSphereCluster"