During registration of TKGm management cluster to TMC the workload clusters are not visible as available clusters

Article ID: 379694

Updated On:

Products

Tanzu Kubernetes Runtime

Tanzu Mission Control

VMware Tanzu Mission Control

Tanzu Mission Control Prepaid Commitment Plan per Core

Issue/Introduction

While troubleshooting an existing management cluster and workload cluster connected to TMC, a deregister operation was initiated.

After the management cluster was registered again, the workload cluster window is empty.

Environment

TKGm 2.x

K8s 1.x

TMC 1.3 

 

Cause

To confirm the issue, a deregister and register cycle may need to be performed.

Logs can be collected from all pods in the TMC namespace:

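# Collect logs from every container of every pod in the TMC namespace
kubectl get pods -n vmware-system-tmc -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].name}{"\n"}{end}' | while read -r pod containers; do
  # $containers is left unquoted on purpose so it splits into one name per container
  for container in $containers; do
    echo "Logs for pod: $pod, container: $container"
    kubectl logs "$pod" -c "$container" -n vmware-system-tmc
  done
done > TMC-pods.log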

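Two pods are of interest: resource-retriever and sync-agent. To find where each pod's logs start in the collected file, list the section headers with their line numbers:

ag "Logs for pod" TMC-pods.log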

1:Logs for pod: agent-updater-xxxxxx, container: agent-updater
14:Logs for pod: agentupdater-workload-xxxxxx, container: agentupdater-workload
19:Logs for pod: agentupdater-workload-xxxxxx, container: agentupdater-workload
21:Logs for pod: cluster-health-extension-xxxxxx, container: cluster-health-extension
81:Logs for pod: extension-manager-xxxxxx, container: extension-manager
325:Logs for pod: extension-updater-xxxxxx, container: extension-updater
462:Logs for pod: intent-agent-xxxxxx, container: intent-agent
501:Logs for pod: lcm-tkg-extension-xxxxxx, container: manager
513:Logs for pod: lcm-tkg-operator-xxxxxxx, container: manager
530:Logs for pod: resource-retriever-xxxxxx, container: manager
616:Logs for pod: sync-agent-xxxxxx, container: sync-agent
814:Logs for pod: tmc-auto-attach-xxxxxx, container: tmc-auto-attach
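
Once the section headers are known, the error-level entries of a single pod can be pulled out of the collected file. The following is a minimal sketch, not part of the original procedure: the function name tmc_pod_errors is hypothetical, and it assumes the "Logs for pod:" headers written by the collection loop and JSON log lines that contain "level":"error" (other pods may log in a different format).

```shell
# Hypothetical helper: print only error-level JSON log lines from one pod's
# section of the collected file.
# Assumes the "Logs for pod: <name>, container: <name>" headers produced by
# the collection loop and log lines that contain "level":"error".
tmc_pod_errors() {
  pod_prefix="$1"                 # e.g. resource-retriever
  log_file="${2:-TMC-pods.log}"   # defaults to the file written by the loop
  awk -v p="Logs for pod: $pod_prefix" '
    # A header line starts a new section; remember whether it is the wanted pod
    /^Logs for pod: / { in_section = (index($0, p) == 1); next }
    # Inside the wanted section, keep only error-level entries
    in_section && /"level":"error"/ { print }
  ' "$log_file"
}
```

For example, tmc_pod_errors resource-retriever TMC-pods.log prints only the error entries logged by the resource-retriever pod.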

Check these pods for errors.

In the resource-retriever logs, the following entry indicates a problem accessing the vCenter from the pod:

{"component":"renderer-controller","error":"unable to render Option vmware-system-tmc/options: unable to synchronize child resources in datacenter \"<DATACENTER>\": failed to list template vms: Post \"https://<VCENTERIP>/sdk\": context deadline exceeded","file":"-20220523233716-da367aa59859/pkg/logr/logrus/logrus-logr.go:55","func":"logrus.logrusWrapper.Error","level":"error","msg":"Reconciler error","reconcileID":"7fa70513-7322-4f3d-bf1d-a761087b98cd","time":"2024-10-07T16:33:20Z"}

This error indicates that the check against the vCenter failed, and as a result the workload clusters cannot be added to the TMC portal.

There are two possible causes:

  1. A connectivity issue between the pod and the vCenter Server, for example a NetworkPolicy or a firewall rule.
  2. The API call to the vCenter takes too long to complete and the operation times out.

Resolution

For Option 1:

Attach a debug container to a copy of the resource-retriever pod to validate whether the vCenter is reachable from the pod:

 

kubectl debug -it resource-retriever-XXX -n vmware-system-tmc --image=IMAGEREGISTRY/netshoot --copy-to=mypod-debugger --share-processes
 
Verify that connectivity from the pod to the vCenter is successful.

If it is not, fix the connectivity problem and retry the registration.
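
The connectivity check from inside the debug container can be sketched as follows. This is an illustration under assumptions, not the official procedure: the function name check_vcenter is hypothetical, the 10 second default is an arbitrary value, and it assumes curl is available in the debug image. Any HTTP status returned by the /sdk endpoint (the endpoint named in the error above) shows that the network path is open; a timeout or connection error points to a NetworkPolicy or firewall.

```shell
# Hypothetical helper, to be run inside the debug container.
# $1 is the vCenter address (placeholder), $2 an optional timeout in seconds.
check_vcenter() {
  host="$1"
  timeout_s="${2:-10}"   # arbitrary default chosen for illustration
  # -k skips certificate validation, -m caps the total time,
  # -w prints the HTTP status and the elapsed time
  if curl -k -s -o /dev/null -m "$timeout_s" \
       -w 'HTTP %{http_code} in %{time_total}s\n' "https://$host/sdk"; then
    echo "reachable: $host"
  else
    echo "unreachable or timed out: $host (curl exit $?)"
    return 1
  fi
}
```

For example, check_vcenter <VCENTERIP> 10 reports whether the endpoint answered within 10 seconds.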

For Option 2:

From the vCenter, run the API call that lists all VMs.

If the call takes a long time to complete, the timeout is the likely cause of the issue.
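
One way to measure how long a VM listing takes is to time the call from a shell. The sketch below is an assumption-laden illustration: time_call is a hypothetical helper, the 30 second limit is arbitrary (the timeout used by resource-retriever is not stated in this article), and the commented usage relies on the vSphere Automation REST endpoints /api/session and /api/vcenter/vm, which only approximate the SOAP call to /sdk that the pod makes. VCENTER and VC_USER are placeholders.

```shell
# Hypothetical helper: run a command, report its duration, and fail when it
# errors or exceeds the given limit in seconds.
time_call() {
  limit="$1"; shift
  start=$(date +%s)
  "$@" > /dev/null          # discard the response body, only timing matters
  rc=$?
  elapsed=$(( $(date +%s) - start ))
  echo "exit=$rc elapsed=${elapsed}s limit=${limit}s"
  [ "$rc" -eq 0 ] && [ "$elapsed" -le "$limit" ]
}

# Usage sketch (placeholders, not executed here):
#   TOKEN=$(curl -k -s -X POST -u "$VC_USER" "https://$VCENTER/api/session" | tr -d '"')
#   time_call 30 curl -k -s -f -H "vmware-api-session-id: $TOKEN" \
#     "https://$VCENTER/api/vcenter/vm"
```

A non-zero return with a large elapsed value suggests that slow inventory listing, rather than blocked connectivity, is behind the "context deadline exceeded" error.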

Contact Tanzu support for assistance