HealthWatch v2.1.5 Scrape Config files failed to be generated for TKGi Clusters
Article ID: 313105
Products
VMware Tanzu Kubernetes Grid
Issue/Introduction
Symptoms: pks-cluster-discovery does not discover the clusters, and the scrape config file is not generated. The pks-cluster-discovery stdout log file shows the error below.
Example error from pks-cluster-discovery.stdout.log:
2021-10-25 05:54:15 [__meta_kubernetes_pod_name], action=replace, regex=, targetLabel=instance, replacement=)], dnsSdConfig=[], staticConfigs=[])]
2021-10-25 05:54:15 INFO pks.ScrapeConfigGenerator [discover-clusters] Could not get scrape config for cluster xxxxxxxxxxxxxxxxx
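The raw log contains ANSI color sequences, which makes the affected cluster hard to spot. The following is a minimal sketch, not part of HealthWatch, that strips those sequences and lists the clusters named in the "Could not get scrape config for cluster" message. The function names strip_ansi and failed_scrape_clusters are hypothetical helpers introduced here for illustration.

```shell
# Strip ANSI color sequences (ESC [ ... m) from log text read on stdin.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*m//g"
}

# Print the unique cluster identifiers that appear in
# "Could not get scrape config for cluster <id>" messages.
failed_scrape_clusters() {
  strip_ansi \
    | sed -n 's/.*Could not get scrape config for cluster \([^[:space:]]*\).*/\1/p' \
    | sort -u
}
```

To use it, pipe the log file into the function, for example failed_scrape_clusters < pks-cluster-discovery.stdout.log. The location of the log file on the VM depends on your deployment.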
Cause
Scrape config files are generated when TKGI Cluster Discovery connects to each TKGI cluster's Kubernetes API through the TKGI API using a UAA client. The UAA client for HealthWatch is generated when UAA is enabled as the OIDC provider for TKGI.
The pks get-credentials <clustername> and pks login commands fail for the same user: TKGI Cluster Discovery for HealthWatch cannot log in to the clusters because RBAC permissions are missing.
Resolution
As described under Cause, scrape config files are generated through the TKGI API using a UAA client, and that client is generated when UAA is enabled as the OIDC provider for TKGI.
A dedicated UAA client named 'healthwatch-tkgi-admin-read' is created in UAA for use by the Healthwatch tile.
TKGI enforces RBAC rules under which only an admin can access all clusters; any other identity can access only the clusters it created. The required permission is pks.clusters.admin. Cluster Discovery must be able to log in to each cluster so that it can create a read-only user on each cluster with access to the metrics endpoints.
To verify whether this is an RBAC issue, follow the steps below:
Check the status of the service [monit status pks-cluster-discovery]
Ensure the service is running, and restart it if it is not.
Get the credentials of one of the failing clusters [pks get-credentials <clustername>]
Try to log in using the same credentials [pks login]
If the login fails with an error such as "ERROR pks.PksClient [discover-clusters] Could not authenticate to the PKS API. Got response code 401", log in with the default TKGI admin user instead. If scraping then starts right away for all clusters, the problem is the RBAC permission granted to the HealthWatch UAA client.
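The credential check in the steps above can be repeated for several clusters with a small wrapper. This is a sketch only: check_cluster_access is a hypothetical helper name, and it assumes the pks CLI is on the PATH and that you have already logged in with the identity you want to test.

```shell
# Report whether the currently logged-in identity can fetch credentials
# for one cluster. Returns non-zero when pks get-credentials fails.
check_cluster_access() {
  cluster="$1"
  if pks get-credentials "$cluster" >/dev/null 2>&1; then
    echo "OK: $cluster"
  else
    echo "FAIL: $cluster (check RBAC permissions of the logged-in identity)"
    return 1
  fi
}
```

For example, running check_cluster_access <clustername> once while logged in as the HealthWatch client and once as the TKGI admin user shows whether the failure follows the identity rather than the cluster.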
To resolve the issue, grant the required permission, pks.clusters.admin, to the HealthWatch UAA client [uaac member add pks.clusters.admin healthwatch-tkgi-admin-read]
For details on managing users, see https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid-Integrated-Edition/1.14/tkgi/GUID-manage-users.html#uaa-user
N.B. When logging in as healthwatch-tkgi-admin-read, pks login must be given --client-name and --client-secret instead of --username and --password, because the login uses a UAA client rather than a user.
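The permission grant and the client login can be sketched as two shell functions. This is an illustration under stated assumptions rather than a verified procedure: the function names are hypothetical, the UAA URL, the TKGI API address and both secrets are placeholders you must supply, and the use of the UAA client named admin to obtain a token is an assumption about your environment. The uaac member add line is the command given in this article.

```shell
# Grant pks.clusters.admin using the command from this article.
# Arguments (placeholders): UAA URL, secret of the UAA "admin" client (assumed).
grant_cluster_admin() {
  uaa_url="$1"
  admin_secret="$2"
  uaac target "$uaa_url" --skip-ssl-validation &&
    uaac token client get admin -s "$admin_secret" &&
    uaac member add pks.clusters.admin healthwatch-tkgi-admin-read
}

# Log in to the TKGI API as the HealthWatch UAA client, not as a user.
# Arguments (placeholders): TKGI API address, client secret.
login_as_healthwatch_client() {
  api="$1"
  client_secret="$2"
  # -k skips TLS verification; --ca-cert with the API certificate is safer.
  pks login -a "$api" -k \
    --client-name healthwatch-tkgi-admin-read \
    --client-secret "$client_secret"
}
```

After a successful login with the client, pks get-credentials <clustername> should succeed for clusters the client did not create, which is what Cluster Discovery needs.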