HealthWatch v2.1.5 Scrape Config files failed to be generated for TKGi Clusters

Article ID: 313105

Updated On:

Products

VMware

Issue/Introduction

Symptoms:
pks-cluster-discovery is not working for the clusters, and the scrape config file is not generated, as shown by the error below from the pks-cluster-discovery stdout log file.

Error Example from pks-cluster-discovery.stdout.log

2021-10-25 05:54:15 [__meta_kubernetes_pod_name], action=replace, regex=, targetLabel=instance, replacement=)], dnsSdConfig=[], staticConfigs=[])]
2021-10-25 05:54:15 INFO pks.ScrapeConfigGenerator [discover-clusters] Could not get scrape config for cluster xxxxxxxxxxxxxxxxx
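
The raw log file contains ANSI color sequences, which makes it awkward to search. The following is a minimal sketch for listing the clusters named in these messages, assuming the line format shown above. The function names and the log path in the usage comment are illustrative assumptions, not part of HealthWatch.

```shell
# Remove ANSI color sequences (ESC[...m) from stdin.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*m//g"
}

# Print the unique cluster names found in
# "Could not get scrape config for cluster <name>" messages on stdin.
failed_clusters() {
  strip_ansi \
    | sed -n 's/.*Could not get scrape config for cluster \([^ ]*\).*/\1/p' \
    | sort -u
}

# Usage (the path is an assumption; adjust it to where the log lives on your VM):
# failed_clusters < /var/vcap/sys/log/pks-cluster-discovery/pks-cluster-discovery.stdout.log
```

Each name printed is a candidate for the pks get-credentials check described under Resolution.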



Cause


Scrape Config files are collected when TKGI clusters connect to the Kubernetes API through the TKGI API using a UAA client. The UAA client for HealthWatch is generated when UAA is enabled as the OIDC provider for TKGI.

When both the pks get-credentials <clustername> command and pks login fail for the same user, TKGI Cluster Discovery for HealthWatch cannot log in to the clusters because the required RBAC permissions are missing.

Resolution

Scrape Config files are collected when TKGI clusters connect to the Kubernetes API through the TKGI API using a UAA client. The UAA client for HealthWatch is generated when UAA is enabled as the OIDC provider for TKGI.

A dedicated UAA client called 'healthwatch-tkgi-admin-read' is created in UAA for use by the Healthwatch tile.

TKGI enforces RBAC rules under which only an admin can access all clusters; any other user can access only the clusters that user created. The required permission is pks.clusters.admin. Cluster Discovery must be able to log in to each cluster so that it can create a read-only user on each cluster with access to the metrics endpoints.

To verify whether this is an RBAC issue, follow the steps below:
  • Check the status of the service [monit status pks-cluster-discovery]
  • Ensure the service is running, or restart it if it is not
  • Get the credentials for one of the failing clusters [pks get-credentials <clustername>]
  • Try to log in using the above credentials [pks login]
  • If it fails with an error such as "ERROR pks.PksClient [discover-clusters] Could not authenticate to the PKS API.  Got response code 401", try to log in using the default TKGI admin user instead. If scraping then starts right away for all clusters, the problem is the RBAC permission granted to the HealthWatch UAA client
  • The required permission is pks.clusters.admin. Cluster Discovery must be able to log in to each cluster so that it can create a read-only user on each cluster with access to the metrics endpoints.
  • For details on managing users, see https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid-Integrated-Edition/1.14/tkgi/GUID-manage-users.html#uaa-user [uaac member add pks.clusters.admin healthwatch-tkgi-admin-read]
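
The 401 check in the steps above can also be run against a saved copy of the log. This is a minimal sketch that only matches the error text quoted above. The function names and the wording of the printed hints are illustrative assumptions, not HealthWatch output.

```shell
# Succeed (exit 0) if stdin contains the PKS API 401 authentication failure.
has_auth_failure() {
  grep -q 'Could not authenticate to the PKS API.*response code 401'
}

# Print a one-line hint for a log read from stdin.
diagnose() {
  if has_auth_failure; then
    echo "auth-failure: check that the HealthWatch UAA client has pks.clusters.admin"
  else
    echo "no-auth-failure: no 401 from the PKS API found in this log"
  fi
}

# Usage (the file name is an assumption):
# diagnose < pks-cluster-discovery.stdout.log
```

A "no-auth-failure" result does not rule out an RBAC problem; it only means this specific message was not found in the input.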


N.B.
pks login with healthwatch-tkgi-admin-read must use --client-name and --client-secret instead of --username and --password, because it logs in as a UAA client rather than as a user.
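
As an illustration of the note above, here is a small wrapper around a client-credentials login. It is a sketch only: the function name, the API hostname, the certificate path, and the CLIENT_SECRET variable are hypothetical placeholders, and the client secret must be taken from your own environment.

```shell
# Log in to the TKGI API as a UAA client instead of as a user.
# Arguments: API hostname, client name, client secret, CA certificate path.
login_as_client() {
  pks login -a "$1" --client-name "$2" --client-secret "$3" --ca-cert "$4"
}

# Usage (hostname, secret variable and certificate path are placeholders):
# login_as_client api.tkgi.example.com healthwatch-tkgi-admin-read "$CLIENT_SECRET" /path/to/ca.crt
```

Passing the secret as an argument keeps it out of the script itself, although it may still be visible in the process list while the command runs.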


Additional Information

Reference Document for Configuring PKS Cluster Discovery Service on TKGi 
Link: https://docs.pivotal.io/healthwatch/2-1/configuring/optional-config/configuring-cluster-discovery.html#configure-tkgi-cluster-discovery