Inconsistent Proxy Settings and Trusted Certificates for Private Container Registries in Supervisor Cluster Causing imgpkg Failures and TKC Update Issues

Article ID: 390741

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • The Supervisor Cluster does not use the configured proxy, or uses no proxy at all, and errors appear when accessing resources that require a working proxy.
  • When a private container registry is used over HTTPS, symptoms may include imgpkg bundle resolution or pull failures. This may lead to TKC Service update failures.
  • When inspecting the secret "kapp-controller-config" on the Supervisor cluster, its values alternate between different settings about every 20 seconds, and the secret's resourceVersion increases continuously. The increasing value can be observed with:
    # kubectl get secret -n vmware-system-appplatform-operator-system kapp-controller-config -o jsonpath="{.metadata.resourceVersion}" -w

    (Note: Running this kubectl command requires connecting to one of the Supervisor Control Plane nodes as root. More information is available here.)
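
As an alternative to watching the output by eye, the check above can be turned into a short sampling loop. The following is a minimal sketch, not an official tool: the function names, the sample count, and the interval are illustrative assumptions, and only the kubectl query itself is taken from the command above (without -w).

```shell
#!/bin/sh
# Sketch: sample the secret's resourceVersion and count how often it changes.
# Function names and defaults are illustrative, not part of any VMware tooling.

NS="vmware-system-appplatform-operator-system"
SECRET="kapp-controller-config"

# Print the current resourceVersion of the secret.
get_resource_version() {
  kubectl get secret -n "$NS" "$SECRET" -o jsonpath='{.metadata.resourceVersion}'
}

# Take $1 readings (default 6), $2 seconds apart (default 10), one per line.
sample_versions() {
  samples="${1:-6}"
  interval="${2:-10}"
  i=0
  while [ "$i" -lt "$samples" ]; do
    get_resource_version
    echo  # jsonpath output has no trailing newline
    i=$((i + 1))
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
  done
}

# Read one value per line and print how many times consecutive values differ.
count_changes() {
  awk 'NR > 1 && $0 != prev { n++ } { prev = $0 } END { print n + 0 }'
}
```

Running `sample_versions 6 10 | count_changes` on a Supervisor Control Plane node prints the number of changes seen in roughly one minute. A count that keeps coming back non-zero would be consistent with the symptom described above, although a single change on its own may simply reflect a legitimate configuration update.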

Environment

VMware vSphere 8.0 Update 3 and later
Tanzu Kubernetes Grid Service

Cause

Each Control Plane node in the Supervisor cluster runs its own local service, which applies the configuration pushed from vCenter's WCP service. This includes the proxy configuration set in the vSphere UI for the Supervisor cluster. However, the service can hold stale values from that configuration in its cache.

When individual nodes are started or restarted, the cached values can differ between nodes, so each node writes the configuration it knows into the shared kapp-controller-config secret. This overwrites the value written by a different node, which in turn causes a mismatch when the next node checks the secret.
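
One way to see this overwriting from the outside is to fingerprint the secret's contents a few times and count how many distinct versions appear. The sketch below assumes that sha256sum is available on the node and that kubectl prints the .data map in a stable form. The function names and defaults are illustrative, and the script only reads the secret.

```shell
#!/bin/sh
# Sketch: fingerprint the secret's data repeatedly and count distinct contents.
# Assumes sha256sum is installed; function names are illustrative.

NS="vmware-system-appplatform-operator-system"
SECRET="kapp-controller-config"

# Print a SHA-256 digest of the secret's data section.
secret_digest() {
  kubectl get secret -n "$NS" "$SECRET" -o jsonpath='{.data}' | sha256sum | cut -d' ' -f1
}

# Take $1 digests (default 6), $2 seconds apart (default 10), one per line.
sample_digests() {
  samples="${1:-6}"
  interval="${2:-10}"
  i=0
  while [ "$i" -lt "$samples" ]; do
    secret_digest
    i=$((i + 1))
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
  done
}

# Read one digest per line and print the number of distinct digests.
distinct_digests() {
  sort -u | awk 'NF { n++ } END { print n + 0 }'
}
```

With `sample_digests 6 10 | distinct_digests`, a result of 1 means the content stayed the same across all samples. A result of 2 or more, repeated over several runs, would fit the picture of nodes alternately writing their own cached configuration.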

Resolution

Engineering is currently working on addressing this in a future release.

Workaround

The workaround is to restart the responsible service on the Supervisor Control Plane nodes sequentially. This ensures that each service reads and holds the latest correct values, and prevents it from overwriting the previous settings. The restart must be repeated every time the proxy settings or private container registry settings are modified on the Supervisor Cluster.

To apply the workaround, restart the service as root on each of the three Supervisor Control Plane nodes:

systemctl restart wcp-sync
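
If root SSH access to the nodes is available from a single workstation, the sequential restart can be scripted. This is a hedged sketch rather than a supported procedure: the function name, the REMOTE variable, and the node addresses are hypothetical, and the added `systemctl is-active` check is a convenience that goes beyond the command above.

```shell
#!/bin/sh
# Sketch: restart wcp-sync on each Supervisor Control Plane node, one at a time.
# Assumes root SSH access to every node; node addresses are passed as arguments.

# Command used to reach a node; override it (for example with a stub) for a dry run.
REMOTE="${REMOTE:-ssh}"

restart_wcp_sync_sequentially() {
  for node in "$@"; do
    echo "Restarting wcp-sync on ${node}"
    # Stop at the first failure so that a broken node is not silently skipped.
    if ! "$REMOTE" "root@${node}" "systemctl restart wcp-sync && systemctl is-active wcp-sync"; then
      echo "Restart failed on ${node}" >&2
      return 1
    fi
  done
}
```

A call such as `restart_wcp_sync_sequentially <node1> <node2> <node3>`, with the placeholders replaced by the actual Control Plane node addresses, handles the nodes strictly in order and only moves on once the service reports as active on the current node.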