VKS Upgrade Fails with TLS Certificate Error for projects.packages.broadcom.com Due to Kubernetes DNS Search Domains

Article ID: 424075

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • During an upgrade of vSphere Kubernetes Service (VKS) to version 3.5.0, the operation may fail during the reconciliation phase when components attempt to pull packages from the Broadcom registry.
  • The failure is typically reported by kapp-controller when accessing the Broadcom packages registry.

Error while preparing a transport to talk with the registry:
Unable to create round tripper:
Get "https://projects.packages.broadcom.com/v2/":
tls: failed to verify certificate:
x509: certificate is valid for *.domain.local, not projects.packages.broadcom.com

  • As the registry cannot be accessed successfully, package installations required for the upgrade remain in a reconciling or failed state, preventing the VKS upgrade from completing.
  • The following symptoms are also observed:

    • TLS verification failures in kapp-controller logs when accessing https://projects.packages.broadcom.com.

    • DNS resolution differences between Supervisor nodes and pods inside the cluster.

    • Registry connectivity tests from Supervisor nodes succeed, while the same tests inside the kapp-controller pod fail.

Environment

  • VMware vSphere Kubernetes Service

Cause

  • The issue occurs due to DNS search domain behavior inside Kubernetes pods, combined with the default resolver configuration (ndots:5).
  • The Supervisor Management Network DNS configuration includes a search domain such as:

domain.local
  • The internal DNS server contains a record similar to:

projects.packages.broadcom.com.domain.local
  • This internal hostname resolves to a private IP address, and the endpoint at that address presents a TLS certificate for:
*.domain.local
  • Inside Kubernetes pods (such as kapp-controller), the DNS resolver behavior works as follows:
    • Because projects.packages.broadcom.com contains only three dots, fewer than the ndots:5 threshold, the resolver treats it as a relative name.
    • The resolver first attempts to resolve the hostname using configured search domains, resulting in:

projects.packages.broadcom.com.domain.local

    • Since this name exists in the internal DNS zone, the lookup succeeds and returns the private endpoint.

    • The pod connects to this endpoint and receives a certificate for *.domain.local.

    • TLS verification fails because the requested hostname is:

projects.packages.broadcom.com

  • Meanwhile, tests performed directly from the Supervisor nodes resolve the correct public address for projects.packages.broadcom.com and receive the proper public TLS certificate, which is why those checks succeed.
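
The lookup order described above can be illustrated with a minimal Python sketch. It models the documented behavior of a glibc-style stub resolver and is not the resolver's actual code. The function name candidate_queries is hypothetical. The search list holds only the example domain from this article, although a real pod's /etc/resolv.conf usually lists cluster-internal search domains as well.

```python
def candidate_queries(name, search_domains, ndots=5):
    """Return the names a glibc-style stub resolver would try, in order.

    Illustrative model only: real resolvers also handle timeouts,
    retries, and per-libc differences.
    """
    # A trailing dot marks the name as fully qualified, so no search
    # domain is appended.
    if name.endswith("."):
        return [name.rstrip(".")]

    expanded = [f"{name}.{domain}" for domain in search_domains]

    if name.count(".") >= ndots:
        # Enough dots: the literal name is tried first.
        return [name] + expanded

    # Fewer dots than ndots: search domains are tried first and the
    # literal name is tried last.
    return expanded + [name]


if __name__ == "__main__":
    # Assumed search list, taken from the example in this article.
    for query in candidate_queries("projects.packages.broadcom.com", ["domain.local"]):
        print(query)
```

With ndots:5, the first candidate is projects.packages.broadcom.com.domain.local. Because that record exists in the internal zone, the resolver never reaches the public name.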

Resolution

To restore proper registry access and allow the upgrade to proceed, adjust the DNS configuration so that Kubernetes pods resolve the public registry endpoint correctly.

  1. Temporarily Remove the Conflicting DNS Search Domain
    • Update the Supervisor Management Network DNS configuration and temporarily remove the search domain that causes the collision (for example, domain.local).
    • With the search domain removed, the resolver no longer attempts to resolve projects.packages.broadcom.com.domain.local.
  2. Restart Affected System Components
    • After updating the DNS configuration, restart the components that rely on DNS resolution, so they reload the updated resolver configuration.
    • Restart the following pods:

      • kapp-controller

      • coredns

      • image-controller

    • This can be done by deleting the pods in their respective namespaces and allowing Kubernetes to recreate them.

  3. Verify DNS Resolution and TLS Connectivity
    • From inside the kapp-controller pod, verify that the registry hostname resolves correctly:
      • kubectl exec -it <kapp-controller-pod> -n <namespace> -- nslookup projects.packages.broadcom.com
    • Confirm that:

      • The hostname resolves to the public IP address.

      • TLS verification succeeds when accessing the registry endpoint.

  4. Confirm Successful Reconciliation
    • After DNS resolution is corrected:

      • Package installations should reconcile successfully.

      • The VKS upgrade process should continue and complete without registry access errors.

    • Verify the status of package installations and cluster components to confirm successful reconciliation.

  5. Restore the Original DNS Search Domain
    • After the upgrade completes and registry connectivity is verified, the previously removed DNS search domain (for example, domain.local) can be restored in the Supervisor Management Network DNS configuration if required for internal name resolution.
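
The checks in step 3 (public IP address and successful TLS verification) could be scripted roughly as follows. This is a sketch, not a supported tool. It assumes a Python interpreter is available in a pod that shares the cluster DNS configuration, such as a temporary debugging pod, since the kapp-controller image may not include one. The helper names are hypothetical, and the trust store used by Python may differ from the one kapp-controller uses.

```python
import ipaddress
import socket
import ssl


def resolved_addresses(hostname, port=443):
    """Resolve hostname through the local resolver (search domains apply)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def is_private(address):
    """Return True for private (for example RFC 1918) addresses."""
    return ipaddress.ip_address(address).is_private


def tls_verifies(hostname, port=443, timeout=5.0):
    """Attempt a TLS handshake with certificate and hostname verification."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except ssl.SSLCertVerificationError as err:
        print(f"certificate verification failed: {err}")
        return False


if __name__ == "__main__":
    host = "projects.packages.broadcom.com"
    for address in resolved_addresses(host):
        print(address, "private" if is_private(address) else "public")
    print("TLS verification succeeded:", tls_verifies(host))
```

A private address in the output, or a verification failure that mentions *.domain.local, suggests that the pod is still resolving the internal record.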

Additional Information

  • When troubleshooting TLS hostname mismatch errors involving public services:
    • Compare DNS resolution results from nodes and pods.

    • Verify both the IP address and TLS certificate presented by the endpoint.

    • Check for internal DNS records that shadow public hostnames.

  • In environments using Kubernetes, the combination of DNS search domains and the ndots resolver behavior can cause internal records to override public DNS entries, leading to unexpected connectivity or TLS verification issues.
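
One way to check for internal records that shadow a public hostname is to query the hostname with each search domain appended as an absolute name. The sketch below assumes the search domains are already known, for example from /etc/resolv.conf in the pod. The function names are hypothetical. The resolver is passed in as a parameter so the logic can be exercised without network access.

```python
import socket


def system_resolves(fqdn):
    """Return True if the absolute name (with trailing dot) resolves."""
    try:
        socket.getaddrinfo(fqdn, None)
        return True
    except socket.gaierror:
        return False


def find_shadow_records(hostname, search_domains, resolves=system_resolves):
    """List search-domain expansions of hostname that resolve."""
    shadows = []
    for domain in search_domains:
        # The trailing dot makes the query absolute, so the resolver
        # does not append any search domain a second time.
        candidate = f"{hostname.rstrip('.')}.{domain.strip('.')}."
        if resolves(candidate):
            shadows.append(candidate)
    return shadows


if __name__ == "__main__":
    # Assumed search list, taken from the example in this article.
    for name in find_shadow_records("projects.packages.broadcom.com", ["domain.local"]):
        print("shadowing record:", name)
```

Any name printed is a candidate that pods would reach before the public hostname while that search domain is configured.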