https://registry.tkg.vmware.run/v2/pause/manifests/3.2": x509: certificate has expired or is not yet valid


Article ID: 369548


Products

VMware Tanzu Kubernetes Grid Integrated (TKGi)

Issue/Introduction

This error message can occur when images previously pulled from the deprecated 'https://registry.tkg.vmware.run/' registry are no longer available on a node. On TKGi, these images are built into the tile and do not need to be pulled externally. Since this repository is no longer in use, there are no plans to renew its certificate.

You may notice that pods are failing to start on a specific node, with events similar to the following snippet:

Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Warning  FailedCreatePodSandBox  5m58s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "registry.tkg.vmware.run/pause:3.2": failed to pull image "registry.tkg.vmware.run/pause:3.2": failed to pull and unpack image "registry.tkg.vmware.run/pause:3.2": failed to resolve reference "registry.tkg.vmware.run/pause:3.2": failed to do request: Head "https://registry.tkg.vmware.run/v2/pause/manifests/3.2": x509: certificate has expired or is not yet valid: current time 2024-04-04T07:53:46Z is after 2024-03-26T23:59:59Z

Or a pod error message similar to: 

"CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.9\": failed to pull image \"registry.k8s.io/pause:3.9\": failed to pull and unpack image \"registry.k8s.io/pause:3.9\": failed to resolve reference \"registry.k8s.io/pause:3.9\": failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.9\": dial tcp xxx.xxx.xxx.xxx:443: i/o timeout" pod="ns-xxxxxxxx"
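
When many events are involved, it can help to extract just the sandbox image reference from each message to see which registry and tag the node is trying to pull. The following is a minimal sketch, not a supported tool. The function name sandbox_image_from_message is hypothetical, and the sketch assumes the message has the same shape as the two examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sandbox image named in a kubelet error message.
# Assumes the message contains: failed to get sandbox image "IMAGE"
# (the quotes may be backslash-escaped, as in the second example above).
sandbox_image_from_message() {
  printf '%s\n' "$1" \
    | tr -d '\\' \
    | sed -n 's/.*failed to get sandbox image "\([^"]*\)".*/\1/p'
}

# Possible usage against a cluster (assumes kubectl access to the cluster):
#   kubectl get events -A --field-selector reason=FailedCreatePodSandBox \
#     -o jsonpath='{range .items[*]}{.message}{"\n"}{end}' |
#     while IFS= read -r msg; do sandbox_image_from_message "$msg"; done | sort -u
```

If the function prints nothing, the message did not contain a sandbox image reference in the expected form.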

Environment

TKGi v1.18.4 or lower

TKGi v1.19.1 or lower

If you are using a version of TKGi higher than those listed above (1.18.5+ or 1.19.2+), image pinning should work as expected and this KB is not needed. To confirm that the pause images are properly pinned, review the following KB: https://knowledge.broadcom.com/external/article/380820

Cause

The images are missing from the node because they were either deleted manually or removed by image garbage collection during a disk pressure event on that node.

Note: The containerd that ships with the TKGi versions listed in the "Environment" section of this article had a bug that prevented the pause images from being properly pinned, making them vulnerable to garbage collection. 
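
To confirm that the pause image is actually missing on a node, the local image list can be inspected. The sketch below assumes that crictl is installed on the node and configured to talk to containerd, which this article does not confirm. The function name has_pause_image is hypothetical. It reads the output of crictl images on standard input and succeeds only if a repository named pause is listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `crictl images` output on stdin and succeed
# if any listed repository is named "pause" (for example
# registry.tkg.vmware.run/pause or registry.k8s.io/pause).
has_pause_image() {
  awk '$1 ~ /(^|\/)pause$/ { found = 1 } END { exit found ? 0 : 1 }'
}

# Possible usage on the node (assumes crictl is available and configured):
#   if crictl images | has_pause_image; then
#     echo "pause image present"
#   else
#     echo "pause image missing"
#   fi
```

The check matches only the repository column, so an image whose name merely contains the word pause elsewhere is not counted.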

Resolution

As a permanent fix, upgrade to TKGi 1.19.2 or later, where pause images are exempt from garbage collection.

On versions lower than TKGi 1.19.2, follow the steps below to load the pause images back onto the node:

  1. SSH to the problematic node
    • bosh -d <service-instance-ID> ssh <problematic-node>
  2. Switch to root user
    • sudo -i
  3. Execute the following script
    • /var/vcap/jobs/load-images/bin/post-start

After completing these steps, the images are available on the node again. Note that if another disk pressure event occurs, the script must be run again to resolve the issue.
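
When several nodes are affected, the three steps above can be collapsed into one non-interactive command per node using the -c (command) option of bosh ssh. The sketch below only prints the command so it can be reviewed before it is run. The function name print_reload_command is hypothetical, and running the script through sudo rather than from an interactive root shell is an assumption that has not been verified against this article's procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a one-shot equivalent of steps 1-3.
#   $1 = BOSH deployment name (the <service-instance-ID> from step 1)
#   $2 = instance to target   (the <problematic-node> from step 1)
# Assumption: invoking the script via sudo behaves the same as running it
# from a root shell opened with `sudo -i`.
print_reload_command() {
  printf "bosh -d %s ssh %s -c 'sudo /var/vcap/jobs/load-images/bin/post-start'\n" \
    "$1" "$2"
}

# Possible usage: review the printed command, then paste it into a shell.
#   print_reload_command <service-instance-ID> <problematic-node>
```

Printing rather than executing keeps the sketch safe to try; the interactive steps listed above remain the documented procedure.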