tanzu management cluster fails: "Internal error occurred: error resolving resource"

Article ID: 376718

Updated On:

Products

  • VMware Tanzu Kubernetes Grid Management
  • Tanzu Kubernetes Grid
  • VMware Tanzu Kubernetes Grid
  • VMware Tanzu Kubernetes Grid 1.x
  • VMware Tanzu Kubernetes Grid Plus
  • VMware Tanzu Kubernetes Grid Plus 1.x

Issue/Introduction

Issue Summary:

In this scenario, an upgrade was performed on a TKG Management Cluster from version 2.3.1 to 2.4.1.

NOTE: Similar Management Cluster upgrade failures were reported in older TKG versions as well.

 

Errors: 

The main error:

  • tanzu mc upgrade: the management cluster upgrade fails with the following error:

Error: upgrade version compatibility validation failed: unable to get tkg version of management cluster "CLUSTER_NAME" in namespace "": unable to get the cluster object: Internal error occurred: error resolving resource

 

Other errors you may also see:

  • kube-apiserver log may show:

unable to load root certificates: unable to parse bytes as PEM block

 

  • cert-manager-cainjector pod may show:

cert-manager/secret-for-certificate-mapper "msg"="unable to fetch certificate that owns the secret" "error"="Certificate.cert-manager.io \"capi-serving-cert\" not found" "certificate"={"Namespace":"capi-system","Name":"capi-serving-cert"} "secret"={"Namespace":"capi-system","Name":"capi-webhook-service-cert"}

 

  • ako-operator may not be able to communicate with the kube-apiserver and may report:

"error"="Certificate.cert-manager.io \"capi-serving-cert\" not found"

 
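When reviewing saved logs, it can help to search for all of the error signatures above in one pass. The following is a minimal sketch, assuming the log text is piped in on stdin; the function name scan_known_errors and the output labels are illustrative and not part of any Tanzu tooling.

```shell
#!/usr/bin/env bash
# Sketch: scan log text on stdin for the error signatures listed in this article.
# The function name and labels are hypothetical, chosen for illustration only.

scan_known_errors() {
  local input found=1
  input="$(cat)"

  # Each entry is "label|extended regex to search for".
  local signatures=(
    'upgrade|Internal error occurred: error resolving resource'
    'kube-apiserver|unable to parse bytes as PEM block'
    'cert-manager|capi-serving-cert.*not found'
  )

  local entry label pattern
  for entry in "${signatures[@]}"; do
    label="${entry%%|*}"     # text before the first "|"
    pattern="${entry#*|}"    # text after the first "|"
    if grep -qE -- "$pattern" <<<"$input"; then
      echo "match: $label"
      found=0
    fi
  done

  # Exit status 0 if at least one signature matched, 1 otherwise.
  return "$found"
}

# Example (file name is a placeholder):
#   scan_known_errors < saved-upgrade-output.log
```

A match only indicates that a log line resembles one of the messages above; the Validation steps below are still needed to confirm the missing Certificate and Issuer.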

 

Validation:

Run the following commands against your Management Cluster to verify whether you are hitting the same issue.

If the Issuer and Certificate shown below are not present, you likely have the same issue.

  • Make sure you are in the Management Cluster context

kubectl config use-context MANAGEMENT_CLUSTER_CONTEXT

  • Check if your Cluster API Certificate, capi-serving-cert exists in the capi-system Namespace:

kubectl get certificate -n capi-system

You should see at least the following certificate:

NAME                READY   SECRET                      AGE
capi-serving-cert   True    capi-webhook-service-cert   34m

 

  • Check if your Cluster API Issuer, capi-selfsigned-issuer exists in the capi-system Namespace:

kubectl get issuer -n capi-system

You should see at least the following Issuer:

NAME                     READY   AGE
capi-selfsigned-issuer   True    39m
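The validation steps above can be combined into a single check. This is a rough sketch, assuming kubectl is already pointed at the Management Cluster; the function name check_capi_cert_resources and its messages are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: print the active context, then check that the Cluster API
# Certificate and Issuer named in this article exist in capi-system.
# The function name and message wording are hypothetical.

check_capi_cert_resources() {
  local ns="capi-system" missing=0

  echo "Current context: $(kubectl config current-context)"

  # Note: a non-zero exit from kubectl can also mean an API error,
  # not only "not found"; rerun without the redirect to see the message.
  if ! kubectl get certificate capi-serving-cert -n "$ns" >/dev/null 2>&1; then
    echo "MISSING: certificate $ns/capi-serving-cert"
    missing=1
  fi
  if ! kubectl get issuer capi-selfsigned-issuer -n "$ns" >/dev/null 2>&1; then
    echo "MISSING: issuer $ns/capi-selfsigned-issuer"
    missing=1
  fi

  if [ "$missing" -eq 0 ]; then
    echo "OK: Certificate and Issuer are present"
  fi
  return "$missing"
}

# Example:
#   check_capi_cert_resources || echo "Likely affected by this issue"
```

The function returns a non-zero status when either object cannot be retrieved, so it can be used in a larger pre-upgrade script.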

Environment

Tanzu Kubernetes Grid (TKG): 2.3.1

Tanzu Kubernetes Grid (TKG): 2.4.1

Cause

Summary:

The failure results when the capi-serving-cert Certificate in the capi-system Namespace goes missing.

 

Details:

The tanzu CLI calls the clusterctl API (an open source, upstream API) and performs a clusterctl upgrade.

clusterctl is the upstream component that upgrades the Cluster API providers (CRDs and controllers) installed in a management cluster.

The tanzu CLI then waits for the clusterctl upgrade to succeed.  This is where the failure occurs.

Although the clusterctl upgrade is designed to be "idempotent", it may be unable to recover in this scenario.

The capi-system/capi-serving-cert Certificate is already missing, a state that the clusterctl design does not account for.

The clusterctl upgrade also requires networking in the cluster to be working for the duration of the upgrade.

 

Given this, the actual cause of the missing Certificate and Issuer is not clear.  It may be a symptom of an infrastructure or network failure occurring during the clusterctl upgrade.
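Because the kube-apiserver error mentions PEM parsing, it may be useful to check whether the Secret named in the cainjector log (capi-system/capi-webhook-service-cert) still holds a PEM certificate. The sketch below only tests whether base64 data on stdin decodes to a PEM certificate; the helper name is illustrative, the tls.crt key is an assumption based on the usual cert-manager Secret layout, and it assumes a base64 command that accepts -d.

```shell
#!/usr/bin/env bash
# Sketch: succeed if base64-encoded data on stdin decodes to a PEM certificate.
# The helper name is hypothetical. Assumes `base64 -d` is supported.

decodes_to_pem_certificate() {
  local decoded
  # base64 -d fails on invalid input; treat that as "not PEM".
  decoded="$(base64 -d 2>/dev/null)" || return 1
  grep -q -- '-----BEGIN CERTIFICATE-----' <<<"$decoded"
}

# Example (assumes the Secret stores the certificate under the tls.crt key):
#   kubectl get secret capi-webhook-service-cert -n capi-system \
#     -o jsonpath='{.data.tls\.crt}' | decodes_to_pem_certificate \
#     && echo "Secret holds a PEM certificate" \
#     || echo "Secret is missing or does not hold a PEM certificate"
```

This check is diagnostic only. It does not identify the root cause and does not replace the assessment by a Tanzu Engineer described in the Resolution.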

Resolution

Solution:
 
You will need to have the Cluster API Issuer and Certificate recreated manually.

Open a Tanzu Support request case.  A Tanzu Engineer will assess your system further before applying the Issuer and Certificate manifest back to your Cluster API.

 

Steps:

  • Capture the relevant errors mentioned above as well as any others you see

  • Create a tanzu diagnostics bundle from your Management Cluster 

  • Open a new Tanzu Support request case

  • Provide the above in the new case
  • Reference this KB ID: 376718
    • The solution manifest has been provided internally to this KB

    • It is important that the Tanzu Engineer first gathers data for a fix and reconfirms that there are no other issues

  • After resolving the missing Issuer and Certificate, you will be able to continue with the upgrade
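The capture step in the list above could be scripted along the following lines. This is a sketch only, not a replacement for the tanzu diagnostics bundle; the function name, output directory, and file names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: save the state of the Cluster API cert-manager resources
# so it can be attached to a support case.
# The function name, directory, and file names are hypothetical.

capture_capi_cert_state() {
  local outdir="${1:-capi-cert-state}"
  mkdir -p "$outdir" || return 1

  # stderr is captured too: the error text itself is useful evidence.
  # "|| true" keeps the capture going when a command fails.
  kubectl config current-context                 >"$outdir/context.txt"       2>&1 || true
  kubectl get certificate -n capi-system -o yaml >"$outdir/certificates.yaml" 2>&1 || true
  kubectl get issuer -n capi-system -o yaml      >"$outdir/issuers.yaml"      2>&1 || true
  kubectl get events -n capi-system              >"$outdir/events.txt"        2>&1 || true

  echo "Saved output to $outdir"
}

# Example:
#   capture_capi_cert_state ./kb-376718-capture
```

Review the saved files before sharing them, since YAML output from a cluster may include environment-specific names.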