[TKGs] Namespaces stuck in "Configuring" status due to missing Supervisor's Certificates/Issuers

Article ID: 313103

Products

VMware vSphere ESXi, VMware vSphere with Tanzu

Issue/Introduction

  • This article outlines a possible workaround for Supervisor namespaces that remain in "Configuring" status because Certificate and Issuer objects are missing from the Supervisor.


Symptoms:
  • vSphere Client > Workload Management > Namespaces > Config Status shows "Configuring" for all namespaces.
  • Supervisor's kube-apiserver logs show errors complaining about expired certificates:

2023-11-09T08:44:45.117047081Z stderr F W1109 08:44:45.116971    1 dispatcher.go:191] Failed calling webhook, failing closed default.mutating.virtualmachine.vmoperator.vmware.com: failed calling webhook "default.mutating.virtualmachine.vmoperator.vmware.com": failed to call webhook: Post "https://vmware-system-vmop-webhook-service.vmware-system-vmop.svc:443/default-mutate-vmoperator-vmware-com-v1alpha1-virtualmachine?timeout=10s": x509: certificate has expired or is not yet valid: current time 2023-11-09T08:44:45Z is after 2023-11-05T09:12:20Z

2023-11-09T08:44:45.148148956Z stderr F E1109 08:44:45.148048    1 cacher.go:420] cacher (*unstructured.Unstructured): unexpected ListAndWatch error: failed to list cluster.x-k8s.io/v1alpha3, Kind=Machine: conversion webhook for cluster.x-k8s.io/v1beta1, Kind=Machine failed: Post "https://capi-webhook-service.vmware-system-capw.svc:443/convert?timeout=30s": x509: certificate has expired or is not yet valid: current time 2023-11-09T08:44:45Z is after 2023-11-05T09:12:18Z; reinitializing...

 

  • The Supervisor is missing Certificate and Issuer objects. The listing below is the minimum standard set of Certificate and Issuer objects on vCenter 7.0.3 build-22357613 with Supervisor 1.25.6. Compare the list from the troubled environment with the list from a healthy environment running the same vCenter and Supervisor versions.

# kubectl get certificate,issuer -A

NAMESPACE                                  NAME                                                                     READY  SECRET                                           AGE
vmware-system-appplatform-operator-system  certificate.cert-manager.io/vmware-system-psp-operator-serving-cert      True   vmware-system-psp-operator-webhook-service-cert  19d
vmware-system-capw                         certificate.cert-manager.io/capi-kubeadm-bootstrap-serving-cert          True   capi-kubeadm-bootstrap-webhook-service-cert      19d
vmware-system-capw                         certificate.cert-manager.io/capi-kubeadm-control-plane-serving-cert      True   capi-kubeadm-control-plane-webhook-service-cert  19d
vmware-system-capw                         certificate.cert-manager.io/capi-serving-cert                            True   capi-webhook-service-cert                        19d
vmware-system-capw                         certificate.cert-manager.io/capw-serving-cert                            True   capw-webhook-service-cert                        19d
vmware-system-license-operator             certificate.cert-manager.io/vmware-system-license-operator-serving-cert  True   webhook-server-cert                              19d
vmware-system-nsop                         certificate.cert-manager.io/vmware-system-nsop-serving-cert              True   webhook-server-cert                              19d
vmware-system-tkg                          certificate.cert-manager.io/vmware-system-tkg-serving-cert               True   vmware-system-tkg-webhook-service-cert           19d
vmware-system-vmop                         certificate.cert-manager.io/vmware-system-vmop-serving-cert              True   webhook-server-cert                              19d

NAMESPACE                                  NAME                                                                     READY  AGE
vmware-system-appplatform-operator-system  issuer.cert-manager.io/vmware-system-psp-operator-selfsigned-issuer      True   19d
vmware-system-capw                         issuer.cert-manager.io/capi-kubeadm-bootstrap-selfsigned-issuer          True   19d
vmware-system-capw                         issuer.cert-manager.io/capi-kubeadm-control-plane-selfsigned-issuer      True   19d
vmware-system-capw                         issuer.cert-manager.io/capi-selfsigned-issuer                            True   19d
vmware-system-capw                         issuer.cert-manager.io/capw-selfsigned-issuer                            True   19d
vmware-system-license-operator             issuer.cert-manager.io/vmware-system-license-operator-selfsigned-issuer  True   19d
vmware-system-nsop                         issuer.cert-manager.io/vmware-system-nsop-selfsigned-issuer              True   19d
vmware-system-tkg                          issuer.cert-manager.io/vmware-system-tkg-selfsigned-issuer               True   19d
vmware-system-vmop                         issuer.cert-manager.io/vmware-system-vmop-selfsigned-issuer              True   19d
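
Comparing the troubled environment against a healthy one can be scripted. The following is a minimal sketch, assuming a POSIX shell with kubectl access on a control plane node of each Supervisor. The function names inventory and missing_objects and the file names are hypothetical and not part of any VMware tooling.

```shell
# Sketch: find Certificate/Issuer objects present in a healthy Supervisor
# but absent from the troubled one. Function and file names are illustrative.

# Print one "namespace/Kind/name" line per Certificate and Issuer object.
# Run this on a control plane node of EACH environment.
inventory() {
  kubectl get certificate,issuer -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.kind}{"/"}{.metadata.name}{"\n"}{end}' \
    | sort -u
}

# Print the lines found in the first file ($1) but not in the second ($2).
# grep exits with 1 when nothing is missing; that case is treated as success.
missing_objects() {
  grep -Fxv -f "$2" "$1" || [ $? -eq 1 ]
}

# Example:
#   inventory > healthy.txt     # on the healthy Supervisor
#   inventory > troubled.txt    # on the troubled Supervisor
#   missing_objects healthy.txt troubled.txt
```

Each line printed by missing_objects names an object that would need to be recreated in the troubled environment.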


In some cases, the x509 errors shown above appear simply because a certificate has expired. Deleting the Secret associated with the Certificate causes the certificate to be reissued, but only if the corresponding Issuer is present.
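
The "only if the Issuer is present" condition can be checked before deleting anything. The following is a minimal sketch, assuming the Certificate references a namespaced Issuer through the standard cert-manager fields spec.issuerRef.name and spec.secretName. The function name rotate_cert is hypothetical, and the sketch should be reviewed before use because it deletes a Secret.

```shell
# Sketch: delete the Secret behind a Certificate so that cert-manager can
# reissue it, but only when the referenced Issuer exists in the namespace.
# Usage: rotate_cert <namespace> <certificate-name>
rotate_cert() {
  rc_ns=$1
  rc_cert=$2

  # Read the Issuer and Secret names from the Certificate spec.
  rc_issuer=$(kubectl get certificate "$rc_cert" -n "$rc_ns" \
    -o jsonpath='{.spec.issuerRef.name}') || return 1
  rc_secret=$(kubectl get certificate "$rc_cert" -n "$rc_ns" \
    -o jsonpath='{.spec.secretName}') || return 1

  # Without the Issuer nothing can reissue the certificate, so stop here.
  if ! kubectl get issuer "$rc_issuer" -n "$rc_ns" >/dev/null 2>&1; then
    echo "Issuer $rc_issuer not found in $rc_ns; Secret $rc_secret left untouched" >&2
    return 1
  fi

  kubectl delete secret "$rc_secret" -n "$rc_ns"
}

# Example, using names from the listing above:
#   rotate_cert vmware-system-vmop vmware-system-vmop-serving-cert
```

If the function reports that the Issuer is missing, the Workaround below applies instead.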


Environment

VMware vSphere 7.0 with Tanzu

Cause

  • The reason why the Certificate and Issuer objects were missing from the Supervisor has not been determined.

Resolution

  • No resolution is available at the moment because the root cause has not been identified. Please proceed with the Workaround.


Workaround:
From a healthy environment running the same vCenter and Supervisor versions, copy all the missing Certificate and Issuer objects' definitions and recreate them in the troubled environment where they are missing:
  • In the healthy environment's Supervisor context, export the definition of each Certificate/Issuer object that is missing from the troubled environment:
# kubectl get certificate <certificate-name> -n <namespace> -o yaml > <certificate-name>.yaml
# kubectl get issuer <issuer-name> -n <namespace> -o yaml > <issuer-name>.yaml
  • Copy the generated YAML definition files to the troubled environment's Supervisor ControlPlane node.
  • From the troubled environment's Supervisor ControlPlane node, apply the Certificates/Issuers YAML definition files:
# kubectl apply -f <certificate-name>.yaml
# kubectl apply -f <issuer-name>.yaml
  • List all the Certificate/Issuer objects again and make sure none are missing compared to the healthy environment:
# kubectl get certificate,issuer -A
  • Restart vCenter's wcp service:
# service-control --status wcp
# service-control --restart wcp
# service-control --status wcp
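
For the verification step before the wcp restart, it can help to confirm that every recreated object also reports READY as True. The following is a minimal sketch, assuming the default kubectl table output in which READY is the third column for both Certificates and Issuers. The function name not_ready is hypothetical.

```shell
# Sketch: print every Certificate/Issuer row whose READY column is not "True".
# Reads the output of `kubectl get certificate,issuer -A` on stdin and skips
# blank lines and header rows.
not_ready() {
  awk 'NF && $1 != "NAMESPACE" && $3 != "True"'
}

# Example:
#   kubectl get certificate,issuer -A | not_ready
```

No output means every listed object is Ready. A row with an empty READY column is also printed, because the next column shifts into its place and does not equal "True".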