Supervisor upgrade fails while configuring TMC component with "tls: failed to verify certificate: x509" error

Article ID: 413656

Products

VMware vSphere Kubernetes Service

Issue/Introduction

The Supervisor reports the following configuration error:

ImageController configured
SecretgenController configured
LicenseOperatorController configured
EnvProps configured
ImageRegistry configured
CsiController configured
TMC configured
Configuration error (since DD/MM/YYYY, HH:MM:SS AM)
Component Configuration error: Component TMCUpgrade failed: Failed to run command: ['kubectl', 'apply', '-f', '/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/70-tmc-agent-installer', '--record'] ret=1 out=customresourcedefinition.apiextensions.k8s.io/agentconfigs.installers.tmc.cloud.vmware.com configured customresourcedefinition.apiextensions.k8s.io/agentinstalls.installers.tmc.cloud.vmware.com configured clusterrole.rbac.authorization.k8s.io/tmc-agent-installer-role configured clusterrolebinding.rbac.authorization.k8s.io/tmc-agent-installer-rolebinding configured err=Flag --record has been deprecated, --record will be removed in the future Error from server (InternalError): error when creating "/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/70-tmc-agent-installer/tmc-agent-installer.yaml": Internal error occurred: failed calling webhook "default.validating.namespace.supervisor.vmware.com": failed to call webhook: Post "https://vmware-system-nsop-webhook-service.vmware-system-nsop.svc:443/supervisor-namespace-validate-v1-namespace?timeout=30s": tls: failed to verify certificate: x509: certificate signed by unknown authority Error from server (NotFound): error when creating "/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/70-tmc-agent-installer/tmc-agent-installer.yaml": namespaces "svc-tmc-cX" not found Error from server (NotFound): error when creating "/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/70-tmc-agent-installer/tmc-agent-installer.yaml": namespaces "svc-tmc-cX" not found Error from server (NotFound): error when creating "/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/70-tmc-agent-installer/tmc-agent-installer.yaml": namespaces "svc-tmc-cX" not found Error from server (NotFound): error when creating "/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/70-tmc-agent-installer/tmc-agent-installer.yaml": namespaces "svc-tmc-cX" not found Error from server (NotFound): error when creating "/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/70-tmc-agent-installer/tmc-agent-installer.yaml": namespaces "svc-tmc-cX" not found

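The error text is long because kubectl reports one failure per object. The following is a minimal Python sketch of a hypothetical helper (summarize_errors is not a VMware tool) that collapses the err= text into distinct "Error from server" messages with counts. It makes it easier to see that there is a single x509 webhook failure followed by repeated NotFound errors. It assumes you have copied the error text into a file and pass that file's path as the first argument.

```python
import re
import sys
from collections import Counter

# Every server-side failure in the kubectl output starts with this prefix.
_ERROR_PREFIX = re.compile(r"Error from server \(")


def summarize_errors(message: str) -> Counter:
    """Hypothetical helper: count distinct 'Error from server' messages in kubectl output."""
    # Drop everything before the first error (the out= text and the --record warning).
    parts = _ERROR_PREFIX.split(message)[1:]
    # Restore the prefix and collapse whitespace so line-wrapped copies compare equal.
    return Counter("Error from server (" + " ".join(p.split()) for p in parts)


def main(argv: list[str]) -> None:
    if not argv:
        print("usage: summarize_errors.py ERROR_TEXT_FILE", file=sys.stderr)
        return
    with open(argv[0], encoding="utf-8") as fh:
        text = fh.read()
    # Most frequent first: the repeated NotFound errors, then the single x509 failure.
    for error, count in summarize_errors(text).most_common():
        print(f"{count}x {error}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For the error above, this should report the InternalError (x509) message once and the namespaces "svc-tmc-cX" not found message five times.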

The upgrade-ctl.py command shows the TMCUpgrade component in the failed state:


# /usr/lib/vmware-wcp/upgrade/upgrade-ctl.py get-status | jq '.progress | to_entries | .[] | "\(.value.status) - \(.key)"' | sort
"failed - TMCUpgrade"

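The jq filter above prints one "status - component" line per entry of the progress object. The following is a rough Python equivalent, sketched under the assumption (inferred only from the jq filter) that get-status prints JSON whose top-level "progress" key maps each component name to an object with a "status" field. The function names are hypothetical. Sort order follows Python string ordering, which may differ slightly from the locale-dependent sort command.

```python
import json
import subprocess
import sys

UPGRADE_CTL = "/usr/lib/vmware-wcp/upgrade/upgrade-ctl.py"


def summarize_progress(status: dict) -> list[str]:
    """Same idea as the jq filter: one '<status> - <component>' line per entry, sorted."""
    # Assumed JSON shape: {"progress": {"<component>": {"status": "..."}, ...}}
    return sorted(f"{entry.get('status')} - {name}" for name, entry in status["progress"].items())


def failed_components(status: dict) -> list[str]:
    """Names of components whose status is 'failed', such as TMCUpgrade."""
    return sorted(name for name, entry in status["progress"].items() if entry.get("status") == "failed")


def main() -> None:
    try:
        result = subprocess.run([UPGRADE_CTL, "get-status"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        # Not on a Supervisor control plane VM, or the command itself failed.
        print(f"could not run {UPGRADE_CTL}: {exc}", file=sys.stderr)
        return
    status = json.loads(result.stdout)
    for line in summarize_progress(status):
        print(line)
    print("failed components:", ", ".join(failed_components(status)) or "none")


if __name__ == "__main__":
    main()
```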

All TLS certificates under /etc are within their validity period (not expired) when checked with the command below:

# find /etc -type f \( -name "*.cert" -o -name "*.crt" \) | xargs -t -I {} bash -c 'openssl x509 -noout -text -in {} | grep After'
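
The same expiry check can be sketched in Python, flagging expired files explicitly instead of leaving the reader to compare dates. This is an illustrative sketch, not a VMware tool. It assumes the openssl binary is available, as the command above does, and uses openssl x509 -enddate with Python's ssl.cert_time_to_seconds to parse the date. Note that it only checks expiry of on-disk files. The failure in this article is a trust problem ("signed by unknown authority") on an in-cluster webhook certificate, so a clean result here is expected and does not rule the issue out.

```python
import os
import ssl
import subprocess
import sys
import time


def parse_not_after(openssl_line: str) -> int:
    """Turn openssl's 'notAfter=Jan  5 09:34:43 2018 GMT' output into epoch seconds."""
    _, _, value = openssl_line.strip().partition("=")
    return ssl.cert_time_to_seconds(value)


def cert_files(root: str = "/etc"):
    """Yield regular *.cert / *.crt files under root, skipping symlinks like find -type f."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith((".cert", ".crt")) and os.path.isfile(path) and not os.path.islink(path):
                yield path


def main() -> None:
    now = time.time()
    for path in cert_files():
        try:
            result = subprocess.run(["openssl", "x509", "-noout", "-enddate", "-in", path],
                                    capture_output=True, text=True)
        except OSError as exc:  # e.g. openssl is not installed
            print(f"cannot run openssl: {exc}", file=sys.stderr)
            return
        if result.returncode != 0:
            print(f"SKIP     {path} (openssl could not read a certificate)")
            continue
        try:
            expired = parse_not_after(result.stdout) <= now
        except ValueError:
            print(f"SKIP     {path} (unexpected date format: {result.stdout.strip()})")
            continue
        print(f"{'EXPIRED' if expired else 'ok':<9}{path}  {result.stdout.strip()}")


if __name__ == "__main__":
    main()
```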

This issue has been observed in upgrades from 1.27 to 1.28 as well as from 1.28 to 1.29.

Environment

VMware vSphere Kubernetes Service

Tanzu Mission Control (TMC)

Cause

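The cert-manager instance in the vmware-system-cert-manager namespace generated an invalid TLS certificate for the vmware-system-nsop-webhook-service service. When the API server calls the namespace validating webhook (default.validating.namespace.supervisor.vmware.com), it cannot verify this certificate and fails with "x509: certificate signed by unknown authority". As a result, the svc-tmc-cX namespace is not created, and the remaining objects in tmc-agent-installer.yaml fail with "namespaces "svc-tmc-cX" not found".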

Resolution

In many cases, retrying the Supervisor upgrade resolves this issue. In Workload Management, open the Updates tab for the Supervisor, select the target version, and click Apply.

If retrying the upgrade fails with the same error, perform the following steps:

1. SSH to the Supervisor and run the commands below to restart the deployments in the vmware-system-cert-manager namespace.

kubectl rollout restart deployment -n vmware-system-cert-manager cert-manager-cainjector
kubectl rollout restart deployment -n vmware-system-cert-manager cert-manager
kubectl rollout restart deployment -n vmware-system-cert-manager cert-manager-webhook
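
The three restarts can also be scripted so that each deployment finishes rolling out before the next one is restarted. The following minimal Python sketch builds the same kubectl rollout restart commands as above, each followed by kubectl rollout status. The 300s timeout and the sequential ordering are assumptions, not VMware guidance. The script is a dry run by default and only executes the commands when passed --run.

```python
import subprocess
import sys

NAMESPACE = "vmware-system-cert-manager"
# Same deployments, in the same order, as the kubectl commands above.
DEPLOYMENTS = ["cert-manager-cainjector", "cert-manager", "cert-manager-webhook"]


def restart_commands(namespace: str, deployments: list[str], timeout: str = "300s") -> list[list[str]]:
    """Build a restart command followed by a rollout-status wait for each deployment."""
    commands = []
    for name in deployments:
        commands.append(["kubectl", "rollout", "restart", "deployment", "-n", namespace, name])
        # Block until the new pods are rolled out, or fail after the (assumed) timeout.
        commands.append(["kubectl", "rollout", "status", f"deployment/{name}",
                         "-n", namespace, f"--timeout={timeout}"])
    return commands


def main(argv: list[str]) -> None:
    execute = "--run" in argv  # default is a dry run that only prints the commands
    for cmd in restart_commands(NAMESPACE, DEPLOYMENTS):
        print("+", " ".join(cmd))
        if execute:
            subprocess.run(cmd, check=True)  # stop at the first failing command


if __name__ == "__main__":
    main(sys.argv[1:])
```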

2. Verify that all pods in the namespace are running.

kubectl get pods -n vmware-system-cert-manager
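
To check the pod state without reading the table by eye, the following sketch parses the JSON form of the same command (kubectl get pods -o json). It reports any pod whose phase is not Running or that has a container that is not ready. The function name is hypothetical and this is an illustration, not a VMware tool.

```python
import json
import subprocess
import sys

NAMESPACE = "vmware-system-cert-manager"


def not_ready_pods(pod_list: dict) -> list[str]:
    """Names of pods that are not Running or have at least one container not ready."""
    problems = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers_ready = all(c.get("ready") for c in status.get("containerStatuses", []))
        if status.get("phase") != "Running" or not containers_ready:
            problems.append(pod["metadata"]["name"])
    return problems


def main() -> None:
    cmd = ["kubectl", "get", "pods", "-n", NAMESPACE, "-o", "json"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"kubectl failed: {exc}", file=sys.stderr)
        return
    bad = not_ready_pods(json.loads(result.stdout))
    print("all pods Running and ready" if not bad else "not ready: " + ", ".join(bad))


if __name__ == "__main__":
    main()
```

If you prefer to wait rather than poll, kubectl wait --for=condition=Ready pod --all -n vmware-system-cert-manager --timeout=300s is a one-line alternative. The timeout value is an arbitrary choice.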

3. SSH to the vCenter Server and restart the WCP service:

service-control --restart wcp

4. Initiate the Supervisor upgrade again.