Supervisor Upgrade to v1.29.7 Fails Due to NSOP Webhook TLS Validation Error

Article ID: 406724

Updated On:

Products

Tanzu Kubernetes Runtime

Issue/Introduction

When you upgrade your production Supervisor Cluster to v1.29.7, the process fails during the vmoperatorupgrade stage and does not proceed to namespaceoperatorupgrade. The following error appears in the output of both compupgrade and upgrade.ctl.py:

Error from server (InternalError): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{\"kubernetes.io/change-cause\":\"kubectl apply --filename=/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/30-vmop --record=true\"},\"labels\":{\"pod-security.kubernetes.io/enforce\":\"privileged\"},\"name\":\"vmware-system-vmop\"}}\n","kubernetes.io/change-cause":"kubectl apply --filename=/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/30-vmop --record=true"}}}
to:
Resource: "/v1, Resource=namespaces", GroupVersionKind: "/v1, Kind=Namespace"
Name: "vmware-system-vmop", Namespace: ""
for: "/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/30-vmop/vmop.yaml": error when patching "/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/30-vmop/vmop.yaml": Internal error occurred: failed calling webhook "default.validating.namespace.supervisor.vmware.com": failed to call webhook: Post "https://vmware-system-nsop-webhook-service.vmware-system-nsop.svc:443/supervisor-namespace-validate-v1-namespace?timeout=30s": tls: failed to verify certificate: x509: certificate signed by unknown authority
Component upgrade failed.
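
The last line of this error names the failing webhook, the in-cluster service it was called on, and the TLS reason. As a rough triage aid, the Python sketch below pulls those fields out of such a line. The function name and the regular expression are illustrative assumptions based only on the message format shown above, not part of any VMware tooling.

```python
import re
from urllib.parse import urlsplit

# Pattern derived from the error text shown above (an assumption, not a
# documented format): webhook name, then the POST URL, then the reason.
WEBHOOK_ERROR = re.compile(
    r'failed calling webhook "(?P<webhook>[^"]+)".*?'
    r'Post "(?P<url>[^"]+)": (?P<reason>.+)$'
)


def parse_webhook_failure(line):
    """Return the webhook name, service, namespace, path and reason, or None."""
    match = WEBHOOK_ERROR.search(line)
    if match is None:
        return None
    url = urlsplit(match.group("url"))
    # In-cluster service hosts usually look like <service>.<namespace>.svc
    parts = (url.hostname or "").split(".")
    return {
        "webhook": match.group("webhook"),
        "service": parts[0] if parts[0] else None,
        "namespace": parts[1] if len(parts) > 1 else None,
        "path": url.path,
        "reason": match.group("reason").strip(),
    }
```

For the error above, the reason field carries the x509 message, which points at certificate trust rather than at the webhook's own validation logic.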

This issue occurs only in the production environment. The same upgrade completes successfully in non-production, even though testing the NSOP webhook there also produces self-signed certificate warnings.

Cause

The upgrade fails because the NSOP webhook service presents a self-signed certificate that the webhook client (the Kubernetes API server, which calls the webhook when the upgrade patches the vmware-system-vmop namespace) does not trust. Although the ValidatingWebhookConfiguration includes a caBundle and the certificate matches the expected keypair, the client still rejects the TLS connection.

The connection is rejected during the TLS handshake, before the request reaches the NSOP webhook pod. Unexpected external hops in traceroute output also suggest that the traffic may be routed incorrectly or intercepted. Non-production appears to tolerate the same self-signed certificate, while production enforces stricter validation and blocks the upgrade.
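
One way to check whether the caBundle really corresponds to the certificate being served is to compare SHA-256 fingerprints. The Python sketch below assumes the webhook configuration has been exported as JSON (for example with kubectl get validatingwebhookconfiguration vmware-system-nsop-validating-webhook-configuration -o json) and that the serving certificate is self-signed, so it should appear in the bundle itself. The function names are illustrative and are not part of any VMware tooling.

```python
import base64
import hashlib
import re
import ssl

PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def pem_fingerprints(pem_text):
    """Return the SHA-256 hex fingerprint of every certificate in a PEM string."""
    prints = []
    for block in PEM_BLOCK.findall(pem_text):
        der = ssl.PEM_cert_to_DER_cert(block)  # strip the armor, decode base64
        prints.append(hashlib.sha256(der).hexdigest())
    return prints


def ca_bundle_fingerprints(webhook_config):
    """Map each webhook name to the fingerprints found in its caBundle."""
    result = {}
    for hook in webhook_config.get("webhooks", []):
        # caBundle is a base64-encoded PEM bundle in the API object
        bundle_b64 = hook.get("clientConfig", {}).get("caBundle", "")
        pem_text = base64.b64decode(bundle_b64).decode("ascii")
        result[hook["name"]] = pem_fingerprints(pem_text)
    return result


def serving_cert_in_bundle(webhook_config, serving_pem):
    """For each webhook, report whether its caBundle contains the serving cert.

    Assumption: the serving certificate is self-signed, so a match means the
    bundle holds that exact certificate. A CA-signed leaf would need real
    chain verification instead of a fingerprint comparison.
    """
    serving = set(pem_fingerprints(serving_pem))
    return {
        name: bool(serving & set(prints))
        for name, prints in ca_bundle_fingerprints(webhook_config).items()
    }
```

The serving certificate can be captured as PEM with ssl.get_server_certificate((host, 443)) from a machine that can resolve and reach the webhook service. If the fingerprint seen there differs from the one in the bundle, that would be consistent with the traffic being intercepted or misrouted, as the traceroute output suggests.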

Resolution

Follow these steps to proceed with the upgrade:

  1. Restart the Supervisor upgrade explicitly from the vSphere UI. Restarting wcpsvc alone does not resume the upgrade.
  2. If the upgrade fails again with the same error, edit the ValidatingWebhookConfiguration named vmware-system-nsop-validating-webhook-configuration and change failurePolicy to Ignore. ValidatingWebhookConfiguration is a cluster-scoped resource, so no namespace flag is needed.
    • kubectl edit validatingwebhookconfiguration vmware-system-nsop-validating-webhook-configuration
  3. Save the configuration and restart the upgrade again via the UI.

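With failurePolicy set to Ignore, the API server treats a failed call to the webhook as non-blocking, so the upgrade can complete even though the TLS error persists. Namespace validation by this webhook is effectively skipped while the setting is in place. After the upgrade completes, you can restore failurePolicy to its original value.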
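
To make the change easy to undo, it can help to record the original failurePolicy values before editing. The Python sketch below builds a pair of JSON Patch documents from an exported webhook configuration: one that sets failurePolicy to Ignore and one that restores the previous values. The function name and file names are hypothetical, and the sketch assumes the configuration was read from the API server, where failurePolicy is always populated.

```python
import json


def failure_policy_patches(webhook_config, new_policy="Ignore"):
    """Return (apply_ops, revert_ops) as JSON Patch operation lists."""
    apply_ops, revert_ops = [], []
    for index, hook in enumerate(webhook_config.get("webhooks", [])):
        # admissionregistration.k8s.io/v1 defaults failurePolicy to "Fail"
        original = hook.get("failurePolicy", "Fail")
        if original == new_policy:
            continue  # nothing to change, so nothing to revert
        path = f"/webhooks/{index}/failurePolicy"
        apply_ops.append({"op": "replace", "path": path, "value": new_policy})
        revert_ops.append({"op": "replace", "path": path, "value": original})
    return apply_ops, revert_ops


def write_patches(webhook_config, apply_file="apply.json", revert_file="revert.json"):
    """Write both patch documents to disk (file names are placeholders)."""
    apply_ops, revert_ops = failure_policy_patches(webhook_config)
    with open(apply_file, "w") as handle:
        json.dump(apply_ops, handle, indent=2)
    with open(revert_file, "w") as handle:
        json.dump(revert_ops, handle, indent=2)
    return apply_ops, revert_ops
```

Either file could then be applied with kubectl patch validatingwebhookconfiguration vmware-system-nsop-validating-webhook-configuration --type=json --patch-file=apply.json as an alternative to the interactive edit, with revert.json used the same way after the upgrade.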