You attempt to upgrade your production Supervisor Cluster to version v1.29.7, but the process fails during the vmoperatorupgrade stage and cannot proceed to namespaceoperatorupgrade. You see the following error in both compupgrade and upgrade.ctl.py output:
Error from server (InternalError): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{\"kubernetes.io/change-cause\":\"kubectl apply --filename=/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/30-vmop --record=true\"},\"labels\":{\"pod-security.kubernetes.io/enforce\":\"privileged\"},\"name\":\"vmware-system-vmop\"}}\n","kubernetes.io/change-cause":"kubectl apply --filename=/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/30-vmop --record=true"}}}
to:
Resource: "/v1, Resource=namespaces", GroupVersionKind: "/v1, Kind=Namespace"
Name: "vmware-system-vmop", Namespace: ""
for: "/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/30-vmop/vmop.yaml": error when patching "/usr/lib/vmware-wcp/objects/PodVM-GuestCluster/30-vmop/vmop.yaml": Internal error occurred: failed calling webhook "default.validating.namespace.supervisor.vmware.com": failed to call webhook: Post "https://vmware-system-nsop-webhook-service.vmware-system-nsop.svc:443/supervisor-namespace-validate-v1-namespace?timeout=30s": tls: failed to verify certificate: x509: certificate signed by unknown authority
Component upgrade failed.
This issue occurs only in the production environment. The same upgrade completes successfully in the non-production environment, even though testing the NSOP webhook there also shows self-signed certificate warnings.
The upgrade fails because the NSOP webhook service presents a self-signed certificate that the upgrade process does not trust. Although the ValidatingWebhookConfiguration includes a caBundle and the certificate matches the expected keypair, the webhook client (the Kubernetes API server, which calls the webhook when the namespace patch is applied) still rejects the TLS connection.
The connection is rejected before the request reaches the NSOP webhook pod. traceroute also shows unexpected external hops, which suggests that the traffic may be routed incorrectly or intercepted. Non-production appears to tolerate the same self-signed certificate, whereas production enforces stricter validation and blocks the upgrade.
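To check whether the certificate served in production really chains to the caBundle, you can repeat the handshake by hand and trust only that bundle. The following is a minimal sketch, not a VMware-provided tool. The webhook, configuration, and service names are taken from this article; the scratch file path and the function names are hypothetical. It assumes that kubectl, openssl, and GNU base64 are available and that the service name resolves from where you run it (for example, from inside the cluster).

```shell
#!/bin/sh
# Sketch: does the certificate served by the NSOP webhook verify against
# the caBundle stored in its ValidatingWebhookConfiguration?

WEBHOOK_CFG="vmware-system-nsop-validating-webhook-configuration"
WEBHOOK_NAME="default.validating.namespace.supervisor.vmware.com"
WEBHOOK_HOST="vmware-system-nsop-webhook-service.vmware-system-nsop.svc"
CA_FILE="${CA_FILE:-/tmp/nsop-webhook-ca.pem}"   # hypothetical scratch path

# Write the caBundle of the failing webhook entry to $CA_FILE as PEM.
# Fails if the bundle comes back empty (wrong name or no access).
dump_ca_bundle() {
    kubectl get validatingwebhookconfiguration "$WEBHOOK_CFG" \
        -o jsonpath="{.webhooks[?(@.name==\"$WEBHOOK_NAME\")].clientConfig.caBundle}" \
        | base64 -d > "$CA_FILE"
    [ -s "$CA_FILE" ]
}

# Reduce `openssl s_client` output on stdin to its first verdict line.
verify_verdict() {
    sed -n 's/^ *Verify return code: /Verify return code: /p' | head -n 1
}

# Perform a TLS handshake with the webhook service using the dumped
# caBundle as the CA file, then print the verification verdict.
check_webhook_trust() {
    dump_ca_bundle || { echo "could not read caBundle" >&2; return 1; }
    openssl s_client -connect "$WEBHOOK_HOST:443" -servername "$WEBHOOK_HOST" \
        -CAfile "$CA_FILE" </dev/null 2>/dev/null | verify_verdict
}
```

After sourcing the file, run check_webhook_trust. A verdict of "Verify return code: 0 (ok)" suggests that the served certificate matches the caBundle from that vantage point. Any other verdict in production, combined with the unexpected traceroute hops, would be consistent with the connection being terminated by something other than the NSOP webhook pod.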
Follow these steps to proceed with the upgrade:
kubectl edit validatingwebhookconfiguration -n vmware-system-nsop vmware-system-nsop-validating-webhook-configuration
This change bypasses the failing webhook during the upgrade process and allows it to complete. You can revert the webhook settings after the upgrade if needed.