PVC or VolumeSnapshot operations fail on Supervisor cluster due to expired CA certificate

Article ID: 422493

Products

VMware vCenter Server

Issue/Introduction

On a Supervisor cluster, attempts to create or update PersistentVolumeClaim (PVC) or VolumeSnapshot objects may fail intermittently. The failure typically occurs during the storage quota validation phase.

You will see an error message similar to the following:

 
Error from server (Forbidden): error when creating "pvc.yaml": admission webhook "validate-quota-on-create.k8s.io" denied the request: Operation denied, Post "https://cns-vsphere-vmware-com-service.kube-system.svc.cluster.local:443/getrequestedcapacityforpersistentvolumeclaim": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "storage-quota-selfsigned-issuer-cert")

Environment

vCenter Server: 9.0, 9.0.1, 9.0.2

Cause

When the Supervisor cluster processes storage quota validation for CREATE or UPDATE requests, the storage quota webhook communicates with the CNS extension service over an mTLS connection. This connection relies on client and server certificates signed by a common custom Certificate Authority (CA) managed by cert-manager.

When the CA certificates are auto-renewed upon expiry, cert-manager does not automatically refresh the child client-server certificates signed by the old CA. Consequently, the storage quota webhook and CNS extension service pods continue using stale certificate data, leading to TLS verification failures.
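To see which issuer and validity dates a certificate secret currently carries, something like the following sketch may help. It assumes the secrets follow the usual cert-manager layout, with the certificate stored under the `tls.crt` key, and that `openssl` and `base64` are available on the Supervisor control plane; neither assumption is confirmed by this article. The helper names `secret_cert` and `describe_cert` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a certificate stored in a Kubernetes secret.
# Assumption: the certificate is stored under the "tls.crt" key (cert-manager convention).

# Print the decoded PEM certificate from the given secret.
secret_cert() {
  local ns="$1" secret="$2"
  kubectl -n "$ns" get secret "$secret" -o 'jsonpath={.data.tls\.crt}' | base64 -d
}

# Read a PEM certificate from stdin and print its subject, issuer, and validity window.
describe_cert() {
  openssl x509 -noout -subject -issuer -startdate -enddate
}
```

For example, `secret_cert kube-system storage-quota-webhook-server-internal-cert | describe_cert` prints the details for the webhook certificate. A start date that predates the most recent CA renewal would be consistent with the stale-certificate condition described above.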

Resolution

To resolve this issue, manually delete the stale certificate secrets to trigger certificate regeneration, then restart the affected pods so that they load the new certificates.

  1. Access the vCenter Appliance: Log in to the vCenter Server Appliance via SSH as root.

     ssh root@<vcenter-fqdn-or-ip>

  2. Retrieve Supervisor Control Plane Credentials: Run the following script to obtain the IP address and password for the Supervisor control plane:

     /usr/lib/vmware-wcp/decryptK8Pwd.py

  3. Log in to the Supervisor Control Plane: Use the credentials from the previous step to SSH into the Supervisor:

     ssh root@<supervisor-ip>

  4. Refresh the Webhook Certificate: Delete the existing secret for the storage quota webhook to force a refresh with the new CA:

     kubectl delete secret -n kube-system storage-quota-webhook-server-internal-cert

  5. Refresh the CNS Extension Certificate: Delete the secret for the CNS extension service:

     kubectl delete secret -n kube-system cns-storage-quota-extension-cert

  6. Restart the Storage Quota Webhook Pods: Scale the deployment down to zero and then back to its original count (typically 3) to reload the certificates:

     kubectl -n kube-system scale deploy storage-quota-webhook --replicas=0
     kubectl -n kube-system scale deploy storage-quota-webhook --replicas=3

  7. Restart the CNS Storage Quota Extension Pods: Scale the extension deployment down to zero and then back to 1:

     kubectl -n kube-system scale deploy cns-storage-quota-extension --replicas=0
     kubectl -n kube-system scale deploy cns-storage-quota-extension --replicas=1
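The restart steps above could be wrapped in a small script that records each deployment's current replica count instead of relying on the typical values, and that waits for a deleted secret to be regenerated before the pods are restarted. This is a minimal sketch, not a supported tool: the function names `wait_for_secret` and `restart_by_scaling`, the retry count, and the timeout are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names, retry count, and timeout are illustrative assumptions.

# Poll until the named secret exists again (for example, after it was deleted).
wait_for_secret() {
  local ns="$1" secret="$2" tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    kubectl -n "$ns" get secret "$secret" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep 2
  done
  echo "secret $secret was not regenerated in time" >&2
  return 1
}

# Scale a deployment to zero and back to the replica count it had beforehand.
restart_by_scaling() {
  local ns="$1" deploy="$2" replicas
  # Record the desired replica count before scaling down.
  replicas="$(kubectl -n "$ns" get deploy "$deploy" -o 'jsonpath={.spec.replicas}')" || return 1
  case "$replicas" in
    ''|0|*[!0-9]*)
      echo "could not read a non-zero replica count for $deploy" >&2
      return 1
      ;;
  esac
  kubectl -n "$ns" scale deploy "$deploy" --replicas=0 || return 1
  kubectl -n "$ns" scale deploy "$deploy" --replicas="$replicas" || return 1
  # Wait until the new pods report ready, or give up after the timeout.
  kubectl -n "$ns" rollout status deploy "$deploy" --timeout=120s
}
```

After deleting the two secrets, a possible sequence is `wait_for_secret kube-system storage-quota-webhook-server-internal-cert && restart_by_scaling kube-system storage-quota-webhook`, followed by the same pair of calls for `cns-storage-quota-extension-cert` and `cns-storage-quota-extension`. Reading the replica count first avoids hard-coding 3 or 1 in environments where the deployments are sized differently.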