In Kubernetes clusters, the kube-apiserver logs may show repeated warnings about failed webhook calls. The exact wording varies; an example log message looks like:
Failed calling webhook, failing open validation.gatekeeper.sh: failed calling webhook "validation.gatekeeper.sh": failed to call webhook: Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admit?timeout=3s"
These errors can lead to high memory usage on master node VMs, with utilization nearing 100%, along with related symptoms such as swapping and high CPU usage caused by disk I/O wait.
Additionally, etcd logs may display warnings such as "apply request took too long", with reported durations well beyond the expected limit.
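To gauge how far etcd apply times exceed the expected limit, the warnings can be pulled out of the log and compared numerically. The following is a minimal sketch, not a definitive tool. It assumes etcd writes JSON-structured log lines in which the warning carries `took` and `expected-duration` fields holding Go-style duration strings (for example `1.2s` or `250ms`); if your etcd version logs in a different format, the field names will need adjusting. The function names `parse_go_duration` and `slow_applies` are hypothetical helpers introduced for illustration.

```python
import json
import re

# Seconds per unit for Go-style duration strings such as "1m2.5s" or "250ms".
_UNITS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
          "s": 1.0, "m": 60.0, "h": 3600.0}
# Multi-character units are listed first so "ms" is not read as "m".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_go_duration(text):
    """Convert a Go-style duration string to seconds."""
    parts = _PART.findall(text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * _UNITS[unit] for value, unit in parts)


def slow_applies(lines):
    """Return (took, expected) pairs in seconds for each slow-apply warning.

    Assumption: each log line is a JSON object with "msg", "took" and
    "expected-duration" keys. Lines that do not match are skipped.
    """
    results = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(entry, dict):
            continue
        if entry.get("msg") != "apply request took too long":
            continue
        try:
            took = parse_go_duration(entry["took"])
            expected = parse_go_duration(entry["expected-duration"])
        except (KeyError, TypeError, ValueError):
            continue  # warning present but fields missing or malformed
        results.append((took, expected))
    return results
```

Feeding the function the lines of an etcd log file, for instance `slow_applies(open(path))` with a path of your choosing, yields pairs whose first element can be compared against the second to see how large the overrun is.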
The errors in the kube-apiserver logs indicate that webhook calls are failing, often due to overloaded or unresponsive webhook services. These accumulated failed calls can contribute to increased memory usage and degrade cluster performance, particularly when associated with a specific service URL (e.g., https://gatekeeper-webhook-service.gatekeeper-system.svc:443/ in this example).
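To see which webhook service accounts for most of the failures, the kube-apiserver log lines can be tallied by webhook name and target host. This is a minimal sketch that assumes the messages follow the shape of the example above, with the webhook name quoted after `failed calling webhook` and the target URL quoted after `Post`. The function name `count_webhook_failures` is a hypothetical helper, not part of any Kubernetes tooling.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Patterns based on the example log message; other message shapes are ignored.
_WEBHOOK = re.compile(r'failed calling webhook "([^"]+)"')
_POST = re.compile(r'Post "([^"]+)"')


def count_webhook_failures(lines):
    """Count failed webhook calls per (webhook name, service host:port)."""
    counts = Counter()
    for line in lines:
        name = _WEBHOOK.search(line)
        if not name:
            continue  # not a webhook failure message
        url = _POST.search(line)
        # Fall back to "unknown" when the message carries no target URL.
        host = urlsplit(url.group(1)).netloc if url else "unknown"
        counts[(name.group(1), host)] += 1
    return counts
```

Calling `count_webhook_failures(lines).most_common(5)` on the log lines lists the webhook and service pairs with the most failures, which helps confirm whether a single service URL dominates.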