We tried several different ways of restarting the Master node but kube-apiserver starts and fails immediately,
etcd seems to be fine but cannot find a way to bring the master to running state.
In the kube-api logs messages like below can be seen:
I0707 09:08:01.435396 6 healthz.go:261] poststarthook/rbac/bootstrap-roles check failed: readyz
[-]poststarthook/rbac/bootstrap-roles failed: not finished
I0707 09:08:01.443460 6 healthz.go:261] poststarthook/rbac/bootstrap-roles check failed: healthz
TKGI 1.2x
After investigation of kube-api logs the following snippet can be seen:
W0707 15:01:42.235278 7 dispatcher.go:217] Failed calling webhook, failing closed validate.kyverno.svc-fail: failed calling webhook "validate.kyverno.svc-fail": failed to call webhook: Post "https://kyverno-svc.ccp-kyverno.svc:443/validate/fail?timeout=10s": context deadline exceeded
I0707 15:01:42.235535 7 trace.go:236] Trace[708539055]: "Create" accept:application/vnd.kubernetes.protobuf, */*,audit-id:0xxxxxxxxxxxxxxxxd080,client:127.0.0.1,api-group:rbac.authorization.k8s.io,api-version:v1,name:,subresource:,namespace:,protocol:HTTP/2.0,resource:clusterrolebindings,scope:resource,url:/apis/rbac.authorization.k8s.io/v1/clusterrolebindings,user-agent:kube-apiserver/v1.29.6+vmware.1 (linux/amd64) kubernetes/73fc294,verb:POST (07-Jul-2025 15:01:32.233) (total time: 10001ms):
Trace[708539055]: ["Call validating webhook" configuration:kyverno-resource-validating-webhook-cfg,webhook:validate.kyverno.svc-fail,resource:rbac.authorization.k8s.io/v1, Resource=clusterrolebindings,subresource:,operation:CREATE,UID:f473axxxxxxxxxxxcdcc 10000ms (15:01:32.234)]
Note kyverno is 3rd party policy tool
It looks the related webhook is not in running state and because of the failure policy set to fail it prevents the healthcheck from completion
To bypass the problem,
Export the webhook into yaml
Delete the webhook
Confirm after starting the kube-api service if healtheck passes
kubectl get --raw /healthz?verbose
kubectl get --raw /healthz?verbose
[+]ping ok
[+]log ok
[+]etcd ok
...
[+]poststarthook/start-service-ip-repair-controllers ok
[+]poststarthook/rbac/bootstrap-roles ok
....
[+]poststarthook/apiservice-openapiv3-controller ok
[+]poststarthook/apiservice-discovery-controller ok
There could be potentially other issues related to the kube-api healthcheck in such situations reach out for support for diagnosis.