While running a BOSH deployment, such as a cluster upgrade or another deployment update, the apply-addons errand fails with exit code 1.
The BOSH task debug output shows the following:
EXAMPLE from:
bosh task XXXX --debug
From the:
"result_output"
You see:
"errand_name":"apply-addons","exit_code":1
and:
Internal error occurred: failed calling webhook
and:
service \"<GATEKEEPER_WEBHOOK_SERVICE_NAME>\" not found"
If you run the apply-addons errand manually and keep the errand VM alive, you will see more human-readable error output, similar to the example below:
EXAMPLE COMMAND to keep the apply-addons errand VM after running:
bosh -d CLUSTER_SERVICE_INSTANCE run-errand apply-addons --keep-alive
OUTPUT:
Instance apply-addons/<BOSH_INSTANCE_ID>
Exit Code 1
Stdout
failed to start all system specs after 1200 with exit code 1
Stderr
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
Error from server (InternalError): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"name\":\"pks-system\"}}\n"},"labels":null}}
to:
Resource: "/v1, Resource=namespaces", GroupVersionKind: "/v1, Kind=Namespace"
Name: "pks-system", Namespace: ""
for: "/var/vcap/jobs/apply-specs/specs/addon-spec.yml": Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s: service "gatekeeper-webhook-service" not found
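CAUSE: The API server still tries to call the Gatekeeper admission webhook, but the service behind it no longer exists. This can happen when Gatekeeper is removed, for example by an internal Concourse pipeline, and the removal leaves additional Gatekeeper objects behind, such as: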
A PodSecurityPolicy for Gatekeeper (example: gatekeeper-admin)
A ValidatingWebhookConfiguration for Gatekeeper
A ClusterRole for Gatekeeper
A ClusterRoleBinding for Gatekeeper
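To check whether any of the object types listed above are still present, a minimal sketch along the following lines may help. It assumes kubectl is already configured against the affected cluster and that the leftover objects have "gatekeeper" somewhere in their names. The function names filter_gatekeeper and find_gatekeeper_leftovers are hypothetical helpers introduced for illustration only; they are not part of any product tooling.

```shell
#!/usr/bin/env bash
# Sketch: list cluster-scoped objects whose names mention Gatekeeper.
# Assumption: leftover objects contain "gatekeeper" in their names.

# Keep only input lines that contain "gatekeeper" (case-insensitive).
# "|| true" keeps the exit status at 0 when nothing matches.
filter_gatekeeper() {
  grep -i 'gatekeeper' || true
}

# Query each resource kind named in the list above and print matches.
find_gatekeeper_leftovers() {
  local kind
  for kind in podsecuritypolicies validatingwebhookconfigurations \
              clusterroles clusterrolebindings; do
    echo "== ${kind} =="
    # -o name prints one "<kind>/<name>" entry per line.
    # Errors are hidden because PodSecurityPolicy does not exist on
    # Kubernetes v1.25+ and the lookup fails there.
    kubectl get "${kind}" -o name 2>/dev/null | filter_gatekeeper
  done
}

# Call find_gatekeeper_leftovers once kubectl points at the affected cluster.
```

After sourcing the script, running find_gatekeeper_leftovers prints one heading per resource kind followed by any matching object names. An empty section suggests that kind has no Gatekeeper leftovers, or that the objects use names this simple filter does not match.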