If you have manually deployed the Wavefront Observability for Kubernetes Operator on a TKGI cluster, the operator may be deleted when you update or upgrade the cluster.
This issue is known to happen with the following configuration:
TKGI v1.20.0 or later
When you update or upgrade a TKGI cluster, TKGI runs the wavefront-proxy-errand. This errand deletes the observability-system namespace if Wavefront Integration is disabled in the TKGI tile settings.
One solution is to enable the TKGI Wavefront Integration in the tile settings instead of deploying the Wavefront operator yourself. Note that the TKGI Wavefront Integration setting is deprecated and will be removed in a future release.
Another solution is to use the following runtime config to patch the wavefront-proxy-errand so it doesn't delete the observability-system namespace:
bosh upload-release "https://bosh.io/d/github.com/cloudfoundry/os-conf-release?v=23.0.0"
addons:
- name: wavefront-errand-config
  jobs:
  - name: pre-start-script
    release: os-conf
    properties:
      script: |-
        #!/bin/bash
        if [ -f /var/vcap/jobs/wavefront-proxy-errand/bin/run ]; then
          sed -i '/delete_spec "wavefront.yml"/d' /var/vcap/jobs/wavefront-proxy-errand/bin/run
          sed -i '/delete_spec "wavefront-proxy.yml"/d' /var/vcap/jobs/wavefront-proxy-errand/bin/run
          sed -i '/delete_spec "wavefront-operator.yml"/d' /var/vcap/jobs/wavefront-proxy-errand/bin/run
          sed -i '/${kubectl} delete secret wavefront-secret --namespace=${new_namespace} --ignore-not-found/s/^/# /' /var/vcap/jobs/wavefront-proxy-errand/bin/run
        fi
  include:
    instance_groups: [apply-addons]
releases:
- name: "os-conf"
  version: "((OS_CONF_RELEASE))"
Note: If the ((OS_CONF_RELEASE)) variable cannot be resolved in your environment, specify the os-conf version explicitly instead:
releases:
- name: "os-conf"
  version: "23.0.0"
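The sed edits in the pre-start script can be tried on a scratch file before they are rolled out. The following is a minimal sketch, not the errand's real content: the sample lines are hypothetical stand-ins built from the patterns the script matches, and GNU sed is assumed (as on BOSH stemcells).

```shell
#!/bin/bash
# Dry run of the pre-start sed edits against a throwaway file.
# ASSUMPTION: the sample lines below are illustrative only; the real
# /var/vcap/jobs/wavefront-proxy-errand/bin/run script will differ.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
delete_spec "wavefront.yml"
delete_spec "wavefront-proxy.yml"
delete_spec "wavefront-operator.yml"
${kubectl} delete secret wavefront-secret --namespace=${new_namespace} --ignore-not-found
echo "errand finished"
EOF

# Same expressions as in the runtime config, pointed at the scratch file.
sed -i '/delete_spec "wavefront.yml"/d' "$sample"
sed -i '/delete_spec "wavefront-proxy.yml"/d' "$sample"
sed -i '/delete_spec "wavefront-operator.yml"/d' "$sample"
sed -i '/${kubectl} delete secret wavefront-secret --namespace=${new_namespace} --ignore-not-found/s/^/# /' "$sample"

# Show what is left after the edits.
cat "$sample"
```

After the edits, the three delete_spec lines should be gone and the kubectl delete secret line should be commented out with a leading "# ", while unrelated lines stay untouched.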
Upload the runtime config to BOSH:
bosh update-runtime-config --name wavefront-errand-config ./wavefront-errand-config.yml
Try your cluster update or upgrade again and verify that the Wavefront deployment is not deleted.
If you encounter BOSH errors such as "Error: - Failed to find variable '/p-bosh/service-instance_<ID>/OS_CONF_RELEASE' from config server: HTTPCode '404'", see the note in step 2 of the resolution section and ensure that the runtime config specifies the actual os-conf version instead of OS_CONF_RELEASE.
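One way to catch this error before uploading is to check the runtime config file for a leftover ((OS_CONF_RELEASE)) placeholder. The following is a minimal sketch; check_os_conf_version is a hypothetical helper name, not part of the bosh CLI, and the file name is the one used in the update-runtime-config command.

```shell
#!/bin/bash
# check_os_conf_version FILE
# Hypothetical helper: returns 1 if FILE still contains the
# ((OS_CONF_RELEASE)) placeholder, 0 otherwise.
check_os_conf_version() {
  local file="$1"
  # -F treats the pattern as a fixed string, so the parentheses are literal.
  if grep -qF '((OS_CONF_RELEASE))' "$file"; then
    echo "unresolved placeholder: replace ((OS_CONF_RELEASE)) with the os-conf version"
    return 1
  fi
  echo "ok: no ((OS_CONF_RELEASE)) placeholder found"
}

# Example usage (commented out because it needs a BOSH environment):
# check_os_conf_version ./wavefront-errand-config.yml &&
#   bosh update-runtime-config --name wavefront-errand-config ./wavefront-errand-config.yml
```

Chaining the check with && means the upload runs only when the placeholder has been replaced with a literal version such as "23.0.0".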