After TMC SM is deployed, the agent_gateway_liveness_critical and api_gateway_liveness_critical alerts fire continuously.
These are false alerts, and customers want to stop them from firing.
TMC SM 1.3
The alert rule for agent_gateway_liveness_critical is:
  sum by (namespace) ((avg_over_time(olympus_build_info{service="agent-gateway-service"}[5m])) or up * 0 ) <= 1
In a TMC SM environment, the Prometheus metric up has no namespace label. The up * 0 branch therefore contributes series with value 0 that are grouped under an empty namespace, the expression evaluates to 0, the <= 1 condition is satisfied, and the alert fires.
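One way to confirm this diagnosis is to run the relevant sub-expressions against the Prometheus HTTP API. The helper below is a minimal sketch: the function name prom_query, the PROM_URL variable, and port 9090 are assumptions for illustration, and it presumes Prometheus has been made reachable locally, for example with kubectl port-forward.

```shell
#!/usr/bin/env bash
# Assumption: Prometheus is reachable at $PROM_URL, e.g. after running
#   kubectl port-forward -n tmc-local pod/prometheus-server-tmc-local-monitoring-tmc-local-0 9090:9090
# Port 9090 is an assumption; adjust it to match your deployment.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant query through the Prometheus HTTP API and print the raw JSON.
# Usage: prom_query '<PromQL expression>'
prom_query() {
  curl -s -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}
```

For example, prom_query 'count by (namespace) (up)' shows whether up carries a namespace label at all, and prom_query 'sum by (namespace) (up * 0)' shows the zero-valued group that satisfies the <= 1 condition.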
1. Create a ytt overlay Secret that adds an annotation to the PackageInstall tmc-local-stack:
a. kubectl apply -f stack-overlay.yaml
stack-overlay.yaml
apiVersion: v1
kind: Secret
metadata:
  name: stack-overlay
  namespace: tmc-local
stringData:
  tmc-pkgi-overlay.yml: |
    #@ load("@ytt:overlay", "overlay")
    #@overlay/match by=overlay.subset({"apiVersion":"packaging.carvel.dev/v1alpha1", "kind":"PackageInstall", "metadata": {"name": "tmc-local-stack"}}),expects="1+"
    ---
    metadata:
      #@overlay/match missing_ok=True
      annotations:
        #@overlay/match missing_ok=True
        ext.packaging.carvel.dev/ytt-paths-from-secret-name.0: alert-overlay
2. Create a ytt overlay Secret that updates the Prometheus alert rules:
a. kubectl apply -f alert-overlay.yaml
alert-overlay.yaml
apiVersion: v1
kind: Secret
metadata:
  name: alert-overlay
  namespace: tmc-local
stringData:
  alert-overlay.yml: |
    #@ load("@ytt:overlay", "overlay")
    #@ load("@ytt:yaml", "yaml")
    #@ def remove_or_up(expr):
    #@   return expr.replace(' or up * 0', '')
    #@ end
    #@overlay/match by=overlay.subset({"kind":"ConfigMap", "metadata": {"name": "prometheus-alerts"}}),expects="1+"
    ---
    data:
      #@overlay/replace via=lambda left, _: remove_or_up(left)
      core-alerts-api-gateway-agent-alerts.yaml:
      #@overlay/replace via=lambda left, _: remove_or_up(left)
      core-alerts-api-gateway-user-alerts.yaml:
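To preview what the remove_or_up helper does, the same literal substitution can be reproduced in plain shell against the rule expression quoted in this article. This is only a sketch: the variable names are illustrative, and in the real workflow ytt applies the replacement to the full contents of each ConfigMap key.

```shell
#!/usr/bin/env bash
# Original alert expression, as quoted in this article.
expr='sum by (namespace) ((avg_over_time(olympus_build_info{service="agent-gateway-service"}[5m])) or up * 0 ) <= 1'

# Remove the literal ' or up * 0' fragment, mirroring expr.replace(' or up * 0', '').
# The asterisk is escaped so that sed matches it literally.
patched=$(printf '%s\n' "$expr" | sed 's/ or up \* 0//g')

printf '%s\n' "$patched"
```

The result keeps the rest of the expression unchanged, so the rule then depends only on olympus_build_info and no longer on up.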
3. Patch PackageInstall tanzu-mission-control with the extension annotation:
a. kubectl patch pkgi tanzu-mission-control --type='merge' -p '{"metadata": {"annotations": {"ext.packaging.carvel.dev/ytt-paths-from-secret-name.0": "stack-overlay"}}}' -n tmc-local
4. Kick both PackageInstalls to trigger reconciliation:
a. tanzu package installed kick tanzu-mission-control -y
tanzu package installed kick tanzu-mission-control -y
Triggering reconciliation for package install 'tanzu-mission-control' in namespace 'tmc-local'
7:14:15AM: Pausing reconciliation for package installation 'tanzu-mission-control' in namespace 'tmc-local'
7:14:17AM: Starting reconciliation for package install 'tanzu-mission-control' in namespace 'tmc-local'
7:14:17AM: Waiting for PackageInstall reconciliation for 'tanzu-mission-control'
7:14:18AM: Waiting for generation 6 to be observed
7:14:18AM: Fetch started
7:14:18AM: Fetching
| apiVersion: vendir.k14s.io/v1alpha1
| directories:
| - contents:
|   - imgpkgBundle:
|       image: harbor.tanzu.io:8443/tmc/package-repository@sha256:2e89ebe16a771480d3770b402c2a3273a70be44049653007b2381f1ac9b1cd00
|     path: .
|   path: "0"
| kind: LockConfig
|
7:14:18AM: Fetch succeeded
7:14:19AM: Template succeeded
7:14:19AM: Deploy started (2s ago)
7:14:21AM: Deploying
| Target cluster 'https://100.64.0.1:443' (nodes: wc-tmc-4f7ss-szvll, 3+)
| Changes
| Namespace  Name             Kind            Age  Op  Op st.  Wait to    Rs       Ri
| tmc-local  tmc-local-stack  PackageInstall  1d   -   -       reconcile  ongoing  Reconciling
| Op: 0 create, 0 delete, 0 update, 1 noop, 0 exists
| Wait to: 1 reconcile, 0 delete, 0 noop
| 7:14:23AM: ---- applying 1 changes [0/1 done] ----
| 7:14:23AM: noop packageinstall/tmc-local-stack (packaging.carvel.dev/v1alpha1) namespace: tmc-local
| 7:14:23AM: ---- waiting on 1 changes [0/1 done] ----
| 7:14:23AM: ongoing: reconcile packageinstall/tmc-local-stack (packaging.carvel.dev/v1alpha1) namespace: tmc-local
| 7:14:23AM: ^ Reconciling
| 7:14:32AM: ok: reconcile packageinstall/tmc-local-stack (packaging.carvel.dev/v1alpha1) namespace: tmc-local
| 7:14:32AM: ---- applying complete [1/1 done] ----
| 7:14:32AM: ---- waiting complete [1/1 done] ----
| Succeeded
7:14:32AM: Deploy succeeded
b. tanzu package installed kick tmc-local-stack -y
tanzu package installed kick tmc-local-stack -y
Triggering reconciliation for package install 'tmc-local-stack' in namespace 'tmc-local'
7:14:39AM: Pausing reconciliation for package installation 'tmc-local-stack' in namespace 'tmc-local'
7:14:41AM: Starting reconciliation for package install 'tmc-local-stack' in namespace 'tmc-local'
7:14:41AM: Waiting for PackageInstall reconciliation for 'tmc-local-stack'
7:14:41AM: Waiting for generation 4 to be observed
7:14:41AM: Fetch started
7:14:41AM: Fetching
| apiVersion: vendir.k14s.io/v1alpha1
| directories:
| - contents:
|   - imgpkgBundle:
|       image: harbor.tanzu.io:8443/tmc/package-repository@sha256:6ed349cc2a7ac8b4d13700146f31fd206f33c932e6eac3f385cbe33f351eb02d
|     path: .
|   path: "0"
| kind: LockConfig
|
7:14:41AM: Fetch succeeded
7:14:44AM: Template succeeded
7:14:44AM: Deploy started (2s ago)
7:14:46AM: Deploying
| Target cluster 'https://100.64.0.1:443' (nodes: wc-tmc-4f7ss-szvll, 3+)
| Changes
| Namespace Name Kind Age Op Op st. Wait to Rs Ri
| Op: 0 create, 0 delete, 0 update, 0 noop, 0 exists
| Wait to: 0 reconcile, 0 delete, 0 noop
| Succeeded
7:14:53AM: Deploy succeeded
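After both reconciliations succeed, it can be worth checking that the overlay annotations actually landed on the PackageInstalls. The function below is a small sketch for that check; the name show_overlay_annotation is hypothetical, and it only reads the annotation key used in the steps above.

```shell
#!/usr/bin/env bash
# Print the ytt-paths-from-secret-name.0 annotation of a PackageInstall in tmc-local.
# Dots inside the annotation key are escaped so jsonpath treats it as a single key.
# Usage: show_overlay_annotation <packageinstall-name>
show_overlay_annotation() {
  kubectl get packageinstall "$1" -n tmc-local \
    -o jsonpath='{.metadata.annotations.ext\.packaging\.carvel\.dev/ytt-paths-from-secret-name\.0}'
}
```

Based on the manifests above, show_overlay_annotation tanzu-mission-control should print stack-overlay and show_overlay_annotation tmc-local-stack should print alert-overlay. An empty result suggests the patch or the overlay was not applied.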
5. Delete the Prometheus pod so that it restarts and loads the updated prometheus-alerts ConfigMap:
a. kubectl delete pod -n tmc-local prometheus-server-tmc-local-monitoring-tmc-local-0
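As a final check, the rendered ConfigMap can be inspected to confirm that the ' or up * 0' fragment is gone. The sketch below assumes the prometheus-alerts ConfigMap lives in the tmc-local namespace; the function name rules_are_patched is hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if the prometheus-alerts ConfigMap no longer contains ' or up * 0',
# 1 if the fragment is still present, and 2 if the ConfigMap could not be read.
# Assumption: the ConfigMap is in the tmc-local namespace.
rules_are_patched() {
  local data
  data=$(kubectl get configmap prometheus-alerts -n tmc-local -o jsonpath='{.data}') || return 2
  # -F makes grep treat the pattern as a fixed string, so '*' is literal.
  if printf '%s\n' "$data" | grep -qF ' or up * 0'; then
    echo "alert rules still contain ' or up * 0'"
    return 1
  fi
  echo "alert rules no longer contain ' or up * 0'"
}
```

If the function reports that the fragment is still present, the overlay has probably not been applied yet, and the reconciliation in step 4 may need to be repeated before deleting the pod again.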