The ensemble-observability-store pod goes into CrashLoopBackOff, and the ClickHouse server logs show errors like the following:
Code: 231. DB::Exception: Suspiciously many (N parts, 0.00 B in total) broken parts to remove while maximum allowed broken parts count is 100.
Cannot attach table `<database>`.`<table>` from metadata file ...(TOO_MANY_UNEXPECTED_DATA_PARTS)
Because ClickHouse fails to attach one or more tables, dependent services such as ensemble-observability-store stay in CrashLoopBackOff even though the ClickHouse server itself starts.
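In Tanzu Hub, ClickHouse is managed through the following stack: the sm and clickhouse-metrics packages (kapp-controller PackageInstall resources, "pkgi"), the clickhouse-metrics ClickHouseInstallation (CHI) custom resource, and the Altinity ClickHouse Operator, which reconciles the CHI into the ClickHouse pods.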
ClickHouse has a safety threshold (max_suspicious_broken_parts, default 100) that prevents table attachment when too many broken data parts are detected. This is a safeguard against silent data loss. When the number of broken parts exceeds this threshold, the server refuses to attach the affected table, which can cascade into dependent service failures.
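Before changing anything, it can help to record what the logs report, because these log lines are lost once the pod is deleted in step 6. The sketch below uses hypothetical helpers (max_broken_parts_in_logs, failed_tables_in_logs and suggest_threshold are illustrative names, not part of Tanzu Hub). It assumes the ClickHouse container is named clickhouse and that the server logs to stdout so that kubectl logs returns the server log, and it keys on the message text shown above. Since ClickHouse refuses the attach when the broken-parts count N is greater than the threshold, suggest_threshold prints the next multiple of 100 strictly above the largest N.

```shell
# Hypothetical helpers; POSIX sh plus `local` (works in bash and dash).
# Assumes the ClickHouse server logs to the container's stdout.

# max_broken_parts_in_logs NAMESPACE POD
#   Prints the largest N from "Suspiciously many (N parts, ...)" lines, or 0.
max_broken_parts_in_logs() {
  local n
  n=$(kubectl logs -n "$1" "$2" -c clickhouse \
    | grep -oE 'Suspiciously many \([0-9]+ parts' \
    | grep -oE '[0-9]+' \
    | sort -n \
    | tail -n 1) || true
  echo "${n:-0}"
}

# failed_tables_in_logs NAMESPACE POD
#   Prints each `db`.`table` named in "Cannot attach table" lines, once.
failed_tables_in_logs() {
  kubectl logs -n "$1" "$2" -c clickhouse \
    | grep -oE 'Cannot attach table `[^`]+`\.`[^`]+`' \
    | sed 's/^Cannot attach table //' \
    | sort -u
}

# suggest_threshold N
#   Prints the next multiple of 100 strictly greater than N.
suggest_threshold() {
  echo $(( ($1 / 100 + 1) * 100 ))
}
```

For example, running suggest_threshold "$(max_broken_parts_in_logs tanzusm chi-clickhouse-metrics-default-0-0-0)" prints a candidate value for step 4, and saving the output of failed_tables_in_logs gives the table list to check in step 8.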
Common causes of excessive broken parts:
1) Identify the name of the CHI resource
kubectl get chi -n tanzusm -o name
# Output: clickhouseinstallation.clickhouse.altinity.com/clickhouse-metrics
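To avoid retyping the name, the CHI name can be captured in a shell variable. A minimal sketch: chi_name is a hypothetical helper, and it assumes there is a single CHI in the namespace (it takes the first one listed).

```shell
# chi_name NAMESPACE
#   Prints the first ClickHouseInstallation name in NAMESPACE, stripping the
#   "clickhouseinstallation.clickhouse.altinity.com/" prefix that -o name adds.
chi_name() {
  local ref
  ref=$(kubectl get chi -n "$1" -o name | head -n 1)
  echo "${ref##*/}"
}
```

For example, CHI=$(chi_name tanzusm) sets CHI to clickhouse-metrics for the output shown above.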
2) Verify the current setting
kubectl get chi clickhouse-metrics -n tanzusm -o jsonpath='{.spec.configuration.settings}' | jq .
3) Pause reconciliation of the sm and clickhouse-metrics package installs (pkgi) so that kapp-controller does not revert the manual CHI change
kctrl package installed pause -i sm -n tanzusm --yes
kctrl package installed pause -i clickhouse-metrics -n tanzusm --yes
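Pausing should set spec.paused to true on each PackageInstall, so the pause can be confirmed before patching. A minimal sketch; check_paused is a hypothetical helper that reads that field with kubectl and returns non-zero if any package install is not paused.

```shell
# check_paused NAMESPACE PKGI...
#   Prints the pause state of each PackageInstall; fails if any is not paused.
check_paused() {
  local ns="$1" pkgi rc=0
  shift
  for pkgi in "$@"; do
    if [ "$(kubectl get pkgi "$pkgi" -n "$ns" -o jsonpath='{.spec.paused}')" = "true" ]; then
      echo "$pkgi: paused"
    else
      echo "$pkgi: NOT paused"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, check_paused tanzusm sm clickhouse-metrics should report both as paused before moving on to step 4.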
4) Patch the CHI to raise the threshold (200 in this example; use a value at least as large as the broken-parts count N reported in the error)
kubectl patch chi clickhouse-metrics -n tanzusm --type json \
-p '[{"op":"add",
"path":"/spec/configuration/settings/merge_tree~1max_suspicious_broken_parts",
"value":"200"}]'
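The JSON Patch "add" operation above requires the parent path /spec/configuration/settings to already exist; if the CHI has no settings map, the patch is rejected. A JSON merge patch creates missing intermediate maps instead. The sketch below is an alternative under that assumption, not the documented procedure; set_broken_parts_threshold is a hypothetical helper that writes the same merge_tree/max_suspicious_broken_parts key.

```shell
# set_broken_parts_threshold NAMESPACE CHI VALUE
#   Sets merge_tree/max_suspicious_broken_parts in the CHI settings using a
#   JSON merge patch, which also creates spec.configuration.settings if absent.
set_broken_parts_threshold() {
  local ns="$1" chi="$2" value="$3"
  case "$value" in
    ''|*[!0-9]*) echo "VALUE must be a positive integer" >&2; return 1 ;;
  esac
  kubectl patch chi "$chi" -n "$ns" --type merge \
    -p "{\"spec\":{\"configuration\":{\"settings\":{\"merge_tree/max_suspicious_broken_parts\":\"${value}\"}}}}"
}
```

For example, set_broken_parts_threshold tanzusm clickhouse-metrics 200 produces the same setting as the JSON patch above.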
5) Verify the setting was applied to the CHI
kubectl get chi clickhouse-metrics -n tanzusm -o jsonpath='{.spec.configuration.settings}' | jq .
6) Restart ClickHouse pods
kubectl delete pod -n tanzusm -l clickhouse.altinity.com/chi=clickhouse-metrics
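kubectl delete returns once the old pods are gone, but the StatefulSet may not have recreated them yet, and kubectl wait exits with an error when no pods match the selector. A minimal sketch that retries the wait; wait_for_clickhouse is a hypothetical helper, and the 30 attempts and 60s timeout are arbitrary defaults.

```shell
# wait_for_clickhouse NAMESPACE CHI [ATTEMPTS]
#   Retries `kubectl wait` until the CHI's pods are Ready or attempts run out.
wait_for_clickhouse() {
  local ns="$1" chi="$2" attempts="${3:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl wait --for=condition=Ready pod \
        -l "clickhouse.altinity.com/chi=${chi}" -n "$ns" --timeout=60s; then
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "ClickHouse pods for ${chi} not Ready after ${attempts} attempts" >&2
  return 1
}
```

For example, wait_for_clickhouse tanzusm clickhouse-metrics blocks until the restarted pods report Ready.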
7) Verify the runtime setting
After the pod restarts, confirm that the new value is active by running clickhouse-client inside the pod.
Note: clickhouse-client requires authentication. Retrieve the password from the following secret:
kubectl get secret clickhouse-secret -n tanzusm -o jsonpath='{.data.password}' | base64 -d
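One way to check the runtime value is to query ClickHouse's system.merge_tree_settings table, which lists the effective MergeTree defaults and whether each one was changed from the built-in default. A minimal sketch; ch_query and check_runtime_threshold are hypothetical helpers that reuse the pod, container, user and secret names from steps 7 and 8. -it is omitted because --query runs non-interactively.

```shell
# ch_query NAMESPACE POD SQL
#   Runs SQL with clickhouse-client in the clickhouse container, reading the
#   password from the clickhouse-secret secret.
ch_query() {
  local ns="$1" pod="$2" sql="$3" password
  password=$(kubectl get secret clickhouse-secret -n "$ns" \
    -o jsonpath='{.data.password}' | base64 -d)
  kubectl exec "$pod" -c clickhouse -n "$ns" -- clickhouse-client \
    --user clickhouse --password "$password" --query "$sql"
}

# check_runtime_threshold NAMESPACE POD
#   Shows the effective max_suspicious_broken_parts value (changed = 1 means
#   it differs from the built-in default).
check_runtime_threshold() {
  ch_query "$1" "$2" \
    "SELECT name, value, changed FROM system.merge_tree_settings WHERE name = 'max_suspicious_broken_parts'"
}
```

For example, check_runtime_threshold tanzusm chi-clickhouse-metrics-default-0-0-0 should show the value patched in step 4.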
8) Verify table attachment
kubectl exec -it chi-clickhouse-metrics-default-0-0-0 \
-c clickhouse -n tanzusm -- clickhouse-client \
--user clickhouse --password '<PASSWORD>' \
--query "SELECT count() FROM <database>.<table>"
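When several tables failed to attach, the same count() check can be repeated for each one. A minimal sketch; check_tables is a hypothetical helper, the password is passed as an argument (for example, the value retrieved from clickhouse-secret in step 7), and table names are used as given, such as the `db`.`table` names from the "Cannot attach table" log lines recorded before the restart.

```shell
# check_tables NAMESPACE POD PASSWORD TABLE...
#   Runs SELECT count() against each TABLE (db.table or `db`.`table`);
#   returns non-zero if any query fails.
check_tables() {
  local ns="$1" pod="$2" password="$3" t count rc=0
  shift 3
  for t in "$@"; do
    # </dev/null keeps kubectl exec from reading the caller's stdin.
    if count=$(kubectl exec "$pod" -c clickhouse -n "$ns" -- clickhouse-client \
        --user clickhouse --password "$password" \
        --query "SELECT count() FROM ${t}" </dev/null); then
      echo "${t}: attached (${count} rows)"
    else
      echo "${t}: query failed, table may still be detached"
      rc=1
    fi
  done
  return "$rc"
}
```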
9) Restart dependent services
kubectl rollout restart deployment ensemble-observability-store -n tanzusm
kubectl rollout restart deployment ensemble-ui -n tanzusm
Verify that the pods previously in CrashLoopBackOff recover:
kubectl get pods -n tanzusm | grep ensemble
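Instead of polling kubectl get pods, kubectl rollout status can block until each restarted deployment finishes rolling out. A minimal sketch; wait_for_rollouts is a hypothetical helper and the 300s timeout is an arbitrary choice.

```shell
# wait_for_rollouts NAMESPACE DEPLOYMENT...
#   Waits for each deployment's rollout; returns non-zero if any fails.
wait_for_rollouts() {
  local ns="$1" d rc=0
  shift
  for d in "$@"; do
    kubectl rollout status "deployment/${d}" -n "$ns" --timeout=300s || rc=1
  done
  return "$rc"
}
```

For example, wait_for_rollouts tanzusm ensemble-observability-store ensemble-ui waits for both deployments restarted above.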
10) Once the pods are running, resume package reconciliation by kicking both package installs
kctrl package installed kick -i sm -n tanzusm --yes
kctrl package installed kick -i clickhouse-metrics -n tanzusm --yes
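After the kick, the PackageInstall status can be checked. A minimal sketch; pkgi_status is a hypothetical helper that prints spec.paused and status.friendlyDescription (the text kubectl get pkgi shows in its DESCRIPTION column). An empty or false paused value means reconciliation is active again. Because the packages reconcile the CHI, it may also be worth re-running the CHI settings check from step 5 afterwards to see whether the threshold change is still present.

```shell
# pkgi_status NAMESPACE PKGI...
#   Prints name, spec.paused and status.friendlyDescription per PackageInstall.
pkgi_status() {
  local ns="$1" p
  shift
  for p in "$@"; do
    kubectl get pkgi "$p" -n "$ns" \
      -o jsonpath='{.metadata.name}{" paused="}{.spec.paused}{" status="}{.status.friendlyDescription}{"\n"}'
  done
}
```

For example, pkgi_status tanzusm sm clickhouse-metrics prints one line per package install.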