ClickHouse fails with TOO_MANY_UNEXPECTED_DATA_PARTS in Tanzu Hub

Article ID: 432946

Products

VMware Tanzu Platform - Hub

Issue/Introduction

The ensemble-observability-store pod is in CrashLoopBackOff, and the ClickHouse server logs show errors similar to the following:

Code: 231. DB::Exception: Suspiciously many (N parts, 0.00 B in total) broken parts to remove while maximum allowed broken parts count is 100.
Cannot attach table `<database>`.`<table>` from metadata file ...(TOO_MANY_UNEXPECTED_DATA_PARTS)

Although the ClickHouse server itself starts, it fails to attach one or more tables, so dependent services such as ensemble-observability-store remain in CrashLoopBackOff.
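Before raising the threshold in step 4, it helps to know how many broken parts ClickHouse found, because the new value has to be higher than the N in the error. The following is a minimal sketch: the function name max_broken_parts is hypothetical, and it assumes the error text appears in the log you pipe into it exactly as quoted above. If the server writes errors only to a log file inside the container (in ClickHouse images this is typically /var/log/clickhouse-server/clickhouse-server.err.log), pipe that file in instead of the container log.

```shell
# Hypothetical helper: read ClickHouse log text on stdin and print the
# largest broken-part count N found in "Suspiciously many (N parts, ...)"
# errors. Prints nothing if no such error is present.
max_broken_parts() {
  grep -o 'Suspiciously many ([0-9]* parts' \
    | grep -o '[0-9][0-9]*' \
    | sort -n \
    | tail -n 1
}

# Usage (assumes the error reaches the container log):
#   kubectl logs chi-clickhouse-metrics-default-0-0-0 -c clickhouse -n tanzusm | max_broken_parts
```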

 

Environment

In Tanzu Hub, ClickHouse is managed through the following stack:

  • PackageInstall (pkgi): clickhouse-metrics in namespace tanzusm
  • ClickHouseInstallation CR (CHI): managed by clickhouse-operator (Altinity)
  • Values Secrets: clickhouse-metrics-values-ver-N and clickhouse-secret-patch
  • Filesystem: read-only in pods, so configuration files cannot be edited directly

 

Cause

ClickHouse has a safety threshold, the MergeTree setting max_suspicious_broken_parts (default 100), that prevents a table from being attached when too many broken data parts are detected. It is a safeguard against silent data loss: when the number of broken parts exceeds the threshold, the server refuses to attach the affected table, and services that depend on that table fail in turn.

Common causes of excessive broken parts:

  • Unclean pod shutdowns or node failures during active writes/merges
  • Disk I/O errors or underlying storage issues on PersistentVolumes
  • Out-of-disk conditions during background merge operations
  • Replication lag combined with aggressive TTL cleanup

Resolution

1) Identify the name of the CHI resource 

kubectl get chi -n tanzusm -o name
# Output: clickhouseinstallation.clickhouse.altinity.com/clickhouse-metrics

2) Verify the current setting

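kubectl get chi clickhouse-metrics -n tanzusm -o jsonpath='{.spec.configuration.settings}' | jq .

If this prints nothing, the settings map does not exist yet. A JSON Patch "add" operation requires the parent object to exist, so the patch in step 4 would fail; in that case add the whole object instead, using path /spec/configuration/settings with value {"merge_tree/max_suspicious_broken_parts":"200"}.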

3) Pause reconciliation of the sm and clickhouse-metrics PackageInstalls (pkgi) so that kapp-controller does not revert the manual change to the CHI

kctrl package installed pause -i sm -n tanzusm --yes
kctrl package installed pause -i clickhouse-metrics -n tanzusm --yes
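To confirm the pause took effect before patching, you can read the paused flag from each PackageInstall. A minimal sketch, assuming the pause is recorded in the spec.paused field of the PackageInstall; the function name pkgi_paused_state is hypothetical. Running it again after step 10 is a quick way to check whether reconciliation has resumed.

```shell
# Hypothetical helper: print "<name> paused=<value>" for the PackageInstalls
# paused in step 3. Assumes the flag lives in .spec.paused; an unset field
# is reported as false.
pkgi_paused_state() {
  local p state
  for p in sm clickhouse-metrics; do
    state=$(kubectl get packageinstall "$p" -n tanzusm -o jsonpath='{.spec.paused}')
    echo "$p paused=${state:-false}"
  done
}
```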

4) Patch the CHI to raise the threshold (200 in this example; use a value higher than the N reported in the error message)

kubectl patch chi clickhouse-metrics -n tanzusm --type json \
  -p '[{"op":"add",
       "path":"/spec/configuration/settings/merge_tree~1max_suspicious_broken_parts",
       "value":"200"}]'

5) Verify the setting was applied to the CHI 

kubectl get chi clickhouse-metrics -n tanzusm -o jsonpath='{.spec.configuration.settings}' | jq .
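In the patch path used in step 4, ~1 is the JSON Pointer (RFC 6901) escape for /, so merge_tree~1max_suspicious_broken_parts addresses the settings key merge_tree/max_suspicious_broken_parts. If the value shown here is not the one you need, a small sketch such as the following builds the same patch for any threshold; the function names are hypothetical.

```shell
# Hypothetical helper: escape a key for use as one JSON Pointer segment.
# "~" must be escaped to "~0" before "/" becomes "~1"; the reverse order
# would turn the new "~1" into "~01".
json_pointer_escape() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's#/#~1#g'
}

# Hypothetical helper: print the step 4 JSON patch for a given threshold.
threshold_patch() {
  local value="$1" key
  case "$value" in
    ''|*[!0-9]*) echo "threshold must be a non-negative integer" >&2; return 1 ;;
  esac
  key=$(json_pointer_escape 'merge_tree/max_suspicious_broken_parts')
  printf '[{"op":"add","path":"/spec/configuration/settings/%s","value":"%s"}]\n' "$key" "$value"
}

# Usage:
#   kubectl patch chi clickhouse-metrics -n tanzusm --type json -p "$(threshold_patch 300)"
```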

6) Restart ClickHouse pods 

kubectl delete pod -n tanzusm -l clickhouse.altinity.com/chi=clickhouse-metrics
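Because the pods are recreated after the delete, a single kubectl wait can fail with "no matching resources found" if it runs before the new pods exist. A minimal sketch that retries kubectl wait until the ClickHouse pods report Ready; the function name, the number of attempts and the timeouts are arbitrary assumptions.

```shell
# Hypothetical helper: retry "kubectl wait" until the ClickHouse pods of the
# clickhouse-metrics CHI are Ready, giving up after 10 attempts.
wait_for_clickhouse() {
  local attempt=1
  until kubectl wait --for=condition=Ready pod \
      -l clickhouse.altinity.com/chi=clickhouse-metrics \
      -n tanzusm --timeout=60s; do
    if [ "$attempt" -ge 10 ]; then
      echo "ClickHouse pods not Ready after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 5   # give the StatefulSet time to recreate the pods
  done
}
```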

7) Verify the runtime setting

After the pod restarts, confirm that the setting is applied by running clickhouse-client inside the pod with kubectl exec.

Note: clickhouse-client requires authentication. Retrieve the password from the following secret:

kubectl get secret clickhouse-secret -n tanzusm -o jsonpath='{.data.password}' | base64 -d
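The step above retrieves the password but does not show the check itself. A minimal sketch that reuses the pod, container, user and secret names from this article: ch_password and ch_query are hypothetical helpers, and system.merge_tree_settings is the ClickHouse system table that lists the MergeTree settings the server is using.

```shell
# Hypothetical helpers; namespace, pod, container, user and secret names
# are taken from the commands in this article.
NS=tanzusm
CH_POD=chi-clickhouse-metrics-default-0-0-0

# Decode the ClickHouse password stored in the clickhouse-secret secret.
ch_password() {
  kubectl get secret clickhouse-secret -n "$NS" \
    -o jsonpath='{.data.password}' | base64 -d
}

# Run one query with clickhouse-client inside the ClickHouse container.
# No -it: a single --query needs neither stdin nor a TTY.
ch_query() {
  kubectl exec "$CH_POD" -c clickhouse -n "$NS" -- clickhouse-client \
    --user clickhouse --password "$(ch_password)" --query "$1"
}

# Usage (the value column should match the threshold set in step 4):
#   ch_query "SELECT name, value, changed FROM system.merge_tree_settings WHERE name = 'max_suspicious_broken_parts'"
```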

8) Verify that the table is attached

kubectl exec -it chi-clickhouse-metrics-default-0-0-0 \
  -c clickhouse -n tanzusm -- clickhouse-client \
  --user clickhouse --password '<PASSWORD>' \
  --query "SELECT count() FROM <database>.<table>"
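Once the table attaches, the parts ClickHouse rejected are typically moved to the table's detached directory rather than deleted, and they can be listed through the system.detached_parts table. A sketch that follows the command above; list_detached_parts is a hypothetical helper, <PASSWORD>, <database> and <table> remain placeholders, and grouping by reason is just one convenient view.

```shell
# Hypothetical helper: summarise the detached parts of one table by reason.
# Arguments: password, database, table.
list_detached_parts() {
  local password="$1" db="$2" table="$3"
  kubectl exec chi-clickhouse-metrics-default-0-0-0 \
    -c clickhouse -n tanzusm -- clickhouse-client \
    --user clickhouse --password "$password" \
    --query "SELECT reason, count() AS parts FROM system.detached_parts WHERE database = '$db' AND table = '$table' GROUP BY reason ORDER BY parts DESC"
}

# Usage:
#   list_detached_parts '<PASSWORD>' '<database>' '<table>'
```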

9) Restart dependent services

kubectl rollout restart deployment ensemble-observability-store -n tanzusm
kubectl rollout restart deployment ensemble-ui -n tanzusm

Verify that the pods previously in CrashLoopBackOff recover:

kubectl get pods -n tanzusm | grep ensemble
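Instead of polling kubectl get pods, you can block until both restarted deployments finish rolling out. A minimal sketch; the function name wait_for_ensemble and the 300-second timeout are assumptions.

```shell
# Hypothetical helper: wait for the deployments restarted in step 9.
# Returns non-zero as soon as one rollout fails or times out.
wait_for_ensemble() {
  local d
  for d in ensemble-observability-store ensemble-ui; do
    kubectl rollout status deployment "$d" -n tanzusm --timeout=300s || return 1
  done
}
```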

10) Once the pods are up and running, resume package reconciliation:

kctrl package installed kick -i sm -n tanzusm --yes
kctrl package installed kick -i clickhouse-metrics -n tanzusm --yes