ClickHouse pods remain in Pending state after node replacement or infrastructure changes because the scheduler cannot place them on the nodes where their Persistent Volumes (PVs) are bound.
The typical cause is a missing or incorrect platform.tanzu.vmware.com/node label on one or more worker nodes.
How to check for the issue:
Step 1: Verify ClickHouse pod status
# kubectl get pods -n tanzusm -l app=clickhouse-op -o wide
chi-clickhouse-metrics-default-0-0-0 0/1 Pending 0 1d
chi-clickhouse-metrics-default-1-0-0 0/1 Pending 0 1d
chi-clickhouse-metrics-default-2-0-0 0/1 Pending 0 1d
Step 2: Check pod events for scheduling failures
# kubectl describe pod <clickhouse-pod-name> -n tanzusm | grep -A 20 "Events:"
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 9m30s (x25 over 129m) default-scheduler 0/20 nodes are available: 1 node(s) had untolerated taint {platform.tanzu.vmware.com/service: blobstore}, 1 node(s) had untolerated taint {platform.tanzu.vmware.com/service: prometheus}, 1 node(s) had volume node affinity conflict, 11 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {platform.tanzu.vmware.com/service: kafka}, 3 node(s) had untolerated taint {platform.tanzu.vmware.com/service: postgres}. preemption: 0/20 nodes are available: 20 Preemption is not helpful for scheduling.
The key indicator is a FailedScheduling warning that includes "volume node affinity conflict": the pod's PV is pinned to a node that no longer matches the pod's node selector.
Affected version: Tanzu Hub 10.3.3
The issue occurs due to a mismatch between Kubernetes node labels and ClickHouse Persistent Volume (PV) locality assignments.
Root Cause Details
PV Node Binding: ClickHouse uses local persistent volumes that are pinned to specific nodes via the local.path.provisioner/selected-node annotation and a matching spec.nodeAffinity rule on the PV. This ensures data locality: each ClickHouse replica's data resides on a specific node.
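For reference, a local PV created by the provisioner typically carries both the annotation and a hostname-based node affinity rule, along the lines of the following illustrative fragment (the PV name and node value are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-example
  annotations:
    local.path.provisioner/selected-node: "<node-name>"
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - "<node-name>"
```

It is the spec.nodeAffinity block that the scheduler enforces; the annotation records which node the provisioner selected.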
Resolution
Step 1: SSH to the registry VM
SSH to the registry VM to run kubectl commands; a KUBECONFIG with admin access is set by default:
bosh -d <Hub Deployment> ssh registry
Verify connectivity
kubectl cluster-info
kubectl get nodes
Step 2: Identify the PV-to-node mapping
Get all ClickHouse PVs with their assigned nodes and claim names:
# kubectl get pv -o custom-columns=NAME:.metadata.name,NODE:.metadata.annotations.'local\.path\.provisioner/selected-node',CLAIM:.spec.claimRef.name --no-headers | grep clickhouse
pvc-########-####-####-####-34f02d005d47 ###.###.###.73 data-volume-claim-chi-clickhouse-metrics-default-0-0-0
pvc-########-####-####-####-b105269af3f5 ###.###.###.89 data-volume-claim-chi-clickhouse-metrics-default-1-0-0
pvc-########-####-####-####-85e03280e858 ###.###.###.90 data-volume-claim-chi-clickhouse-metrics-default-2-0-0
Step 3: Understand the claim naming pattern and extract shard index
The claim naming pattern is:
data-volume-claim-chi-clickhouse-metrics-default-X-Y-Z
Where X (the first number in the X-Y-Z suffix) is the shard index and Y is the replica index.
We are looking at X-0-0 claims, i.e., replica 0 of each shard.
| Claim Pattern | Shard Index | Required Label Value |
| --- | --- | --- |
| ...-default-0-0-0 | 0 | clickhouse-metrics-0 |
| ...-default-1-0-0 | 1 | clickhouse-metrics-1 |
| ...-default-2-0-0 | 2 | clickhouse-metrics-2 |
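The table above can also be derived mechanically. A minimal shell sketch, using the sample claim names from Step 2, that extracts the shard index and prints the required label value:

```shell
# Derive the required node label value from each ClickHouse claim name.
# Claim pattern: data-volume-claim-chi-clickhouse-metrics-default-X-Y-Z
for claim in \
  data-volume-claim-chi-clickhouse-metrics-default-0-0-0 \
  data-volume-claim-chi-clickhouse-metrics-default-1-0-0 \
  data-volume-claim-chi-clickhouse-metrics-default-2-0-0
do
  # X (the shard index) is the first number of the X-Y-Z suffix.
  shard=$(echo "$claim" | sed -E 's/.*-default-([0-9]+)-[0-9]+-[0-9]+$/\1/')
  echo "$claim -> clickhouse-metrics-$shard"
done
```

This only prints the mapping; it makes no changes to the cluster.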
Step 4: Check the current label on each node
Before applying labels, check what label (if any) already exists:
# kubectl get pv -o custom-columns=NAME:.metadata.name,NODE:.metadata.annotations.'local\.path\.provisioner/selected-node',CLAIM:.spec.claimRef.name --no-headers | grep clickhouse | awk '{print $2}' | sort -u | while read node; do echo "Node: $node - Label: $(kubectl get node $node -o jsonpath='{.metadata.labels.platform\.tanzu\.vmware\.com/node}')"; done
Node: ###.###.###.73 - Label: clickhouse-metrics-0
Node: ###.###.###.89 - Label: clickhouse-metrics-1
Node: ###.###.###.90 - Label: clickhouse-metrics-2
If the value is empty, the label is not set. If a value is returned, note it for comparison against the table in Step 3.
Step 5: Apply the correct labels to each node
Based on the PV-to-node mapping from Step 2, apply the labels:
For shard 0 (claim ending in -0-0-0):
kubectl label node <node-name-for-shard-0> platform.tanzu.vmware.com/node=clickhouse-metrics-0 --overwrite
For shard 1 (claim ending in -1-0-0):
kubectl label node <node-name-for-shard-1> platform.tanzu.vmware.com/node=clickhouse-metrics-1 --overwrite
For shard 2 (claim ending in -2-0-0):
kubectl label node <node-name-for-shard-2> platform.tanzu.vmware.com/node=clickhouse-metrics-2 --overwrite
Note: The --overwrite flag ensures that if an incorrect label exists, it will be replaced with the correct value.
Step 6: Verify the labels were applied correctly
# kubectl get nodes -o custom-columns=NAME:.metadata.name,CLICKHOUSE_LABEL:.metadata.labels.'platform\.tanzu\.vmware\.com/node' | grep clickhouse
###.###.###.73 clickhouse-metrics-0
###.###.###.89 clickhouse-metrics-1
###.###.###.90 clickhouse-metrics-2
Step 7: Restart pending ClickHouse pods (if needed)
If ClickHouse pods are still in Pending state, delete them to trigger rescheduling:
# List pending pods
kubectl get pods -n tanzusm | grep clickhouse
# Delete pending pods (StatefulSet will recreate them)
kubectl delete pod <pending-pod-name> -n tanzusm
Step 8: Verify ClickHouse pods are now running
kubectl get pods -n tanzusm -o wide | grep clickhouse
All pods should now be in Running state and scheduled on the correctly labeled nodes.