One symptom of this issue is that some apps appear to be missing,
even though the cf CLI shows that the apps exist.
Another symptom is that the ClickHouse pods are stuck in the Pending state:
kubectl get pod -A -o wide
NAMESPACE   NAME                                   READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
tanzusm     chi-clickhouse-metrics-default-0-0-0   0/1     Pending   0          24s   <none>   <none>   <none>           <none>
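To see why the scheduler is leaving the pod in Pending, the pod's scheduling events can be inspected. The helper below is a minimal sketch; the function name show_scheduling_events is hypothetical, and it assumes the pod lives in the tanzusm namespace shown above. The exact event message depends on how the pod is pinned to its node, which this article does not state.

```shell
# Print FailedScheduling events for a single pod.
# $1 = pod name
# $2 = namespace (assumption: defaults to tanzusm, as in the output above)
show_scheduling_events() {
  pod="$1"
  namespace="${2:-tanzusm}"
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.name=${pod},reason=FailedScheduling"
}

# Example: show_scheduling_events chi-clickhouse-metrics-default-0-0-0
```

If the events mention that no node matches the pod's node selector or affinity, that is consistent with the missing node label described in this article.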
Tanzu Hub version 10.3.3
When a ClickHouse VM is restarted, the restart causes a node label to be deleted from the corresponding Kubernetes node.
To check the node labels:
kubectl get nodes --show-labels | grep click
192.###.#.### Ready <none> 55m v1.32.8+vmware.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,bosh.id=2c631022-7ea0-4408-8f15-0b1a29a9d808,bosh.zone=AZ1,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.###.#.###,kubernetes.io/os=linux,pks-system/cluster.name=tanzu-hub-cluster,pks-system/cluster.uuid=tkgi-manual,platform.tanzu.vmware.com/service=clickhouse-metrics,spec.ip=192.###.#.###
Notice that the following label is missing from the output:
platform.tanzu.vmware.com/node=clickhouse-metrics-0
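Instead of reading the full label list by eye, a label selector can list only the affected nodes. This is a sketch under two assumptions: the ClickHouse nodes still carry the platform.tanzu.vmware.com/service=clickhouse-metrics label (as in the output above), and the function name find_unlabeled_clickhouse_nodes is hypothetical.

```shell
# List nodes that carry the ClickHouse service label but lack the per-node label.
# In a label selector, "!key" matches objects on which that key is absent.
find_unlabeled_clickhouse_nodes() {
  kubectl get nodes \
    -l 'platform.tanzu.vmware.com/service=clickhouse-metrics,!platform.tanzu.vmware.com/node' \
    -o name
}

# Example: find_unlabeled_clickhouse_nodes
```

An empty result suggests that every ClickHouse node still has its platform.tanzu.vmware.com/node label.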
A temporary solution:
Apply the label manually:
kubectl label nodes 192.###.#.### platform.tanzu.vmware.com/node=clickhouse-metrics-0
OR
Run the hubsm-install errand so that the node labels are reapplied:
bosh -d <hub-deployment> run-errand hubsm-install
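After either workaround, it may help to confirm that the label is back and that the pod has left the Pending state. The following is a minimal sketch; the function name verify_clickhouse_label is hypothetical, and the tanzusm namespace is taken from the pod listing earlier in this article.

```shell
# Confirm the label on one node and list the pods in the tanzusm namespace.
# $1 = node name as shown by "kubectl get nodes"
verify_clickhouse_label() {
  node="$1"
  # -L adds a column with the value of the given label (blank if the label is absent).
  kubectl get node "$node" -L platform.tanzu.vmware.com/node
  # Assumption: the ClickHouse pod is rescheduled once the label is restored.
  kubectl get pods -n tanzusm -o wide
}

# Example: verify_clickhouse_label <node-name>
```

The label column should show the expected value (for example clickhouse-metrics-0), and the chi-clickhouse-metrics pod should eventually report Running.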
For a permanent solution:
We will be releasing a version containing the fix. This KB will be updated once a fix is released.