We are seeing the following when trying to start the data-insights dbloader pod after upgrading to 1.3.1:
Code: 159. DB::Exception: Received from clickhouse:9000. DB::Exception: Distributed DDL task /clickhouse/task_queue/ddl/query-0000000869 is not finished on 1 of 4 hosts (0 of them are currently executing the task, 0 are inactive). They are going to execute the query in background. Was waiting for 180.711875925 seconds, which is longer than distributed_ddl_task_timeout. (TIMEOUT_EXCEEDED)
We will upload a full dump once it has completed. For now, we are sending the init-db logs.
Most issues relating to ClickHouse (the dbloader init container fails, or the query service or dbloader cannot connect to ClickHouse) can be solved by restarting dbloader.
If the init container still fails on the second run, restart the ClickHouse servers.
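The two restart steps above could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, and the StatefulSet naming (clickhouse-shard<N>) is an assumption inferred from the pod hostnames used elsewhere in this thread. Check the real workload names with `kubectl get deploy,sts -n <your_namespace>` before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: workload names are assumptions, verify them in your cluster first.

# Print the StatefulSet reference for a shard index, assuming the
# "clickhouse-shard<N>" naming implied by the ClickHouse pod hostnames.
clickhouse_statefulset() {
  printf 'statefulset/clickhouse-shard%s\n' "$1"
}

# Restart a workload and wait until the rollout has finished.
# $1 = namespace, $2 = workload reference such as deployment/<name>
restart_and_wait() {
  local namespace=$1 workload=$2
  kubectl rollout restart "$workload" -n "$namespace"
  kubectl rollout status "$workload" -n "$namespace" --timeout=10m
}
```

For example, `restart_and_wait "$NAMESPACE" deployment/dbloader` would restart dbloader (assuming it runs as a Deployment with that name), and `restart_and_wait "$NAMESPACE" "$(clickhouse_statefulset 0)"` would restart the first ClickHouse shard.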
If the issue is still not resolved, running the Kubernetes job modification below, updated for your namespace, image, and number of shards and replicas, should work.
export USER=default
export SHARDS=1
export REPLICAS=2
export NAMESPACE="<your_namespace>"  # replace with your namespace; quoted so the shell does not treat < and > as redirections
# Tables to restore, in the same order as before.
TABLES="metrics_gauge_v0 metrics_daily_v0 resources_v0 metrics_hourly_v0 mlhighway_v0 mlalerts_v0"
for shard in $(seq 0 $(( SHARDS - 1 ))); do
  for server in $(seq 0 $(( REPLICAS - 1 ))); do
    host="clickhouse-shard$shard-$server.clickhouse-headless.$NAMESPACE.svc.cluster.local"
    echo
    echo "running commands on shard $shard server $server !"
    echo
    for table in $TABLES; do
      # Restore the replica, then count rows as a sanity check.
      # "|| true" keeps the loop going if one table fails.
      clickhouse-client --host="$host" --port=9000 --user "$USER" --password "$CLICKHOUSE_SUPERUSER_PWD" \
        --multiquery --query "SYSTEM RESTORE REPLICA watchtower.$table; select count() from watchtower.$table;" || true
    done
  done
done
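Before running the restore, it may help to see which replicated tables are actually read-only on a given server, since the ClickHouse documentation describes SYSTEM RESTORE REPLICA as working only on read-only replicated tables. The sketch below reuses the host naming and the watchtower database from the script above. The function names are hypothetical helpers, and NAMESPACE, USER and CLICKHOUSE_SUPERUSER_PWD are assumed to be set as in that script.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; host naming follows the restore script.

# Build the pod hostname. $1 = shard index, $2 = replica index, $3 = namespace
replica_host() {
  printf 'clickhouse-shard%s-%s.clickhouse-headless.%s.svc.cluster.local\n' "$1" "$2" "$3"
}

# Build a query listing the read-only flag of each replicated table. $1 = database
readonly_replicas_query() {
  printf "SELECT table, is_readonly FROM system.replicas WHERE database = '%s' ORDER BY table\n" "$1"
}

# Run the query against one server. $1 = shard index, $2 = replica index
show_replica_state() {
  clickhouse-client --host="$(replica_host "$1" "$2" "$NAMESPACE")" --port=9000 \
    --user "$USER" --password "$CLICKHOUSE_SUPERUSER_PWD" \
    --query "$(readonly_replicas_query watchtower)"
}
```

Calling `show_replica_state 0 1` would list the watchtower tables on shard 0, replica 1. Tables with is_readonly = 1 are the ones the restore is meant for.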