Re-Creating node from scratch does not copy tables for the Postgres and Kafka engines #1455
We use your Operator to manage a ClickHouse cluster, thank you.
After a hardware failure we reset the PVC (and the ZooKeeper namespace) to re-create one ClickHouse node.
Most of the metadata, such as views, materialized views, and tables with most engines (MergeTree, ReplicatedMergeTree, etc.), was successfully re-created on the node and replication started. Meanwhile, none of the Postgres- and Kafka-based engine tables were recreated.
Is it a bug, or do we need to use some commands or hacks to sync all metadata across the cluster?

Comments
@Hubbitus, have you used the latest 0.23.6 or an earlier release?
@alex-zaitsev, thank you for the response. That was an older version; we have since updated the operator. What is the correct way to re-init a node? Is it enough to just delete the PVC of the failed node and delete the pod?
@Hubbitus, if you want to re-init the existing node, delete the STS, PVC, and PV, then start a reconcile. Do you have multiple replicas?
@alex-zaitsev, thank you for the reply. I understand how to delete the objects, but what do you mean by "start a reconcile"? I have two replicas.
@Hubbitus, we have released 0.23.7, which is more aggressive about re-creating the schema. So you may try to delete the PVC/PV completely and let it re-create the objects.
@alex-zaitsev, thank you very much!
And doing it in ArgoCD:
Then I see the pod is up and running. Checking the tables:
SELECT hostname() as node, COUNT(*)
FROM clusterAllReplicas('{cluster}', system.tables)
WHERE database NOT IN ('INFORMATION_SCHEMA', 'information_schema', 'system')
GROUP BY node
And also an error in the log like: So, I see only tables in
Notes:
It looks like, since you deleted only the PVC and the Pod, the recovery was handled by Kubernetes (the STS) and the Operator did not even know that the PVC had been recreated. So make sure you delete the STS as well. Also consider using operator-managed persistence.
Note that the order is important, but local_directory may be skipped if you are not using it. Keep it if there are already users defined with CREATE USER, otherwise they will disappear entirely.
The others should work, so the operator log is needed to check what went wrong. The correct PVC recovery sequence is:
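A minimal sketch of such a sequence, assuming the namespace (gidplatform-dev) and resource names used in the kubectl commands later in this thread; the PVC/PV names are placeholders:
# delete the StatefulSet of the broken replica so the operator, not Kubernetes, re-creates it
kubectl delete sts -n gidplatform-dev chi-gid-gid-0-0
# delete the PVC of replica 0-0-0 (and the PV, if the reclaim policy keeps it around)
kubectl delete pvc -n gidplatform-dev <pvc-name>
kubectl delete pv <pv-name>
# trigger a reconcile by changing spec.taskID in the CHI resource
kubectl edit chi -n gidplatform-dev gid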
@alex-zaitsev, thank you very much for the answer. First I would like to recover my tables, then I will try to deal with the users. Today, I eventually received the rights to see the operator pod in the kube-system namespace.
In the meantime, I have tried to reconcile the cluster by providing:
spec:
  taskID: "click-reconcile-1"
Indeed, that looks like it triggers a reconcile. Logs of the operator pod:
Not sure what is going wrong, but on host
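(As an aside, the same spec.taskID bump can be done non-interactively; a sketch, reusing the CHI name and namespace from the kubectl commands later in this thread, with an arbitrary taskID value:)
kubectl patch chi -n gidplatform-dev gid --type merge -p '{"spec":{"taskID":"click-reconcile-2"}}'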
@alex-zaitsev, could you please take a look at it?
@Hubbitus, does your cluster have 2 shards with only 1 replica inside each shard? Could you share:
@Slach, thanks for the response. Output of
@Hubbitus
Sure (limited to 10, total 269):
You shared logs for 29 Sep 2024 starting from 11:55 UTC.
Hello.
@Hubbitus, I don't see logs from 15 Oct 2024. I need to be sure you tried to reconcile after dropping the PVC and STS.
Yes. At the suggestion of @alex-zaitsev I had introduced there
Share the clickhouse-operator logs for 15 Oct related to your changes.
Hello. I do not have such old logs. But I've switched to the branch where that was set.
operator.2024-11-02T17:47:14+03:00.obfuscated.log
Output of
SELECT database, table, engine_full, count() c, hostname()
FROM
cluster('{cluster}',system.tables)
WHERE
database NOT IN ('system','INFORMATION_SCHEMA','information_schema')
GROUP BY ALL
HAVING c<2
contains 515 rows. The heading of it:
According to the logs, you just triggered a reconcile for -0-0-0 while the STS was not deleted. Try to
edit
Ok, thank you.
I think the relevant logs are:
So the operator can't resolve the hostname of the node:
Indeed, the hostname:
SELECT cluster, host_name
FROM system.clusters
WHERE cluster = 'gid'
You did not share the full logs, you just found the first error message. That error message is expected, because you deleted the STS. The hostname contains the SERVICE name, not the pod in the STS. Could you share the full operator logs?
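(A quick way to check that point, as a sketch; this assumes getent is available in the ClickHouse image and that the per-replica Service is named chi-gid-gid-0-0:)
# check DNS resolution of the 0-0 replica's Service from the surviving replica
kubectl exec -n gidplatform-dev chi-gid-gid-0-1-0 -- getent hosts chi-gid-gid-0-0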
Sure. I found many more errors in the log:
@sunsingerus, according to the shared logs the first reconcile was applied at 2024-11-02 14:44:41.
Second try after the STS + PVC deletion:
The STS and PVC were deleted.
The PVC was recreated.
Migration was forced to be applied.
Prepare for table migration, try:
Trying to drop data from ZK @sunsingerus
Get SQL object definitions:
Trying to restore, and it fails because the ZK data is still present.
We need to choose 0-1-0 for executing SYSTEM DROP REPLICA...
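A sketch of that step, using the same command that appears later in this thread, executed from the surviving replica 0-1-0:
# remove the dead replica's metadata from ZooKeeper, run on a healthy replica of the same shard
kubectl exec -n gidplatform-dev chi-gid-gid-0-1-0 -- clickhouse-client -q "SYSTEM DROP REPLICA 'chi-gid-gid-0-0'"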
@Hubbitus, after the reconcile:
in ZooKeeper
in local SQL
@Hubbitus, could you share
and
@Slach, sure (comments of the columns and the table are stripped):
The error looks reasonable: we got an error on table creation after deleting the STS and PVC, did we not?
add
Try to restore the table:
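Depending on the replica's state, a hedged sketch (db.table is a placeholder): SYSTEM RESTORE REPLICA applies when the local data survived but the Keeper metadata was lost; when the local data was wiped, as in this thread, replaying the definition from the healthy replica is the usual route.
# case 1: local data present, Keeper metadata lost (table is in read-only mode)
kubectl exec -n gidplatform-dev chi-gid-gid-0-0-0 -- clickhouse-client -q "SYSTEM RESTORE REPLICA db.table"
# case 2: local data wiped; replay the definition taken from the healthy replica
kubectl exec -n gidplatform-dev chi-gid-gid-0-1-0 -- clickhouse-client -q "SHOW CREATE TABLE db.table FORMAT TSVRaw" |
  kubectl exec -i -n gidplatform-dev chi-gid-gid-0-0-0 -- clickhouse-client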
@Slach, why is it not restored automatically? Actually, this table may simply be deleted. Is that the solution?
It tried to restore, and most of the tables were restored (am I right?)
Try to upgrade clickhouse-keeper to the latest version and clickhouse-server to the latest 24.8 LTS.
No. The query
SELECT database, table, engine_full, count() c, hostname()
FROM
cluster('{cluster}',system.tables)
WHERE
database NOT IN ('system','INFORMATION_SCHEMA','information_schema')
GROUP BY ALL
HAVING c<2
still returns 729 tables present only on host
It is not so fast. We will try to do it next week.
OK, let's try again.
# safe shutdown
kubectl exec -n gidplatform-dev chi-gid-gid-0-0-0 -- clickhouse-client -q "SYSTEM SHUTDOWN"
# on a different node
kubectl exec -n gidplatform-dev chi-gid-gid-0-1-0 -- clickhouse-client -q "SYSTEM DROP REPLICA 'chi-gid-gid-0-0'"
# delete sts to propagate schema during reconcile
kubectl delete sts -n gidplatform-dev chi-gid-gid-0-0
# change spec.taskID
kubectl edit chi -n gidplatform-dev gid
# wait when
watch -n 1 kubectl describe chi -n gidplatform-dev gid
# check tables
# share clickhouse-operator logs
OK, let's try! I have updated ClickHouse to version 24.8.6.70 (LiveView was dropped because of incompatibility). Still used
New attempt:
Wasn't
Sorry, what should I wait for in such output? For the command
Waiting some time. I suppose I need the line
I see again a bunch of authorization errors in the logs, but I do not want to make any assumptions.
I think we need to delete the PVC+PV for the dead 0-0-0, to completely erase the data for -0-0-0. I found in the logs
Let's have another try.
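A sketch of that cleanup; the PVC and PV names are placeholders to be read off the kubectl get output first:
# find the PVC of replica 0-0-0 and the PV it is bound to
kubectl get pvc -n gidplatform-dev | grep 0-0-0
# delete both so the dead replica's data is erased completely
kubectl delete pvc -n gidplatform-dev <pvc-name>
kubectl delete pv <pv-name>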
Yes, we use OpenEBS and a taint to pin to the node. OpenEBS is one of the fastest storage classes. And I hope there was no pod migration to another node.
Let's do it:
I've tried the last command several times with the same result.
OK, change the action sequence:
Another try of
The last line appeared:
Then:
Reconcile is done. Now 730 tables are present only on host
Comments stripped. I can drop that table if it helps. But what is the reason it is not restored?
Returns nothing.
Try to replace
And for 1 it is still empty:
/1/ is expected to be empty because {shard} has the value 0. This is really weird, why did you get
OK, let's try to create the schema on 0-0-0 completely manually (a sketch follows the list below).
Create databases
Create MergeTree tables
Create Distributed tables
Create other tables, MVs, Dictionaries
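A hedged sketch of the first step (databases), under the assumption that the definitions are read from the healthy replica 0-1-0 and applied on 0-0-0; databases with parameterized engines (PostgreSQL, MaterializedPostgreSQL, etc.) may still need manual editing:
# generate CREATE DATABASE statements on the healthy replica and apply them on the broken one
kubectl exec -n gidplatform-dev chi-gid-gid-0-1-0 -- clickhouse-client -q "
  SELECT concat('CREATE DATABASE IF NOT EXISTS ', name, ' ENGINE = ', engine, ';')
  FROM system.databases
  WHERE name NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema', 'default')
  FORMAT TSVRaw" |
  kubectl exec -i -n gidplatform-dev chi-gid-gid-0-0-0 -- clickhouse-client -mn
# then repeat the same idea for MergeTree tables, Distributed tables, and the remaining objects, in that order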
@Slach,
My attempt to fix:
Correcting the quotes would probably give:
That is without the last part. But what is the reason for doing this on the same node 0? Dumping from it and applying it right there. So the clause
Is it intended to dump the structure from node 1 (the working one) and apply it to node 0 (the broken one)?
Yes.
Let's change the approach.
Databases
Create MergeTree tables
Create Distributed tables
Create other tables, MVs, Dictionaries
Then it probably should be like:
But still, if there was not
But what is the purpose of such operations?
That will still be executed within the single node 0.
No, I provided the command with a pipeline inside bash -c.
Yes, I'm looking into the clickhouse-operator source code and trying to execute the schemer.go commands manually.
Yes, they are piped, but inside an interactive shell of node 0. There are no cross-node connections.
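To make the pipeline actually cross nodes, a hedged sketch of copying the table definitions one by one from the healthy replica 0-1-0 into 0-0-0 (identifiers with unusual characters would need quoting; this is an illustration, not the operator's exact schemer logic):
# list MergeTree-family tables on the healthy replica, then replay each definition on the broken one
for t in $(kubectl exec -n gidplatform-dev chi-gid-gid-0-1-0 -- clickhouse-client -q "
    SELECT concat(database, '.', name) FROM system.tables
    WHERE database NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema')
      AND engine LIKE '%MergeTree%'"); do
  kubectl exec -n gidplatform-dev chi-gid-gid-0-1-0 -- clickhouse-client -q "SHOW CREATE TABLE ${t} FORMAT TSVRaw" |
    kubectl exec -i -n gidplatform-dev chi-gid-gid-0-0-0 -- clickhouse-client
done
# repeat with engine = 'Distributed' and then with the remaining engines (Kafka, PostgreSQL, MVs, dictionaries)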
Look at