When removing replicas from a cluster that are not the ones added most recently, the operator tries to remove the wrong replicas from the cluster (using `SYSTEM DROP REPLICA '...'`). This only affects the replicas in ClickHouse, not the STS, Pods and other k8s resources, which are removed correctly.
So given a cluster config that looks like:

```yaml
clusters:
  - name: replicated
    layout:
      shards:
        - name: "0"
          replicas:
            - name: "0-0" # Old
            - name: "0-1" # Old
            - name: "0-2" # Old
            # New replicas on dedicated nodes
            - name: "0-0-dedicated"
              templates:
                podTemplate: clickhouse-dedicated
            - name: "0-1-dedicated"
              templates:
                podTemplate: clickhouse-dedicated
            - name: "0-2-dedicated"
              templates:
                podTemplate: clickhouse-dedicated
```
When we now try to remove the old replicas "0-0", "0-1" and "0-2", the operator instead tries to remove the replicas "0-0-dedicated" etc. However, it fails to do so, because it tries to execute the SQL statements on the already removed pods:

```
E1024 08:42:53.714011 1 connection.go:194] Exec():FAILED Exec(http://operator/:***@chi-analytics-replicated-0-0.clickhouse.svc.cluster.local:8123/) doRequest: transport failed to send a request to ClickHouse: dial tcp: lookup chi-analytics-replicated-0-0.clickhouse.svc.cluster.local on 172.17.0.10:53: no such host for SQL: SYSTEM DROP REPLICA 'chi-analytics-replicated-0-0-dedicated'
```
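For reference, this is roughly the cleanup we would expect instead: dropping the *old* replica names, executed on one of the surviving replicas. This is a hypothetical sketch (the replica names are inferred from the error log, and the statements would need to be adapted to the actual cluster), not what the operator currently runs:

```sql
-- Hypothetical manual cleanup, run via clickhouse-client on a surviving
-- replica (e.g. 0-0-dedicated). Without a FROM clause, SYSTEM DROP REPLICA
-- removes the named replica's metadata for all replicated tables.
SYSTEM DROP REPLICA 'chi-analytics-replicated-0-0';
SYSTEM DROP REPLICA 'chi-analytics-replicated-0-1';
SYSTEM DROP REPLICA 'chi-analytics-replicated-0-2';
```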
The same behaviour can be observed when adding replicas: if we add a new replica anywhere other than at the end of the replicas array, no schema migration happens for it.
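Both symptoms would be consistent with the reconciler matching replicas by array position rather than by name. This is an assumption about the cause, not the operator's actual code, but a toy sketch shows how positional matching targets the wrong replicas once entries are removed from the middle of the list:

```python
# Replica lists before and after the change from the config above.
old = ["0-0", "0-1", "0-2", "0-0-dedicated", "0-1-dedicated", "0-2-dedicated"]
new = ["0-0-dedicated", "0-1-dedicated", "0-2-dedicated"]

def removed_by_position(old, new):
    """Positional matching: everything past len(new) counts as removed."""
    return old[len(new):]

def removed_by_name(old, new):
    """Name-based matching: what we would expect to happen."""
    return [r for r in old if r not in set(new)]

print(removed_by_position(old, new))  # the dedicated replicas -- wrong
print(removed_by_name(old, new))      # ['0-0', '0-1', '0-2'] -- right
```

With positional matching, the trailing three entries (the dedicated replicas) are treated as removed, matching the `SYSTEM DROP REPLICA 'chi-analytics-replicated-0-0-dedicated'` seen in the log.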
Thanks @jannikbend for posting. This behavior is at least surprising; I would be interested to hear from @Slach and @sunsingerus what they think about this case.
See this related Slack post as well: https://altinitydbworkspace.slack.com/archives/C02K1MWEK2L/p1729859400145679