Cannot use a k8s external IP address in the remote_servers.xml configuration? #1551

Open
chaos827 opened this issue Nov 1, 2024 · 4 comments

@chaos827

chaos827 commented Nov 1, 2024

I created a 2 × 2 ClickHouse cluster (2 shards, each with 2 replicas) using ClickHouse-Operator on Azure Kubernetes Service (AKS). I also created a load balancer service for each replica, so each pod has its own load balancer service and a unique external IP. This works well.
After that I updated the remote_servers XML file to use the external IP instead of the hostname. With this change, distributed queries (e.g. create database TestDB on CLUSTER '{cluster}' ENGINE = Atomic) do not work on the pod that is referenced by its external IP, and ReplicatedMergeTree tables on that pod do not sync data. The pod works fine when I use the hostname or the pod IP. Below is the remote_servers configuration from my YAML:

config.d/remote_servers.xml:

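    <yandex>
      <remote_servers>
          <testcluster>
              <shard>
                  <internal_replication>true</internal_replication>
                  <replica>
                      <host>chi-p1-testcluster-0-0-0.chi-p1-testcluster-0-0.clickhouse1.svc.cluster.local</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>
                  <replica>
                      <host>chi-p1-testcluster-0-1-0.chi-p1-testcluster-0-1.clickhouse1.svc.cluster.local</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>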
              </shard>
              <shard>
                  <internal_replication>true</internal_replication>
                  <replica>
                      <host>chi-p1-testcluster-1-0-0.chi-p1-testcluster-1-0.clickhouse1.svc.cluster.local</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>
                  <replica>
                      <host>10.224.0.192</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>
                
              </shard>
          </testcluster>
          <!-- Autogenerated clusters -->
          <all-replicated>
              <shard>
                  <internal_replication>true</internal_replication>
                  <replica>
                      <host>chi-p1-testcluster-0-0-0.chi-p1-testcluster-0-0.clickhouse1.svc.cluster.local</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>
                  <replica>
                      <host>chi-p1-testcluster-0-1-0.chi-p1-testcluster-0-1.clickhouse1.svc.cluster.local</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>
                  <replica>
                      <host>chi-p1-testcluster-1-0-0.chi-p1-testcluster-1-0.clickhouse1.svc.cluster.local</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>
                  <replica>
                      <host>10.224.0.192</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>
                 
              </shard>
          </all-replicated>
          <all-sharded>
              <shard>
                  <internal_replication>false</internal_replication>
                  <replica>
                      <host>chi-p1-testcluster-0-0-0.chi-p1-testcluster-0-0.clickhouse1.svc.cluster.local</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>
              </shard>
              <shard>
                  <internal_replication>false</internal_replication>
                  <replica>
                      <host>chi-p1-testcluster-0-1-0.chi-p1-testcluster-0-1.clickhouse1.svc.cluster.local</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>
              </shard>
              <shard>
                  <internal_replication>false</internal_replication>
                  <replica>
                      <host>chi-p1-testcluster-1-0-0.chi-p1-testcluster-1-0.clickhouse1.svc.cluster.local</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>
              </shard>
              <shard>
                  <internal_replication>false</internal_replication>
                  <replica>
                      <host>10.224.0.192</host>
                      <port>9000</port>
                      <user>test</user>
                      <password>test123</password>
                      <secure>0</secure>
                  </replica>
              </shard>                  
              
          </all-sharded>
      </remote_servers>
    </yandex>

I ran this test because I want to run ClickHouse across two data centers (replica 1 in the primary and replica 2 in a geographically separate location), so I have to split the ClickHouse cluster across two AKS clusters and use external IPs for communication. I do not understand why my YAML does not work. Does anyone know the root cause?
Many thanks!

@UnamedRus

These are two different problems.
Replication doesn't use the remote_servers configuration; you need to check and fix the interserver_http_host parameter instead (and make sure that port 9009 is exposed).

https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings#interserver-http-host
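
As a rough sketch of that advice (not taken from this thread's configuration), the interserver settings on a single replica might look like the fragment below. The host value `replica-1-1.example.internal` is a placeholder assumption standing in for whatever address the other replicas can actually reach; the `<yandex>` root element is used only to match the configuration posted above.

```xml
<!-- Hypothetical config.d override for ONE replica; the host value is a placeholder -->
<yandex>
    <!-- Address this replica advertises to the other replicas for fetching data parts.
         It must be resolvable and routable from every other replica. -->
    <interserver_http_host>replica-1-1.example.internal</interserver_http_host>
    <!-- Port used for replica-to-replica fetches; 9009 is the default -->
    <interserver_http_port>9009</interserver_http_port>
</yandex>
```

Because each replica advertises its own `interserver_http_host`, the value has to differ per replica, so a single identical file shared by all pods would presumably make every replica advertise the same address.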

@chaos827
Author

chaos827 commented Nov 4, 2024

Hi @UnamedRus, thank you for the suggestion. I updated my YAML to configure the interserver_http_host parameter and tried several variations, but replication still does not work. This is the new part of my YAML:
config.d/interserver_http_host.xml:

<interserver_http_host>10.225.0.10(load balancer external ip)</interserver_http_host>
<interserver_http_post>9009</interserver_http_post>

<tcp_port>9000</tcp_port>
<interserver_http_credentials>
    <user>test</user>
    <password>test123</password>
</interserver_http_credentials>

I also double-checked that port 9009 is exposed.
I found some logs as well; the problem seems to be related to DNS resolution:

2024.11.04 10:43:21.593349 [ 764 ] {} HTTP-Session: 3bf65257-0169-4431-a753-17d4c7c79aad Logout, user_id: 78dfa8ab-fce4-cf99-3aa4-eee47478eda1
2024.11.04 10:43:21.658997 [ 217 ] {} DNSResolver: Cannot resolve host (chi-p1-testcluster-1-1-0.chi-p1-testcluster-1-1.clickhouse1.svc.cluster.local), error 0: Host not found.
2024.11.04 10:43:22.096816 [ 217 ] {} DNSResolver: Cannot resolve host (chi-p1bcp-testcluster-0-0), error 0: Host not found.
2024.11.04 10:43:22.321522 [ 217 ] {} DNSResolver: Cannot resolve host (chi-p1-testcluster-1-0-0.chi-p1-testcluster-1-0.clickhouse1.svc.cluster.local), error 0: Host not found.
2024.11.04 10:43:22.547227 [ 217 ] {} DNSResolver: Cannot resolve host (chi-p1bcp-testcluster-1-0), error 0: Host not found.
2024.11.04 10:43:22.771705 [ 217 ] {} DNSResolver: Cannot resolve host (chi-p1-testcluster-1-0), error 0: Host not found.
2024.11.04 10:43:22.771960 [ 217 ] {} DNSResolver: Cached hosts not found: chi-p1bcp-testcluster-1-1, chi-p1bcp-testcluster-0-1, chi-p1-testcluster-1-1, chi-p1-testcluster-1-1-0.chi-p1-testcluster-1-1.clickhouse1.svc.cluster.local, chi-p1bcp-testcluster-0-0, chi-p1-testcluster-1-0-0.chi-p1-testcluster-1-0.clickhouse1.svc.cluster.local, chi-p1bcp-testcluster-1-0, chi-p1-testcluster-1-0
2024.11.04 10:43:22.772004 [ 217 ] {} DNSResolver: Updated DNS cache
2024.11.04 10:43:22.772928 [ 763 ] {4ee922d3-20de-4b1a-abd8-a3494bbfcff8} DynamicQueryHandler: Done processing query
2024.11.04 10:43:22.772973 [ 763 ] {} HTTP-Session: fabc82f3-7a04-49d7-b756-14bc9032bafb Logout, user_id: 78dfa8ab-fce4-cf99-3aa4-eee47478eda1
2024.11.04 10:43:22.796790 [ 763 ] {} HTTP-Session: bdc270da-96c7-4111-86fa-9725e5e8f435 Authenticating user 'clickhouse_operator' from 10.224.0.13:51188
2024.11.04 10:43:22.796845 [ 763 ] {} HTTP-Session: bdc270da-96c7-4111-86fa-9725e5e8f435 Authenticated with global context as user 78dfa8ab-fce4-cf99-3aa4-eee47478eda1
2024.11.04 10:43:22.796860 [ 763 ] {} HTTP-Session: bdc270da-96c7-4111-86fa-9725e5e8f435 Creating session context with user_id: 78dfa8ab-fce4-cf99-3aa4-eee47478eda1
2024.11.04 10:43:22.797104 [ 763 ] {8d905c1d-9b68-449d-8fd7-63739d1c4acd} executeQuery: (from 10.224.0.13:51188, user: clickhouse_operator) SYSTEM DROP DNS CACHE (stage: Complete)
2024.11.04 10:43:22.798569 [ 763 ] {8d905c1d-9b68-449d-8fd7-63739d1c4acd} DynamicQueryHandler: Done processing query

@alex-zaitsev
Member

@chaos827, why do you need external IPs for replication? Are you sure they are routable at all?

One thing to try is to use FQDNs for replicas. It may help, but what you are doing sounds strange in general.

spec:
  defaults:
    replicasUseFQDN: "yes"

@chaos827
Author

chaos827 commented Nov 6, 2024

Yes, it is unusual. I want to set up ClickHouse across regions (across AKS clusters), and the pod IPs are dynamic, so I have to create a load balancer service for each replica.
