
Using bridge network for nodes #484

Closed

Conversation

@tao12345666333 (Member) commented May 5, 2019

ref: #408 (comment) and #408 (comment)

We can use a bridge network for the nodes. After that, we can use hostnames instead of IPs, which will push #408 forward.

Things to do:

- [ ] create bridge network (see the sketch below)
- [ ] create nodes using bridge network
- [ ] use hostnames instead of IPs (config files)
- [ ] load-balancer config
- [ ] delete network when deleting the cluster
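
For illustration, a minimal sketch of the first step (a hypothetical helper, not this PR's actual code), driving the docker CLI from Go much as kind shells out to docker elsewhere:

```go
package main

import (
	"fmt"
	"os/exec"
)

// createBridgeNetwork creates a user-defined bridge network so that
// containers attached to it can reach each other and, via Docker's
// embedded DNS, resolve each other by container name.
func createBridgeNetwork(name string) error {
	cmd := exec.Command("docker", "network", "create", "--driver=bridge", name)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("failed to create network %q: %v: %s", name, err, out)
	}
	return nil
}

func main() {
	if err := createBridgeNetwork("kind"); err != nil {
		fmt.Println(err)
	}
}
```

Nodes would then be started with `--network=<name>`, at which point Docker's embedded DNS makes them resolvable by container name.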

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 5, 2019
@k8s-ci-robot (Contributor)
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tao12345666333
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: bentheelder

If they are not already assigned, you can assign the PR to them by writing /assign @bentheelder in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 5, 2019
@BenTheElder (Member)
#273 (comment)

@BenTheElder (Member)
The custom bridge network's magical hostname resolution is causing coreDNS to detect a loop and fail.

@tao12345666333 (Member, Author)
It seems that I missed some useful information. 😯

@neolit123 (Member)
The custom bridge network's magical hostname resolution is causing coreDNS to detect a loop and fail.

Hm, the following is not an option:

  • editing the coredns Corefile to modify the forward configuration or disable the coredns loop detection.

We need to think of a way to avoid the loop.
Docker DNS configuration flags:
https://docs.docker.com/v17.09/engine/userguide/networking/default_network/configure-dns/

/cc @aojea

@aojea (Contributor) commented May 5, 2019

The DNS loop seems to be caused by the Docker DNS always using the address 127.0.0.11, as quoted from the Docker docs:

Note: The DNS server is always at 127.0.0.11

and that's the address configured in the node's resolv.conf:


nameserver 127.0.0.11
options ndots:0

The CoreDNS documentation explains why such addresses cause a loop: https://github.com/coredns/coredns/blob/master/plugin/loop/README.md

A common cause of forwarding loops in Kubernetes clusters is an interaction with a local DNS cache on the host node (e.g. systemd-resolved). For example, in certain configurations systemd-resolved will put the loopback address 127.0.0.53 as a nameserver into /etc/resolv.conf. Kubernetes (via kubelet) by default will pass this /etc/resolv.conf file to all Pods using the default dnsPolicy rendering them unable to make DNS lookups (this includes CoreDNS Pods). CoreDNS uses this /etc/resolv.conf as a list of upstreams to forward requests to. Since it contains a loopback address, CoreDNS ends up forwarding requests to itself.

It offers three solutions, but personally the only one I see as feasible is the first one mentioned, modifying the kubelet yaml. Let me try to see what's needed to implement it and come back to you.

On the other hand, I don't know if creating a different bridge per cluster will break some functionality; I'm especially concerned about the multicluster/federation scenarios. I think it will be safer to first create a network for kind and then see if we need to move to a network per cluster.

@aojea (Contributor) commented May 5, 2019

@tao12345666333 in order to keep backward compatibility and avoid the DNS crash we should:

  1. Copy the resolv.conf from the host to /kind/resolv.conf, for example filtering out all the loopback addresses; this is what Docker does when using the default bridge (https://docs.docker.com/v17.09/engine/userguide/networking/default_network/configure-dns/):

     When creating the container’s /etc/resolv.conf, the daemon filters out all localhost IP address nameserver entries from the host’s original file.

     (See the filtering sketch below.)

  2. Tell kubelet to use /kind/resolv.conf for Pods. It seems kubeadm has a flag to configure this.
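
A minimal sketch of step 1, assuming the filtering happens before kubelet starts and that the /kind/ directory already exists (a hypothetical helper; the PR itself uses a regex for this, quoted further below):

```go
package main

import (
	"net"
	"os"
	"strings"
)

// filterLoopback drops any "nameserver" line that points at a loopback
// address (e.g. 127.0.0.11 or ::1), mirroring what the Docker daemon
// does for containers on the default bridge.
func filterLoopback(resolvConf string) string {
	var kept []string
	for _, line := range strings.Split(resolvConf, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[0] == "nameserver" {
			if ip := net.ParseIP(fields[1]); ip != nil && ip.IsLoopback() {
				continue // skip loopback nameservers
			}
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		panic(err)
	}
	// Write the filtered copy where kubelet's resolvConf option can point.
	if err := os.WriteFile("/kind/resolv.conf", []byte(filterLoopback(string(data))), 0644); err != nil {
		panic(err)
	}
}
```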

@tao12345666333 (Member, Author)
@aojea Thank you for the information, I will try it later.

@BenTheElder (Member)
Does Docker rewrite /etc/resolv.conf only on container creation, or continually? If the former, then we can just overwrite that file.

Otherwise we will have inconsistent DNS for system components.

@aojea (Contributor) commented May 6, 2019

Does Docker rewrite /etc/resolv.conf only on container creation, or continually? If the former, then we can just overwrite that file.

Otherwise we will have inconsistent DNS for system components.

If you overwrite the resolv.conf of the nodes, you lose the Docker DNS and the host discovery; in that case there is no reason to move to user-defined bridges.

@BenTheElder (Member) commented May 6, 2019 via email

@tao12345666333 (Member, Author)
This fix works on my side.

Review thread on pkg/cluster/nodes/create.go (outdated, resolved)
@aojea (Contributor) commented May 6, 2019

It seems I misunderstood the use of the resolv-conf flag in kubelet:

May 06 11:05:34 kind-worker kubelet[200]: E0506 11:05:34.447977 200 kuberuntime_manager.go:693] createPodSandbox for pod "weave-net-khpnb_kube-system(0ebd9f5c-6fec-11e9-8010-0242ac120002)" failed: open /kind/resolv.conf: no such file or directory

The pods are not able to boot because they can't find the new resolv.conf file. I thought the file was copied inside each pod.
The resolv.conf file has to be copied to all nodes; currently it is not available on the worker nodes.

Review thread on pkg/cluster/nodes/util.go (outdated, resolved)
@tao12345666333 (Member, Author)
/test pull-kind-conformance-parallel

@BenTheElder (Member)
On the other hand, I don't know if creating a different bridge per cluster will break some functionality; I'm especially concerned about the multicluster/federation scenarios. I think it will be safer to first create a network for kind and then see if we need to move to a network per cluster.

If you look at my strawman proposal, the idea is to make it configurable so federation can configure the same bridge for all.

@@ -237,6 +239,7 @@ evictionHard:
   nodefs.available: "0%"
   nodefs.inodesFree: "0%"
   imagefs.available: "0%"
+resolvConf: /kind/resolv.conf
Member:

I really don't like the idea that containerd and debugging tools will have a different resolv conf than kubelet.

Member Author:

as you said

we could selectively merge the resolv.conf

Member Author:

After considering it, I think the current approach is relatively simple. If we do a resolv.conf merge, we need a lot of testing. Why don't we defer that? 🤔

Keep this approach for a while, then change it after the verification has passed.

I hope we can move forward. ❤️

Member:

this means other components have an inconsistent resolv.conf and we're not actually solving the issue. If we're going to use hostnames, we need kubelet and everything else to have the same lookup.

Contributor:

Technically, having a different resolv.conf doesn't mean they have a different lookup, just that the pods will not be able to resolve the node names; the other requests will use the same DNS server. However, I agree that reality is always different and we could be broken by Docker changing its DNS behaviour; that would not be surprising 😅

Docker daemon runs an embedded DNS server which provides DNS resolution among containers connected to the same user-defined network, so that these containers can resolve container names to IP addresses. If the embedded DNS server is unable to resolve the request, it will be forwarded to any external DNS servers configured for the container. To facilitate this when the container is created, only the embedded DNS server reachable at 127.0.0.11 will be listed in the container’s resolv.conf file.

I agree with tao here; I think we should leave the DNS exercise until last.

@neolit123 (Member) May 9, 2019:

+1 on merging the resolv.confs if feasible.

@tao12345666333 (Member, Author)
The most basic features expected of this PR have been achieved, including:

  • create bridge network
  • create nodes using bridge network
  • load-balancer config
  • delete network when deleting the cluster

By using the bridge network, we can use Docker's embedded DNS to help push #408 forward.

After this PR, we only need to modify /etc/kubernetes/pki/admin.conf in order to deal with #408. We no longer need to regenerate the certificate, because the current certificate is still valid (with this PR).

i.e.

root@moe-control-plane3:/etc/kubernetes# openssl x509 -in pki/apiserver.crt -noout -text                        
Certificate:                                                                                                     
    Data:                                                                                        
        Version: 3 (0x2)                                                                                         
        Serial Number: 4829775455184542867 (0x4306cf2fef2cdc93)                                                 
        Signature Algorithm: sha256WithRSAEncryption                                                             
        Issuer: CN = kubernetes                                    
        Validity                                                                                                 
            Not Before: May  8 13:55:37 2019 GMT                             
            Not After : May  7 13:56:57 2020 GMT                                                                 
        Subject: CN = kube-apiserver                                              
        Subject Public Key Info:                                                                                 
            Public Key Algorithm: rsaEncryption                                           
                RSA Public-Key: (2048 bit)                                                                      
                Modulus:                                                                                        
                    00:c3:cb:9e:9f:e5:d4:bf:c7:b1:a8:77:98:95:e5:  
                    (ignore)
                    fb:66:2a:d0:ab:b1:23:30:25:39:e5:91:91:6a:f6:
                    a9:15
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Subject Alternative Name:
                DNS:moe-control-plane3, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:localhost, IP Address:10.96.0.1, IP Address:172.28.0.6, IP Address:172.28.0.2
    Signature Algorithm: sha256WithRSAEncryption
         70:65:07:9c:54:50:24:9f:59:59:8b:de:4f:60:c9:4e:47:5f:
         (ignore)

I want to know your thoughts. @BenTheElder @aojea @neolit123 Thanks.

@aojea (Contributor) commented May 8, 2019

Ben created a proposal; I think only the network-labeling part is missing:

#273 (comment)

Strawman:

  • if the specified network exists, we use it without creating it
  • if it doesn't exist, we create it, and label it as belonging to the cluster
  • on cluster delete, we list networks labeled with the cluster, and delete only those (so not just whatever the containers use, only if we labeled it)
  • we name this field / functionality somehow such that it is clear that this feature is docker specific, leaving room for podman etc. in the immediate future, xref #154.
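
A hedged sketch of that label-based lifecycle (hypothetical helper names; the label key here is an assumption for illustration, not the project's actual constant):

```go
package network

import (
	"fmt"
	"os/exec"
	"strings"
)

// Assumed label key, for illustration only.
const clusterLabelKey = "io.k8s.sigs.kind.cluster"

// EnsureNetwork uses an existing network if present; otherwise it creates
// one and labels it as belonging to the cluster.
func EnsureNetwork(name, cluster string) error {
	if exec.Command("docker", "network", "inspect", name).Run() == nil {
		return nil // exists: use it without creating (and without labeling)
	}
	return exec.Command(
		"docker", "network", "create", "--driver=bridge",
		"--label="+clusterLabelKey+"="+cluster, name,
	).Run()
}

// DeleteClusterNetworks removes only the networks we labeled for this
// cluster, leaving pre-existing networks untouched.
func DeleteClusterNetworks(cluster string) error {
	out, err := exec.Command(
		"docker", "network", "ls", "-q",
		"--filter", "label="+clusterLabelKey+"="+cluster,
	).Output()
	if err != nil {
		return err
	}
	for _, id := range strings.Fields(string(out)) {
		if err := exec.Command("docker", "network", "rm", id).Run(); err != nil {
			return fmt.Errorf("failed to remove network %s: %v", id, err)
		}
	}
	return nil
}
```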

There was some controversy regarding the different resolv.conf for nodes and clusters.
If we want to use the same resolv.conf for everything, I guess the only option is rewriting it on the nodes and adding the hostnames to /etc/hosts manually (sketched below) ... but I don't know if Docker does some magic behind the scenes on those files.
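
For illustration only, the /etc/hosts idea could look roughly like this (hypothetical; Docker manages /etc/hosts inside containers, so entries may not survive):

```go
package hosts

import (
	"fmt"
	"os"
)

// AppendHosts appends name-to-IP entries to the node's /etc/hosts so that
// node names resolve without relying on Docker's embedded DNS.
func AppendHosts(entries map[string]string) error {
	f, err := os.OpenFile("/etc/hosts", os.O_APPEND|os.O_WRONLY, 0644)
	if err != nil {
		return err
	}
	defer f.Close()
	for name, ip := range entries {
		if _, err := fmt.Fprintf(f, "%s\t%s\n", ip, name); err != nil {
			return err
		}
	}
	return nil
}
```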

@tao12345666333 tao12345666333 force-pushed the using-network branch 4 times, most recently from 95fa838 to eda987d Compare June 5, 2019 16:56
@tao12345666333 (Member, Author)
/test pull-kind-conformance-parallel

@tao12345666333 (Member, Author)
It seems that the docker related commands cannot be tested in the CI environment, so I deleted the test case for the docker network.

@tao12345666333 (Member, Author)
I think we should do something to push this PR forward.

/assign @BenTheElder

}

// filter out the loopback addresses (127.x.x.x and ::1)
re := regexp.MustCompile(`(?m)[\r\n]+^.*((127\.([0-9]{1,3}\.){2}[0-9]{1,3})|(::1)).*$`)
@BenTheElder (Member) Jun 5, 2019:

this shouldn't even be necessary in the host's resolv.conf?

// DeleteNetwork deletes the specified network,
// but only when the network was created by kind
func DeleteNetwork(networkName string) error {
	if isNetworkCreatedByKind(networkName) {
Member:

this is orthogonal to the functionality of deleting a network. This package should be kind-agnostic; we don't do this for deleting containers.

"docker", "network",
"create",
"--driver=bridge",
"--label="+fmt.Sprintf("%s=%s", constants.ClusterLabelKey, networkName),
Member:

this package should not be aware of the cluster label key. this package just implements docker functionality

}

// isNetworkCreatedByKind checks whether the named network was created by kind
func isNetworkCreatedByKind(networkName string) bool {
Member:

this should live somewhere in cluster/
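
For illustration, one way such a check could live in cluster/ (a sketch under the same assumed label key as above, not the PR's actual code):

```go
package cluster

import (
	"os/exec"
	"strings"
)

// Assumed label key, for illustration only.
const clusterLabelKey = "io.k8s.sigs.kind.cluster"

// isNetworkCreatedByKind reports whether the named docker network carries
// the kind cluster label, i.e. whether kind created it and may delete it.
func isNetworkCreatedByKind(networkName string) bool {
	// List only networks that carry the label, then match the name exactly.
	out, err := exec.Command(
		"docker", "network", "ls",
		"--filter", "label="+clusterLabelKey,
		"--format", "{{.Name}}",
	).Output()
	if err != nil {
		return false
	}
	for _, name := range strings.Fields(string(out)) {
		if name == networkName {
			return true
		}
	}
	return false
}
```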

@@ -181,6 +182,7 @@ evictionHard:
   nodefs.available: "0%"
   nodefs.inodesFree: "0%"
   imagefs.available: "0%"
+resolvConf: /kind/resolv.conf
Member:

this is not being done for all api versions

@@ -193,16 +214,16 @@ func nodesToCreate(cfg *config.Cluster, clusterName string) []nodeSpec {
 }
 
 // TODO(bentheelder): remove network in favor of []cri.PortMapping when that is in
-func (d *nodeSpec) Create(clusterLabel string) (node *nodes.Node, err error) {
+func (d *nodeSpec) Create(networkName, clusterLabel string) (node *nodes.Node, err error) {
Member:

why isn't this in the nodeSpec?

@@ -102,4 +102,7 @@ type Networking struct {
 // If DisableDefaultCNI is true, kind will not install the default CNI setup.
 // Instead the user should install their own CNI after creating the cluster.
 DisableDefaultCNI bool
+// BridgeName is the bridge network name.
+// Defaults to same as cluster name.
+BridgeName string
Member:

(this config field name is not suitable, we need to separate docker specific functionality and make that clear)

@BenTheElder (Member)

It seems that the docker related commands cannot be tested in the CI environment, so I deleted the test case for the docker network.

they absolutely can, but the unit tests should be unit tests, not invoking docker commands (!)

this is why we need #528

I think we should do something to push this PR forward.

@tao12345666333 unfortunately I need to prioritize some other fixes related to testing Kubernetes (e.g. the HA fixes), and we have many open PRs. This is pretty far into P3: https://kind.sigs.k8s.io/docs/contributing/project-scope/

I'm still not convinced that the DNS approach is a good one to solve this problem, and it may introduce more problems; also, the field name needs thought and is not in line with the proposal. I left some more specific review feedback.

@k8s-ci-robot (Contributor)
@tao12345666333: The following tests failed, say /retest to rerun them all:

Test name                            Commit   Rerun command
pull-kind-conformance-parallel       48977ec  /test pull-kind-conformance-parallel
pull-kind-conformance-parallel-1-15  48977ec  /test pull-kind-conformance-parallel-1-15
pull-kind-unit                       48977ec  /test pull-kind-unit
pull-kind-e2e-kubernetes             48977ec  /test pull-kind-e2e-kubernetes
pull-kind-conformance-parallel-1-16  48977ec  /test pull-kind-conformance-parallel-1-16

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@BenTheElder (Member)

let's revisit a more complete design in the 0.7ish timeframe with the provider API in mind and more attention to dns config handling across platforms etc.

@BenTheElder BenTheElder closed this Nov 1, 2019
@aojea (Contributor) commented Nov 9, 2019

let's revisit a more complete design in the 0.7ish timeframe with the provider API in mind and more attention to dns config handling across platforms etc.

Do we want to use the Docker embedded DNS for the nodes and use the autodiscovery, or are we fine without that functionality?
The latter will simplify the implementation a lot, because it will only require dealing with the resolv.conf merging.
The downside is that it won't allow using the DNS names to configure the cluster, but with all the IPv6 and dual-stack features coming in, I would really love not to depend on the resolver in those environments 😅

@tao12345666333 (Member, Author)
In my opinion, if we don't use the Docker embedded DNS and only use the new bridge network, then we may be able to avoid a lot of trouble. 😄
