Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ipvs: remove module check #114669

Merged
merged 11 commits into from
Dec 26, 2022
Merged

Conversation

uablrek
Copy link
Contributor

@uablrek uablrek commented Dec 22, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

The check of kernel modules in proxy/ipvs prevents usage with statically linked kernels. Instead the function should be checked.

Which issue(s) this PR fixes:

Fixes #108579

Special notes for your reviewer:

Please read #108579 (comment).

Comments on individual commits below.

Does this PR introduce a user-facing change?

kube-proxy with proxy-mode=ipvs can be used with statically linked kernels.
The reseved IPv4 range TEST-NET-2 in rfc5737 MUST NOT be used for ClusterIP or loadBalancerIP since address 198.51.100.0 is used for probing.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

N/A

Lars Ekman added 2 commits December 22, 2022 17:13
To check kernel modules is a bad way to check functionality.
This commit just removes the checks and makes it possible to
use a statically linked kernel.

Minimal updates to unit-tests are made.
This tests both ipvs and the configured scheduler
@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 22, 2022
@k8s-ci-robot
Copy link
Contributor

@uablrek: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Dec 22, 2022
@k8s-ci-robot k8s-ci-robot added area/ipvs sig/network Categorizes an issue or PR as relevant to SIG Network. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 22, 2022
@uablrek
Copy link
Contributor Author

uablrek commented Dec 22, 2022

Commit - proxy/ipvs: Remove kernel module tests

Just removes the module checks

@uablrek
Copy link
Contributor Author

uablrek commented Dec 22, 2022

proxy/ipvs: Check that a dummy virtual server can be added

The logic should have been straight forward if not for the BUG:

	// BUG: If ipvs is not compiled into the kernel ipvs.GetVirtualServers
	// does not return an error. Instead also ipvs.AddVirtualServer returns OK
	// (no error)!

As a work-around I re-do ipvs.GetVirtualServers after adding the dummy server and expect a len > 0. I will investigate this bug further. I hope it is in K8s, but it can be in "github.com/moby/ipvs".

I build my own kernel so I can test this. I have built one kernel without ipvs entirely, and one with ipvs but without the lc scheduler. Below are some log examples.

Succesful start:

I1222 20:09:40.117013     400 proxier.go:735] "Virtual Servers" count=0
I1222 20:09:40.119420     400 proxier.go:765] "Virtual Servers after adding dummy" count=1
I1222 20:09:40.119451     400 proxier.go:771] "Dummy VS created" vs={Address:10.0.0.1 Protocol:TCP Port:20000 Scheduler:rr Flags:0 Timeout:0}
I1222 20:09:40.121132     400 server_others.go:248] "Using ipvs Proxier"

With ipvs, but without lc and lc configured as sheduler:

I1222 20:11:58.764112     391 proxier.go:735] "Virtual Servers" count=0
E1222 20:11:58.765903     391 proxier.go:755] "Could not create dummy VS" err="no such file or directory" scheduler="lc"
E1222 20:11:58.765930     391 server.go:492] "Error running ProxyServer" err="can't use the IPVS proxier: no such file or directory"

Kernel without ipvs:

I1222 20:13:44.631763     399 proxier.go:735] "Virtual Servers" count=0
I1222 20:13:44.631834     399 proxier.go:765] "Virtual Servers after adding dummy" count=0
I1222 20:13:44.631847     399 proxier.go:767] "Dummy VS not created" scheduler="rr"
E1222 20:13:44.631868     399 server.go:492] "Error running ProxyServer" err="can't use the IPVS proxier: Ipvs not supported"

This shows the bug. No error is returned from GetVirtualServers, nor AddVirtualServer even though there is no ipvs at all.

@uablrek
Copy link
Contributor Author

uablrek commented Dec 22, 2022

/assign @thockin

Lars Ekman added 2 commits December 23, 2022 10:11
To update the scheduler without node reboot now works.
The address for the probe VS is now 198.51.100.0 which is
reseved for documentation, please see rfc5737. The comment
about this is extended.
@uablrek uablrek marked this pull request as ready for review December 23, 2022 09:15
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 23, 2022
@k8s-ci-robot k8s-ci-robot requested a review from bowei December 23, 2022 09:16
@uablrek
Copy link
Contributor Author

uablrek commented Dec 23, 2022

Updates after rewiew

Address "198.51.100.0" is now used for the probe VS. It is reserved for documentation only by rfc5737. To use that address IRL is an invalid configuration. The release-note is updated in this PR. The comment is extended:

	// We should use a VIP address that is not used on the node.
	// An address "198.51.100.0" from the TEST-NET-2 rage in https://datatracker.ietf.org/doc/html/rfc5737
	// is used. These addresses are reserved for documentation. If the user is using
	// this address for a VS anyway we *will* mess up, but that would be an invalid configuration.
	// If the user have configured the address to an interface on the node (but not a VS)
	// then traffic will temporary be routed to ipvs during the probe and dropped.
	// The later case is also and invalid configuration, but the traffic impact will be minor.
	// This should not be a problem if users honors reserved addresses, but cut/paste
	// from documentation is not unheard of, so the restricion to not use the TEST-NET-2 range
	// must be documented.

I have tested to re-configure the scheduler without node-reboot and it works fine.

@uablrek
Copy link
Contributor Author

uablrek commented Dec 23, 2022

About automatic kernel module loading

The kernel request a module by an up-call to:

> cat /proc/sys/kernel/modprobe
/sbin/modprobe

This is used for instance by iptables to load xt_* modules on demand, like xt_MASQUERADE.

It is however possible to break this and enforce manual module loading. Example:

> echo /bin/false > /proc/sys/kernel/modprobe
> ipvsadm -A -s mh -t 10.2.2.2:8080
Scheduler or persistence engine not found
> modprobe ip_vs_mh
> ipvsadm -A -s mh -t 10.2.2.2:8080
# (now works)

Since so many other things in K8s requires the automatic module loading to work I don't think some fall-back is necessary for the ipvs proxier. Actually I think it will not work without it in the current implementation.

@uablrek
Copy link
Contributor Author

uablrek commented Dec 23, 2022

Slack thread in sig-docs on the test address https://kubernetes.slack.com/archives/C1J0BPD2M/p1671793688732989

@uablrek
Copy link
Contributor Author

uablrek commented Dec 23, 2022

Unfortunately the ipvs detect bug was in moby: moby/ipvs#27

@uablrek
Copy link
Contributor Author

uablrek commented Dec 24, 2022

I added this code for test;

	klog.V(5).InfoS("Virtual Servers after adding dummy", "count", len(vservers))
	if len(vservers) == 0 {
		klog.InfoS("Dummy VS not created", "scheduler", scheduler)
		return fmt.Errorf("Ipvs not supported") // This is a BUG work-around
	}
	klog.V(5).InfoS("Dummy VS created", "vs", vs)

	if scheduler == "wlc" {
		klog.Fatalln("Terminate for test reasons")
	}

	if err := ipvs.DeleteVirtualServer(&vs); err != nil {
		klog.ErrorS(err, "Could not delete dummy VS")
		return err
	}

I started with the default scheduler "rr" and switched to "wlc", the expected sequence is:

  1. Start normally, alter to scheduler=wlc which trigger a kube-proxy re-start
  2. Create a dummy VS for scheduler=wlc
  3. Hit the Fatalf() and leave the dummy VS un-deleted
  4. On re-start a VS with scheduler=wlc already exist, i.e. the lingering dummy VS
  5. Take the happy path and return without any further tests, leaving the dummy VS as-is
  6. Kube-proxy start continues with scheduler=wlc

The question is: will the lingering dummy VS be deleted?

And the answer is: yes, it will.

This is a fairly slow process so the interim-state between re-starts can be captured;

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  198.51.100.0:20000 wlc
TCP  12.0.0.1:443 rr
  -> 192.168.1.1:6443             Masq    1      0          0         

Here the old kubernetes svc is still using "rr", and the dummy VS can be spotted. But after the second re-start it is corrected:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  12.0.0.1:443 wlc
  -> 192.168.1.1:6443             Masq    1      0          0         

The kubernetes svc now uses "wlc", and the dummy VS is deleted.

Handle moby/ipvs#27
A work-around was already in place, but a segv would occur
when the bug is fixed. That will not happen now.
@thockin
Copy link
Member

thockin commented Dec 25, 2022

I believe you that it is deleted but I can't see WHY it will be.

After the log-fatal, it restarts. It hits ipvs.GetVirtualServers() which returns non-nil (I presume) and len(vservers) > 0, right? It would find the dummy with wlc and return - so who is deleting it?

@uablrek
Copy link
Contributor Author

uablrek commented Dec 26, 2022

The dummy VS is deleted in the cleanLegacyService() function;

func (proxier *Proxier) cleanLegacyService(activeServices map[string]bool, currentServices map[string]*utilipvs.VirtualServer, legacyBindAddrs map[string]bool) {

This function is called from the syncProxyRules() function which is the main sync function. It can be seen in the logs from the second re-start in the example above:

I1226 08:45:14.289822    1315 iptables.go:423] "Running" command="iptables-restore" arguments=[-w 5 -W 100000 --noflush --counters]
I1226 08:45:14.292385    1315 proxier.go:2099] "Delete service" virtualServer="198.51.100.0:20000/TCP"
I1226 08:45:14.292433    1315 proxier.go:1002] "syncProxyRules complete" elapsed="56.913032ms"
I1226 08:45:14.292448    1315 proxier.go:1026] "Syncing ipvs proxier rules"
...

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 26, 2022
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 1a736a56740683422aef62bb4bf1dd483624ef08

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thockin, uablrek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 26, 2022
@k8s-ci-robot k8s-ci-robot merged commit 68b9657 into kubernetes:master Dec 26, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Dec 26, 2022
@uablrek uablrek deleted the ipvs-remove-module-check branch January 6, 2023 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ipvs cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The code proxier.go failed to check kernel module when the folder /proc/module is not exsit
3 participants