attacher: try different kernel sources if there is none in default locations #733
…cations Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Is this related to PR #728?
No, this just allows bcc to compile against pre-installed kernel sources, while #728 is about installing a pre-compiled eBPF module.
@@ -138,9 +139,24 @@ func AttachBPFAssets() (*BpfModuleTables, error) {
	}
	// TODO: verify if ebpf can run in the VM without hardware counter support, if not, we can disable the HC part and only collect the cpu time
	m, err := loadModule(objProg, options)
	if err != nil {
		klog.Infof("failed to attach perf module with options %v: %v, from default kernel source.\n", options, err)
		dirs := config.GetKernelSourceDir()
nit: better to rename it to SourceDirs if we allow multiple dirs.
+1 to the suggestion above. If you follow the https://go.dev/doc/effective_go#Getters idiom, the recommended name would be config.KernelSourceDirs()
👼
pkg/config/config.go
Outdated
@@ -87,6 +87,9 @@ var (

	configPath = "/etc/kepler/kepler.config"

	// dir of kernel sources for bcc
	kernelSourceDir = []string{}
Seems it should be plural?
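A small sketch of what the naming suggestions in this review could look like together: a plural variable name for the slice and a getter without the `Get` prefix, per Effective Go. It is written as a standalone `package main` so it runs as is; in the PR these names would live in the `config` package. Returning a copy of the slice is my own assumption, not something the reviewers asked for.

```go
package main

import "fmt"

// kernelSourceDirs holds candidate kernel source directories for bcc.
// The name is plural because the value is a slice.
var kernelSourceDirs = []string{}

// KernelSourceDirs follows the Effective Go getter idiom (no "Get" prefix).
// It returns a copy so callers cannot mutate package state (an assumption
// made for this sketch).
func KernelSourceDirs() []string {
	return append([]string(nil), kernelSourceDirs...)
}

func main() {
	kernelSourceDirs = []string{"/usr/share/kepler/kernel_sources"}
	fmt.Println(KernelSourceDirs())
}
```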
@@ -0,0 +1,4 @@
FROM ImageName

ARG ARCH=amd64
@SamYuan1990 seems we need to add s390x as well?
# pre install kernel sources
RUN mkdir -p /usr/share/kepler/kernel_sources

COPY --from=quay.io/sustainable_computing_io/kepler_kernel_source_images:ubi8 /usr/src/kernels /usr/share/kepler/kernel_sources
Will this be kept in the released image? How big are the copied files?
each kernel-devel is about 5M
But if #728 cannot be done, do we still need this PR?
Hmm, since this PR changes the kernel source dir, I understood that it is not related to #728.
@rootfs what is missing for testing? (as in the PR description)
@marceloamaral I did a quick search; bcc appears to have CO-RE support in mind, and there have been PRs in that direction. Hopefully it will be on par with other libraries. In this PR, I have tested it with a standalone binary. Once this change passes in a kube env without kernel source, I'll share the results and ask you for another review.
pkg/config/config.go
Outdated
// SetKernelSourceDir sets the directory for all kernel source. This is used for bcc. Only the top level directory is needed.
func SetKernelSourceDir(dir string) {
	// read all the kernel source directories
	if dir != "" {
With the validation proposed above, we shouldn't need to check if dir != "".
test env
host
MacBook
% uname -a
Darwin xxx 22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:51:50 PDT 2023; root:xnu-8796.121.2~5/RELEASE_X86_64 x86_64
cluster
kind 1.27
% kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-05-12T19:03:40Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}
without pre installed kernel
% kubectl logs -n kepler daemonset/kepler-exporter
I0614 13:41:10.014533 1 gpu.go:46] Failed to init nvml, err: could not init nvml: error opening libnvidia-ml.so.1: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
I0614 13:41:10.034400 1 exporter.go:148] Kepler running on version: 221cb2a
I0614 13:41:10.034429 1 config.go:172] using gCgroup ID in the BPF program: true
I0614 13:41:10.034471 1 config.go:174] kernel version: 5.1
I0614 13:41:10.034488 1 exporter.go:161] EnabledBPFBatchDelete: true
I0614 13:41:10.035100 1 power.go:77] Not able to obtain power, use estimate method
I0614 13:41:10.652596 1 exporter.go:174] Initializing the GPU collector
modprobe: FATAL: Module kheaders not found in directory /lib/modules/5.10.25-linuxkit
chdir(/lib/modules/5.10.25-linuxkit/build): No such file or directory
I0614 13:41:10.671321 1 bcc_attacher.go:73] failed to attach the bpf program: <nil>
I0614 13:41:10.671356 1 bcc_attacher.go:142] failed to attach perf module with options [-DMAP_SIZE=10240 -DNUM_CPUS=8]: failed to attach the bpf program: <nil>, not able to load eBPF modules
I0614 13:41:10.671373 1 exporter.go:191] failed to start : failed to attach bpf assets: failed to attach the bpf program: <nil>
I0614 13:41:10.671514 1 exporter.go:218] Started Kepler in 637.129155ms
with pre installed kernel
% kubectl logs -n kepler daemonset/kepler-exporter -f
I0614 13:47:09.269344 1 gpu.go:46] Failed to init nvml, err: could not init nvml: error opening libnvidia-ml.so.1: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
I0614 13:47:09.276243 1 exporter.go:149] Kepler running on version: v0.5-46-g8a3cfa3-dirty
I0614 13:47:09.276349 1 config.go:197] using gCgroup ID in the BPF program: true
I0614 13:47:09.276381 1 config.go:199] kernel version: 5.1
I0614 13:47:09.276401 1 config.go:159] kernel source dir is set to /usr/share/kepler/kernel_sources
I0614 13:47:09.276495 1 exporter.go:163] EnabledBPFBatchDelete: true
I0614 13:47:09.277574 1 power.go:77] Not able to obtain power, use estimate method
I0614 13:47:09.734056 1 exporter.go:176] Initializing the GPU collector
modprobe: FATAL: Module kheaders not found in directory /lib/modules/5.10.25-linuxkit
chdir(/lib/modules/5.10.25-linuxkit/build): No such file or directory
I0614 13:47:09.740528 1 bcc_attacher.go:74] failed to attach the bpf program: <nil>
I0614 13:47:09.740739 1 bcc_attacher.go:143] failed to attach perf module with options [-DMAP_SIZE=10240 -DNUM_CPUS=8]: failed to attach the bpf program: <nil>, from default kernel source.
I0614 13:47:09.740886 1 bcc_attacher.go:146] try to load eBPF module with kernel source dir /usr/share/kepler/kernel_sources/4.18.0-477.13.1.el8_8.x86_64
perf_event_open: No such file or directory
I0614 13:47:10.569284 1 bcc_attacher.go:108] failed to attach perf event cpu_cycles_hc_reader: failed to open bpf perf event: no such file or directory
perf_event_open: No such file or directory
I0614 13:47:10.569474 1 bcc_attacher.go:108] failed to attach perf event cpu_ref_cycles_hc_reader: failed to open bpf perf event: no such file or directory
perf_event_open: No such file or directory
I0614 13:47:10.569801 1 bcc_attacher.go:108] failed to attach perf event cpu_instr_hc_reader: failed to open bpf perf event: no such file or directory
perf_event_open: No such file or directory
I0614 13:47:10.570051 1 bcc_attacher.go:108] failed to attach perf event cache_miss_hc_reader: failed to open bpf perf event: no such file or directory
I0614 13:47:10.570137 1 bcc_attacher.go:152] Successfully load eBPF module with option: [-DMAP_SIZE=10240 -DNUM_CPUS=8] from kernel source "/usr/share/kepler/kernel_sources/4.18.0-477.13.1.el8_8.x86_64"
I0614 13:47:10.570222 1 bcc_attacher.go:171] Successfully load eBPF module with option: [-DMAP_SIZE=10240 -DNUM_CPUS=8]
kepler daemonset
% kubectl describe -n kepler daemonset kepler-exporter
Name: kepler-exporter
Selector: app.kubernetes.io/component=exporter,app.kubernetes.io/name=kepler-exporter,sustainable-computing.io/app=kepler
Node-Selector: <none>
Labels: sustainable-computing.io/app=kepler
Annotations: deprecated.daemonset.template.generation: 2
Desired Number of Nodes Scheduled: 1
Current Number of Nodes Scheduled: 1
Number of Nodes Scheduled with Up-to-date Pods: 1
Number of Nodes Scheduled with Available Pods: 1
Number of Nodes Misscheduled: 0
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app.kubernetes.io/component=exporter
app.kubernetes.io/name=kepler-exporter
sustainable-computing.io/app=kepler
Service Account: kepler-sa
Containers:
kepler-exporter:
Image: quay.io/sustainable_computing_io/kepler:pr733
Port: 9102/TCP
Host Port: 0/TCP
Command:
/bin/sh
-c
Args:
/usr/bin/kepler -v=1 --kernel-source-dir=/usr/share/kepler/kernel_sources
Requests:
cpu: 100m
memory: 400Mi
Liveness: http-get http://:9102/healthz delay=10s timeout=10s period=60s #success=1 #failure=5
Environment:
NODE_IP: (v1:status.hostIP)
Mounts:
/etc/kepler/kepler.config from cfm (ro)
/lib/modules from lib-modules (rw)
/proc from proc (rw)
/sys from tracing (rw)
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType: Directory
tracing:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType: Directory
proc:
Type: HostPath (bare host directory volume)
Path: /proc
HostPathType: Directory
cfm:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kepler-cfm
Optional: false
kepler output
% kubectl exec -ti -n kepler daemonset/kepler-exporter -- bash -c "curl localhost:9102/metrics" |grep kepler_container_joules |sort -k 2 -g |tail -10
kepler_container_joules_total{command="",container_id="0a5c450bfd02400d13d982d1feb922fc9f5b2a5f5e29508a42d94095b070029c",container_name="kube-apiserver",container_namespace="kube-system",mode="dynamic",pod_name="kube-apiserver-kind-control-plane"} 328.50800000000004
kepler_container_joules_total{command="",container_id="3ce3263a78f5667edd6013d3b43df12045456c5f3339665fc1039462a3d97434",container_name="kindnet-cni",container_namespace="kube-system",mode="dynamic",pod_name="kindnet-wtfng"} 328.50800000000004
kepler_container_joules_total{command="",container_id="6cbcd04f84d38a5d50e705520a1ebdc4c81441a0d94b3a5224bc4e3ae78a4864",container_name="kube-controller-manager",container_namespace="kube-system",mode="dynamic",pod_name="kube-controller-manager-kind-control-plane"} 328.50800000000004
kepler_container_joules_total{command="",container_id="7ab73310fa1535799f5ddc957bc695c6745a28e96780909972eb7ea0229ff16e",container_name="coredns",container_namespace="kube-system",mode="dynamic",pod_name="coredns-5d78c9869d-9qpnk"} 328.50800000000004
kepler_container_joules_total{command="",container_id="b2ea4fdb1034bb545cd4cc687ae43e340f001bd820996bc59aaf68a9b4a52153",container_name="kepler-exporter",container_namespace="kepler",mode="dynamic",pod_name="kepler-exporter-7xz58"} 328.50800000000004
kepler_container_joules_total{command="",container_id="system_processes",container_name="system_processes",container_namespace="system",mode="dynamic",pod_name="system_processes"} 328.50800000000004
kepler_container_joules_total{command="containerd",container_id="e7e940e1a3879022670295a836dff77af3969e4727a9460b396453ded7ac2b5b",container_name="kube-proxy",container_namespace="kube-system",mode="dynamic",pod_name="kube-proxy-86l4x"} 328.53200000000004
kepler_container_joules_total{command="vpnkit-for",container_id="344aa4e5dfff04b08934f603dae9dbed879e6c0dedde895d0e2453e8f001662b",container_name="local-path-provisioner",container_namespace="local-path-storage",mode="dynamic",pod_name="local-path-provisioner-6bc4bddd6b-kkhnv"} 328.579
kepler_container_joules_total{command="containerd",container_id="b820bc8299ef22538019d5e7959531913cfade2c7989643dfdd066f4f1a75bd5",container_name="coredns",container_namespace="kube-system",mode="dynamic",pod_name="coredns-5d78c9869d-qwpvr"} 328.694
kepler_container_joules_total{command="jbd2/vda1-",container_id="1a16001c73016dd6b6f7f313f882df65e7f6791237dd7b457e6d0caff3a378d8",container_name="kube-scheduler",container_namespace="kube-system",mode="dynamic",pod_name="kube-scheduler-kind-control-plane"} 328.98400000000004
I ran the test on my MacBook. The pre-installed kernel option passed the test, and I can see Kepler output from there.
Co-authored-by: Sunil Thaha <[email protected]> Signed-off-by: Huamin Chen <[email protected]>
@SamYuan1990 can you add this to the section in your task to document new OS support?
@husky-parul can you add this option to the operator after merge? thanks
Signed-off-by: Huamin Chen <[email protected]>
Tested and review feedback addressed, PTAL, thanks
Signed-off-by: Huamin Chen <[email protected]>
cc @yellowhat for helm update
lgtm
#716