
attacher: try different kernel sources if there is none in default locations #733

Merged
merged 7 commits into from
Jun 15, 2023


@rootfs (Contributor) commented Jun 12, 2023

  • kepler can try different kernel source locations
  • add kernel sources to kepler container
  • test a standalone kepler with the new option and different kernel source dirs
  • test the kepler image on a k8s env that has no kernel-devel package
  • add docs on how to add kernel sources and support custom environments (assigned to @SamYuan1990)

#716

@rootfs rootfs changed the title [WIP] attacher: try different kernel sources if there is none in default locations attacher: try different kernel sources if there is none in default locations Jun 12, 2023
@marceloamaral (Collaborator):

Is this related to PR #728?

@rootfs (Contributor Author) commented Jun 13, 2023

No, this just allows bcc to compile against pre-installed kernel sources, while #728 installs a pre-compiled eBPF module.

@@ -138,9 +139,24 @@ func AttachBPFAssets() (*BpfModuleTables, error) {
}
// TODO: verify if ebpf can run in the VM without hardware counter support, if not, we can disable the HC part and only collect the cpu time
m, err := loadModule(objProg, options)
if err != nil {
klog.Infof("failed to attach perf module with options %v: %v, from default kernel source.\n", options, err)
dirs := config.GetKernelSourceDir()
Collaborator:

nit: better to rename it to SourceDirs if we allow multiple dirs.

Collaborator:

+1 to the suggestion above, and if you follow the https://go.dev/doc/effective_go#Getters idiom, the recommended name would be config.KernelSourceDirs() 👼

@@ -87,6 +87,9 @@ var (

configPath = "/etc/kepler/kepler.config"

// dir of kernel sources for bcc
kernelSourceDir = []string{}
Collaborator:

Seems it should be plural?

@@ -0,0 +1,4 @@
FROM ImageName

ARG ARCH=amd64
Collaborator:

@SamYuan1990 it seems we need to add s390x as well?

# pre install kernel sources
RUN mkdir -p /usr/share/kepler/kernel_sources

COPY --from=quay.io/sustainable_computing_io/kepler_kernel_source_images:ubi8 /usr/src/kernels /usr/share/kepler/kernel_sources
Collaborator:

Will this be kept in the released image? How big are the copied files?

@rootfs (Contributor Author):

Each kernel-devel package is about 5 MB.

@marceloamaral (Collaborator):

But if #728 cannot be done, do we still need this PR?

@marceloamaral (Collaborator):

Hmm, since this PR changes the kernel source dir, I understand it is not related to #728.

@marceloamaral (Collaborator):

@rootfs what is still missing for testing (as listed in the PR description)?

@rootfs (Contributor Author) commented Jun 13, 2023

@marceloamaral I did a quick search: bcc appears to have CO-RE support in mind, and there have been PRs in that direction. Hopefully it will be on par with other libraries.

In this PR, I have tested it with a standalone binary. Once this change passes in a Kubernetes environment without kernel sources, I'll share the results and ask you for another review.

pkg/config/config.go (outdated)
// SetKernelSourceDir sets the directory for all kernel source. This is used for bcc. Only the top level directory is needed.
func SetKernelSourceDir(dir string) {
// read all the kernel source directories
if dir != "" {
Collaborator:

With the validation proposed above, we shouldn't need to check if dir != "".

@rootfs (Contributor Author) commented Jun 14, 2023

test env

host

MacBook

% uname -a
Darwin xxx 22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:51:50 PDT 2023; root:xnu-8796.121.2~5/RELEASE_X86_64 x86_64

cluster

kind 1.27

 % kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:18:45Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-05-12T19:03:40Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}

without pre-installed kernel sources

% kubectl logs -n kepler daemonset/kepler-exporter
I0614 13:41:10.014533       1 gpu.go:46] Failed to init nvml, err: could not init nvml: error opening libnvidia-ml.so.1: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
I0614 13:41:10.034400       1 exporter.go:148] Kepler running on version: 221cb2a
I0614 13:41:10.034429       1 config.go:172] using gCgroup ID in the BPF program: true
I0614 13:41:10.034471       1 config.go:174] kernel version: 5.1
I0614 13:41:10.034488       1 exporter.go:161] EnabledBPFBatchDelete: true
I0614 13:41:10.035100       1 power.go:77] Not able to obtain power, use estimate method
I0614 13:41:10.652596       1 exporter.go:174] Initializing the GPU collector
modprobe: FATAL: Module kheaders not found in directory /lib/modules/5.10.25-linuxkit
chdir(/lib/modules/5.10.25-linuxkit/build): No such file or directory
I0614 13:41:10.671321       1 bcc_attacher.go:73] failed to attach the bpf program: <nil>
I0614 13:41:10.671356       1 bcc_attacher.go:142] failed to attach perf module with options [-DMAP_SIZE=10240 -DNUM_CPUS=8]: failed to attach the bpf program: <nil>, not able to load eBPF modules
I0614 13:41:10.671373       1 exporter.go:191] failed to start : failed to attach bpf assets: failed to attach the bpf program: <nil>
I0614 13:41:10.671514       1 exporter.go:218] Started Kepler in 637.129155ms

with pre-installed kernel sources

% kubectl logs -n kepler daemonset/kepler-exporter -f
I0614 13:47:09.269344       1 gpu.go:46] Failed to init nvml, err: could not init nvml: error opening libnvidia-ml.so.1: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
I0614 13:47:09.276243       1 exporter.go:149] Kepler running on version: v0.5-46-g8a3cfa3-dirty
I0614 13:47:09.276349       1 config.go:197] using gCgroup ID in the BPF program: true
I0614 13:47:09.276381       1 config.go:199] kernel version: 5.1
I0614 13:47:09.276401       1 config.go:159] kernel source dir is set to /usr/share/kepler/kernel_sources
I0614 13:47:09.276495       1 exporter.go:163] EnabledBPFBatchDelete: true
I0614 13:47:09.277574       1 power.go:77] Not able to obtain power, use estimate method
I0614 13:47:09.734056       1 exporter.go:176] Initializing the GPU collector
modprobe: FATAL: Module kheaders not found in directory /lib/modules/5.10.25-linuxkit
chdir(/lib/modules/5.10.25-linuxkit/build): No such file or directory
I0614 13:47:09.740528       1 bcc_attacher.go:74] failed to attach the bpf program: <nil>
I0614 13:47:09.740739       1 bcc_attacher.go:143] failed to attach perf module with options [-DMAP_SIZE=10240 -DNUM_CPUS=8]: failed to attach the bpf program: <nil>, from default kernel source.
I0614 13:47:09.740886       1 bcc_attacher.go:146] try to load eBPF module with kernel source dir /usr/share/kepler/kernel_sources/4.18.0-477.13.1.el8_8.x86_64
perf_event_open: No such file or directory
I0614 13:47:10.569284       1 bcc_attacher.go:108] failed to attach perf event cpu_cycles_hc_reader: failed to open bpf perf event: no such file or directory
perf_event_open: No such file or directory
I0614 13:47:10.569474       1 bcc_attacher.go:108] failed to attach perf event cpu_ref_cycles_hc_reader: failed to open bpf perf event: no such file or directory
perf_event_open: No such file or directory
I0614 13:47:10.569801       1 bcc_attacher.go:108] failed to attach perf event cpu_instr_hc_reader: failed to open bpf perf event: no such file or directory
perf_event_open: No such file or directory
I0614 13:47:10.570051       1 bcc_attacher.go:108] failed to attach perf event cache_miss_hc_reader: failed to open bpf perf event: no such file or directory
I0614 13:47:10.570137       1 bcc_attacher.go:152] Successfully load eBPF module with option: [-DMAP_SIZE=10240 -DNUM_CPUS=8] from kernel source "/usr/share/kepler/kernel_sources/4.18.0-477.13.1.el8_8.x86_64"
I0614 13:47:10.570222       1 bcc_attacher.go:171] Successfully load eBPF module with option: [-DMAP_SIZE=10240 -DNUM_CPUS=8]

kepler daemonset

% kubectl describe -n kepler daemonset kepler-exporter
Name:           kepler-exporter
Selector:       app.kubernetes.io/component=exporter,app.kubernetes.io/name=kepler-exporter,sustainable-computing.io/app=kepler
Node-Selector:  <none>
Labels:         sustainable-computing.io/app=kepler
Annotations:    deprecated.daemonset.template.generation: 2
Desired Number of Nodes Scheduled: 1
Current Number of Nodes Scheduled: 1
Number of Nodes Scheduled with Up-to-date Pods: 1
Number of Nodes Scheduled with Available Pods: 1
Number of Nodes Misscheduled: 0
Pods Status:  1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/component=exporter
                    app.kubernetes.io/name=kepler-exporter
                    sustainable-computing.io/app=kepler
  Service Account:  kepler-sa
  Containers:
   kepler-exporter:
    Image:      quay.io/sustainable_computing_io/kepler:pr733
    Port:       9102/TCP
    Host Port:  0/TCP
    Command:
      /bin/sh
      -c
    Args:
      /usr/bin/kepler -v=1 --kernel-source-dir=/usr/share/kepler/kernel_sources
    Requests:
      cpu:     100m
      memory:  400Mi
    Liveness:  http-get http://:9102/healthz delay=10s timeout=10s period=60s #success=1 #failure=5
    Environment:
      NODE_IP:   (v1:status.hostIP)
    Mounts:
      /etc/kepler/kepler.config from cfm (ro)
      /lib/modules from lib-modules (rw)
      /proc from proc (rw)
      /sys from tracing (rw)
  Volumes:
   lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  Directory
   tracing:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:  Directory
   proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  Directory
   cfm:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kepler-cfm
    Optional:  false
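The DaemonSet passes --kernel-source-dir=/usr/share/kepler/kernel_sources to the binary. Wiring such a flag can be sketched with the standard flag package, which accepts both -name and --name forms. This is an illustration only: Kepler's actual flag definition and help text may differ, and parseKernelSourceFlag is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseKernelSourceFlag parses args and returns the --kernel-source-dir value.
// An empty result means the flag was not given, so only bcc's default
// locations would be tried.
func parseKernelSourceFlag(args []string) (string, error) {
	fs := flag.NewFlagSet("kepler", flag.ContinueOnError)
	dir := fs.String("kernel-source-dir", "", "top-level directory holding one kernel source tree per kernel version")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *dir, nil
}

func main() {
	dir, err := parseKernelSourceFlag(os.Args[1:])
	if err != nil {
		os.Exit(2) // flag already printed the error and usage
	}
	if dir != "" {
		fmt.Println("kernel source dir is set to", dir)
	}
}
```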

kepler output

 % kubectl exec -ti -n kepler daemonset/kepler-exporter -- bash -c "curl localhost:9102/metrics" |grep kepler_container_joules |sort -k 2 -g |tail -10
kepler_container_joules_total{command="",container_id="0a5c450bfd02400d13d982d1feb922fc9f5b2a5f5e29508a42d94095b070029c",container_name="kube-apiserver",container_namespace="kube-system",mode="dynamic",pod_name="kube-apiserver-kind-control-plane"} 328.50800000000004
kepler_container_joules_total{command="",container_id="3ce3263a78f5667edd6013d3b43df12045456c5f3339665fc1039462a3d97434",container_name="kindnet-cni",container_namespace="kube-system",mode="dynamic",pod_name="kindnet-wtfng"} 328.50800000000004
kepler_container_joules_total{command="",container_id="6cbcd04f84d38a5d50e705520a1ebdc4c81441a0d94b3a5224bc4e3ae78a4864",container_name="kube-controller-manager",container_namespace="kube-system",mode="dynamic",pod_name="kube-controller-manager-kind-control-plane"} 328.50800000000004
kepler_container_joules_total{command="",container_id="7ab73310fa1535799f5ddc957bc695c6745a28e96780909972eb7ea0229ff16e",container_name="coredns",container_namespace="kube-system",mode="dynamic",pod_name="coredns-5d78c9869d-9qpnk"} 328.50800000000004
kepler_container_joules_total{command="",container_id="b2ea4fdb1034bb545cd4cc687ae43e340f001bd820996bc59aaf68a9b4a52153",container_name="kepler-exporter",container_namespace="kepler",mode="dynamic",pod_name="kepler-exporter-7xz58"} 328.50800000000004
kepler_container_joules_total{command="",container_id="system_processes",container_name="system_processes",container_namespace="system",mode="dynamic",pod_name="system_processes"} 328.50800000000004
kepler_container_joules_total{command="containerd",container_id="e7e940e1a3879022670295a836dff77af3969e4727a9460b396453ded7ac2b5b",container_name="kube-proxy",container_namespace="kube-system",mode="dynamic",pod_name="kube-proxy-86l4x"} 328.53200000000004
kepler_container_joules_total{command="vpnkit-for",container_id="344aa4e5dfff04b08934f603dae9dbed879e6c0dedde895d0e2453e8f001662b",container_name="local-path-provisioner",container_namespace="local-path-storage",mode="dynamic",pod_name="local-path-provisioner-6bc4bddd6b-kkhnv"} 328.579
kepler_container_joules_total{command="containerd",container_id="b820bc8299ef22538019d5e7959531913cfade2c7989643dfdd066f4f1a75bd5",container_name="coredns",container_namespace="kube-system",mode="dynamic",pod_name="coredns-5d78c9869d-qwpvr"} 328.694
kepler_container_joules_total{command="jbd2/vda1-",container_id="1a16001c73016dd6b6f7f313f882df65e7f6791237dd7b457e6d0caff3a378d8",container_name="kube-scheduler",container_namespace="kube-system",mode="dynamic",pod_name="kube-scheduler-kind-control-plane"} 328.98400000000004
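The shell pipeline above (grep kepler_container_joules | sort -k 2 -g | tail -10) ranks containers by energy. For anyone scripting this check, the same ranking can be sketched in Go. It assumes Prometheus text-format lines of the form name{labels} value with no trailing timestamp, matches the metric name as a line prefix (slightly stricter than grep), and topSeries is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

type sample struct {
	series string
	value  float64
}

// topSeries returns the n samples with the largest values whose line starts
// with prefix, in ascending order like `sort -g | tail -n`.
func topSeries(lines []string, prefix string, n int) []sample {
	var samples []sample
	for _, line := range lines {
		if !strings.HasPrefix(line, prefix) {
			continue // also skips "# HELP" / "# TYPE" comment lines
		}
		i := strings.LastIndexByte(line, ' ')
		if i < 0 {
			continue
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			continue
		}
		samples = append(samples, sample{series: line[:i], value: v})
	}
	sort.SliceStable(samples, func(a, b int) bool { return samples[a].value < samples[b].value })
	if len(samples) > n {
		samples = samples[len(samples)-n:]
	}
	return samples
}

func main() {
	// Reads /metrics text from stdin, e.g. curl -s localhost:9102/metrics | go run .
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1024*1024), 1024*1024) // long label sets need a bigger buffer
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	for _, s := range topSeries(lines, "kepler_container_joules", 10) {
		fmt.Printf("%s %g\n", s.series, s.value)
	}
}
```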

@rootfs (Contributor Author) commented Jun 14, 2023

I ran the test on my MacBook: with the pre-installed kernel sources option the test passed, and I can see Kepler output.

rootfs and others added 3 commits June 14, 2023 09:57, each with the trailers:
Co-authored-by: Sunil Thaha
Signed-off-by: Huamin Chen
@rootfs (Contributor Author) commented Jun 14, 2023

@SamYuan1990 can you add this to the section in your task to document new OS support?

@rootfs (Contributor Author) commented Jun 14, 2023

@husky-parul can you add this option to the operator after merge? Thanks.

Signed-off-by: Huamin Chen
@rootfs (Contributor Author) commented Jun 14, 2023

Tested, and review feedback addressed. PTAL, thanks.

Signed-off-by: Huamin Chen
@rootfs (Contributor Author) commented Jun 14, 2023

cc @yellowhat for helm update

@husky-parul (Collaborator) left a comment:

lgtm

@husky-parul husky-parul merged commit 5d8662f into sustainable-computing-io:main Jun 15, 2023

5 participants