[Action/Falco] Investigate benchmark test data collection #50
Comments
Hi, I'd love to work on this one, however, I'd like to get some help to clarify the reqs before jumping into it.
If someone else has it more clear and wants to pair on this, I'm also open to it :) |
@raymundovr thank you for volunteering to contribute to this! 🥳
I think @AntonioDiTuri was referring to how we deployed some infrastructure-level components with Flux. Essentially, we can add a Kubernetes manifest in a directory watched by Flux and Flux will apply it in the cluster, like this ConfigMap: https://github.com/cncf-tags/green-reviews-tooling/blob/main/clusters/base/kepler-grafana.yaml As part of this issue, we would need to add a manifest for a Kubernetes CronJob in the Before jumping into it, I agree that we should refine the requirements a bit more. I shared this with Falco maintainer @incertum and am waiting for her feedback: falcosecurity/cncf-green-review-testing#13 |
@raymundovr hi! 👋 I created an issue to bootstrap ARC (#58), but there are quite a few requirements and we have no guarantee that ARC will be up and running in time for us to go straight to the self-hosted runners solution by KubeCon. I propose that we get a head start with the bash script + CronJob as a temporary solution. That way we're not blocked by the authorization with PAT keys, secrets, etc. Then, when ARC is ready, we can port the individual steps to a GitHub Actions workflow (it's relatively easy to split/convert bash scripts into GA workflow steps). What matters most is that we implement the steps themselves one way or another so that we can gather sample metrics! To deploy the CronJob, we can add a CronJob manifest in Then we can implement the steps one by one :) What do you think? :) |
I believe the most challenging part will be the last step, which is to find a way to gather + store the metrics per test. Pushgateway is one option, which @rossf7 suggested (Slack context), but really we will understand more once we get to that step! |
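To make the Pushgateway option a bit more concrete, here is a minimal Go sketch of pushing one measurement per test. It relies only on the Pushgateway HTTP API (a `PUT` to `/metrics/job/<job>/<label>/<value>` with a body in the Prometheus text exposition format). The host `pushgateway:9091`, the job name `falco_benchmark`, the grouping label `test` and the metric name `falco_benchmark_joules` are all hypothetical names chosen for illustration, not something the thread has decided on.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// GroupURL builds the Pushgateway URL for a job plus one grouping label.
// Using the test name as the grouping label keeps the metrics of each
// benchmark test in their own group.
func GroupURL(base, job, labelName, labelValue string) string {
	return fmt.Sprintf("%s/metrics/job/%s/%s/%s",
		strings.TrimSuffix(base, "/"),
		url.PathEscape(job),
		url.PathEscape(labelName),
		url.PathEscape(labelValue))
}

// Body renders a single gauge sample in the Prometheus text exposition format.
// The body must end with a newline.
func Body(metric string, value float64) string {
	return fmt.Sprintf("# TYPE %s gauge\n%s %g\n", metric, metric, value)
}

// Push sends the body with PUT, which replaces all metrics in the group.
func Push(client *http.Client, target, body string) error {
	req, err := http.NewRequest(http.MethodPut, target, strings.NewReader(body))
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("pushgateway returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Hypothetical names: host, job, label and metric are assumptions.
	target := GroupURL("http://pushgateway:9091", "falco_benchmark", "test", "idle")
	fmt.Println(target)
	fmt.Print(Body("falco_benchmark_joules", 12.5))
	// Push(http.DefaultClient, target, Body("falco_benchmark_joules", 12.5))
	// would send the sample; it is left out so the sketch runs without a cluster.
}
```

Grouping by test name is only one possible design; whether it fits depends on how the results end up being stored.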
Hi @nikimanoledaki, unfortunately I don't have a full overview of the cluster resources and their accessibility, so I can't give a more informed opinion on when ARC could be ready. For me it's OK to start with the CronJob, as suggested :) |
Hi @raymundovr, just a heads up that I've started work on adding the self-hosted runner in #63. I'll let you know once it's deployed and tested. |
I've been researching and playing with First, I'd like to mention that I think that the idea of creating any task / job to gather With this in mind, I'd like to discuss the possibility of taking the corresponding measurements from Prometheus (using PromQL / API queries) and presenting them in a consistent way, for example:
The measurements could be taken by a task running on a separate node that queries Prometheus, as mentioned before, for the Pod or Namespace where Falco is running. It would then output the results in a table-like format, which can be used to calculate, for example, the max, min, average and median. We can then decide where to store this output, or perhaps make it available as a service? We'll still need a way to trigger this measurement task, though. What do you think? |
Hi @raymundovr,
Yes, you can query Prometheus for the Kepler metrics and we should have just Falco running on its node. We've set node labels for this so you can add the selectors For microservices-demo it doesn't look like the Helm chart lets you set a node selector :( They also support Kustomize, which might let us patch the deployments? https://github.com/GoogleCloudPlatform/microservices-demo/tree/main/helm-chart
Perfect, logging the results will let us validate the test steps and the measurements.
Yes, we still need to work on this part of the design. We could write the results to S3, for example, but let's tackle this as a later step. |
Hi, thank you @rossf7 for the suggestion. I'm not sure that the labels are exported as selectors into Prometheus. I have started to put something together, please check https://github.com/raymundovr/sustainability-metrics/blob/main/main.go In order to continue I'll probably need access to Prometheus. Is there any chance to get that? I don't mind setting up any kind of tunnel if necessary. What do you think of this approach? |
Hi @raymundovr, your code looks good! When you're ready we can move it to this repo. I think we could have the Go module in the root and your code in For the GitHub Action we can use setup-go to install Go and then run the binary. I don't think we need a container image yet. The action will also need a kubeconfig, so we could use port forwarding for Prometheus or make the Prometheus API public?
Yes, you're right, we can't use the k8s labels. I think using the There is another factor: the Falco team would like 3 deployments of Falco on different nodes, as Falco has multiple drivers. See falcosecurity/cncf-green-review-testing#2 So far just a single node with the modern-ebpf driver is provisioned. We could use the |
Thanks @raymundovr and @rossf7 for moving this forward! The metrics that @raymundovr selected for the moment are:
@nikimanoledaki since you have some experience with the Kepler metrics, do you think this is enough? For a first implementation I guess it is fine to print the metrics in the output, then we can refine it later :) |
Thank you @rossf7 and @AntonioDiTuri for taking a look and providing feedback.
What do you think @nikimanoledaki ? |
Update: cleaned things a bit and added |
The first benchmark test for Falco will be a baseline test.
We can start with a script that runs as a CronJob in Kubernetes. In the future, we can automate this using self-hosted GitHub Actions runners (see Proposal 1: Actions Runner Controller (ARC)).
Pre-requisite
clusters/projects/falco/cron.yaml
Benchmark Steps
Acceptance Criteria