🌱 add benchmark pipeline #1651

jianzhangbjz · 2025-01-27T06:21:41Z

Description

See more: #920

Serial run on AWS; cluster version is 4.18.0-0.nightly-2025-01-25-163410

jiazha-mac:e2e jiazha$ go test -run=^$ -bench=. -count=10 -memprofile=mem.out -cpuprofile=cpu.out
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
BenchmarkCreateClusterCatalog-10    	       1	1444716583 ns/op
BenchmarkCreateClusterCatalog-10    	       2	 619736229 ns/op
BenchmarkCreateClusterCatalog-10    	       2	 594375416 ns/op
BenchmarkCreateClusterCatalog-10    	       2	 599388104 ns/op
BenchmarkCreateClusterCatalog-10    	       2	 578713375 ns/op
BenchmarkCreateClusterCatalog-10    	       2	 604820354 ns/op
BenchmarkCreateClusterCatalog-10    	       2	 614665062 ns/op
BenchmarkCreateClusterCatalog-10    	       2	 613025938 ns/op
BenchmarkCreateClusterCatalog-10    	       2	 622365104 ns/op
BenchmarkCreateClusterCatalog-10    	       2	 591780896 ns/op
PASS
ok  	github.com/operator-framework/operator-controller/test/e2e	18.360s
jiazha-mac:e2e jiazha$ go tool pprof mem.out 
File: e2e.test
Type: alloc_space
Time: Jan 27, 2025 at 2:00pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 6893.09kB, 72.92% of 9453.29kB total
Showing top 10 nodes out of 93
      flat  flat%   sum%        cum   cum%
 1762.94kB 18.65% 18.65%  1762.94kB 18.65%  runtime/pprof.StartCPUProfile
  902.59kB  9.55% 28.20%  1485.59kB 15.72%  compress/flate.NewWriter
  583.01kB  6.17% 34.36%   583.01kB  6.17%  compress/flate.newDeflateFast (inline)
  548.84kB  5.81% 40.17%  1573.29kB 16.64%  k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName
  532.26kB  5.63% 45.80%   532.26kB  5.63%  github.com/gogo/protobuf/proto.RegisterType
  513.50kB  5.43% 51.23%   513.50kB  5.43%  k8s.io/apimachinery/pkg/conversion.ConversionFuncs.AddUntyped
  512.75kB  5.42% 56.66%   512.75kB  5.42%  vendor/golang.org/x/crypto/cryptobyte.(*Builder).add
  512.62kB  5.42% 62.08%   512.62kB  5.42%  k8s.io/api/apps/v1beta2.addKnownTypes
  512.44kB  5.42% 67.50%   512.44kB  5.42%  sync.(*Map).dirtyLocked
  512.14kB  5.42% 72.92%   512.14kB  5.42%  k8s.io/api/resource/v1alpha3.init
(pprof) exit
jiazha-mac:e2e jiazha$ go tool pprof cpu.out 
File: e2e.test
Type: cpu
Time: Jan 27, 2025 at 2:00pm (CST)
Duration: 17.83s, Total samples = 160ms (  0.9%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 160ms, 100% of 160ms total
Showing top 10 nodes out of 93
      flat  flat%   sum%        cum   cum%
      40ms 25.00% 25.00%       40ms 25.00%  runtime.pthread_cond_signal
      40ms 25.00% 50.00%       40ms 25.00%  runtime.scanobject
      40ms 25.00% 75.00%       40ms 25.00%  syscall.syscall
      10ms  6.25% 81.25%       10ms  6.25%  crypto/internal/edwards25519/field.addMul64 (inline)
      10ms  6.25% 87.50%       10ms  6.25%  k8s.io/apimachinery/pkg/runtime.(*clientNegotiator).Decoder
      10ms  6.25% 93.75%       10ms  6.25%  runtime.pthread_kill
      10ms  6.25%   100%       10ms  6.25%  runtime.usleep
         0     0%   100%       30ms 18.75%  bufio.(*Writer).Flush
         0     0%   100%       10ms  6.25%  crypto/ecdh.(*PrivateKey).PublicKey
         0     0%   100%       10ms  6.25%  crypto/ecdh.(*PrivateKey).PublicKey.func1
(pprof) exit

Parallel run on AWS; cluster version is 4.18.0-0.nightly-2025-01-25-163410

jiazha-mac:e2e jiazha$ go test -run=^$ -bench=. -count=10 -memprofile=mem.out -cpuprofile=cpu.out
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
BenchmarkCreateClusterCatalog-10    	       1	1559796167 ns/op
BenchmarkCreateClusterCatalog-10    	      12	 105038868 ns/op
BenchmarkCreateClusterCatalog-10    	      13	  88542141 ns/op
BenchmarkCreateClusterCatalog-10    	      13	  94152035 ns/op
BenchmarkCreateClusterCatalog-10    	      13	  93457205 ns/op
BenchmarkCreateClusterCatalog-10    	      13	  94673955 ns/op
BenchmarkCreateClusterCatalog-10    	      20	  62803019 ns/op
BenchmarkCreateClusterCatalog-10    	      13	  87578115 ns/op
BenchmarkCreateClusterCatalog-10    	      12	 107728125 ns/op
BenchmarkCreateClusterCatalog-10    	      12	  98580924 ns/op
PASS
ok  	github.com/operator-framework/operator-controller/test/e2e	38.984s
jiazha-mac:e2e jiazha$ go tool pprof cpu.out 
File: e2e.test
Type: cpu
Time: Jan 27, 2025 at 2:09pm (CST)
Duration: 38.06s, Total samples = 570ms ( 1.50%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 410ms, 71.93% of 570ms total
Showing top 10 nodes out of 201
      flat  flat%   sum%        cum   cum%
      70ms 12.28% 12.28%       70ms 12.28%  runtime.kevent
      70ms 12.28% 24.56%       70ms 12.28%  runtime.pthread_cond_signal
      60ms 10.53% 35.09%       60ms 10.53%  runtime.pthread_cond_wait
      60ms 10.53% 45.61%       60ms 10.53%  syscall.syscall
      50ms  8.77% 54.39%       50ms  8.77%  runtime.pthread_kill
      30ms  5.26% 59.65%       30ms  5.26%  runtime.madvise
      30ms  5.26% 64.91%       30ms  5.26%  runtime.pthread_cond_timedwait_relative_np
      20ms  3.51% 68.42%       20ms  3.51%  runtime.(*mspan).writeHeapBitsSmall
      10ms  1.75% 70.18%       10ms  1.75%  crypto/internal/bigmod.(*Nat).reset
      10ms  1.75% 71.93%       10ms  1.75%  k8s.io/apimachinery/pkg/runtime.setTargetKind
(pprof) exit
jiazha-mac:e2e jiazha$ go tool pprof mem.out 
File: e2e.test
Type: alloc_space
Time: Jan 27, 2025 at 2:09pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 10718.54kB, 61.67% of 17380.55kB total
Showing top 10 nodes out of 166
      flat  flat%   sum%        cum   cum%
 2048.12kB 11.78% 11.78%  2048.12kB 11.78%  path.Join
 1762.94kB 10.14% 21.93%  1762.94kB 10.14%  runtime/pprof.StartCPUProfile
 1536.56kB  8.84% 30.77%  1536.56kB  8.84%  golang.org/x/net/http2.(*ClientConn).roundTrip
 1065.48kB  6.13% 36.90%  2101.58kB 12.09%  k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName
 1025.38kB  5.90% 42.80%  1025.38kB  5.90%  sync.(*Pool).pinSlow
 1024.14kB  5.89% 48.69%  1536.17kB  8.84%  k8s.io/client-go/rest.(*Request).URL
  650.62kB  3.74% 52.43%   650.62kB  3.74%  compress/flate.(*compressor).init
  553.04kB  3.18% 55.62%   553.04kB  3.18%  github.com/gogo/protobuf/proto.RegisterType
  528.17kB  3.04% 58.65%   528.17kB  3.04%  regexp.(*bitState).reset
  524.09kB  3.02% 61.67%   524.09kB  3.02%  k8s.io/apimachinery/pkg/conversion.ConversionFuncs.AddUntyped
(pprof) exit

Reviewer Checklist

API Go Documentation
Tests: Unit Tests (and E2E Tests, if appropriate)
Comprehensive Commit Messages
Links to related GitHub Issue(s)

netlify · 2025-01-27T06:21:57Z

✅ Deploy Preview for olmv1 ready!

Name	Link
🔨 Latest commit	`17c9d51`
🔍 Latest deploy log	https://app.netlify.com/sites/olmv1/deploys/67adb658357e960008fa1987
😎 Deploy Preview	https://deploy-preview-1651--olmv1.netlify.app


codecov · 2025-01-27T06:36:36Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.45%. Comparing base (becde51) to head (17c9d51).
Report is 14 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1651   +/-   ##
=======================================
  Coverage   67.45%   67.45%           
=======================================
  Files          61       61           
  Lines        5245     5245           
=======================================
  Hits         3538     3538           
  Misses       1446     1446           
  Partials      261      261

Flag	Coverage Δ
e2e	`52.07% <ø> (-0.08%)`	⬇️
unit	`55.00% <ø> (ø)`


jianzhangbjz · 2025-01-27T09:24:24Z

Parallel run on IBM Cloud; cluster version is 4.18.0-0.nightly-2025-01-25-163410

jiazha-mac:e2e jiazha$ go test -run=^$ -bench=. -count=10 -memprofile=mem.out -cpuprofile=cpu.out
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
BenchmarkCreateClusterCatalog-10    	       1	2093042334 ns/op
BenchmarkCreateClusterCatalog-10    	       4	 611432146 ns/op
BenchmarkCreateClusterCatalog-10    	      10	 224809304 ns/op
BenchmarkCreateClusterCatalog-10    	      13	  92630189 ns/op
BenchmarkCreateClusterCatalog-10    	       6	 174559444 ns/op
BenchmarkCreateClusterCatalog-10    	      12	 107088038 ns/op
BenchmarkCreateClusterCatalog-10    	       1	1003581583 ns/op
BenchmarkCreateClusterCatalog-10    	       3	 379469264 ns/op
BenchmarkCreateClusterCatalog-10    	       2	 606936271 ns/op
BenchmarkCreateClusterCatalog-10    	       1	2773485917 ns/op
PASS
ok  	github.com/operator-framework/operator-controller/test/e2e	34.115s
jiazha-mac:e2e jiazha$ go tool pprof mem.out 
File: e2e.test
Type: alloc_space
Time: Jan 27, 2025 at 5:22pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 8433.33kB, 62.21% of 13555.35kB total
Showing top 10 nodes out of 126
      flat  flat%   sum%        cum   cum%
 1536.16kB 11.33% 11.33%  1536.16kB 11.33%  golang.org/x/net/http2.(*ClientConn).roundTrip
 1536.09kB 11.33% 22.66%  1536.09kB 11.33%  path.Join
 1184.27kB  8.74% 31.40%  1184.27kB  8.74%  runtime/pprof.StartCPUProfile
  902.59kB  6.66% 38.06%  1553.21kB 11.46%  compress/flate.NewWriter
  650.62kB  4.80% 42.86%   650.62kB  4.80%  compress/flate.(*compressor).init
  553.04kB  4.08% 46.94%   553.04kB  4.08%  github.com/gogo/protobuf/proto.RegisterType
  528.17kB  3.90% 50.84%   528.17kB  3.90%  regexp.(*bitState).reset
  516.01kB  3.81% 54.64%   516.01kB  3.81%  google.golang.org/protobuf/internal/filedesc.(*File).initDecls
  513.69kB  3.79% 58.43%   513.69kB  3.79%  regexp.mergeRuneSets.func2
  512.69kB  3.78% 62.21%   512.69kB  3.78%  regexp/syntax.(*compiler).inst
(pprof) exit
jiazha-mac:e2e jiazha$ go tool pprof cpu.out 
File: e2e.test
Type: cpu
Time: Jan 27, 2025 at 5:22pm (CST)
Duration: 33.19s, Total samples = 380ms ( 1.14%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 310ms, 81.58% of 380ms total
Showing top 10 nodes out of 154
      flat  flat%   sum%        cum   cum%
      90ms 23.68% 23.68%       90ms 23.68%  runtime.kevent
      50ms 13.16% 36.84%       50ms 13.16%  runtime.pthread_cond_signal
      50ms 13.16% 50.00%       50ms 13.16%  syscall.syscall
      40ms 10.53% 60.53%       40ms 10.53%  runtime.pthread_cond_wait
      30ms  7.89% 68.42%       60ms 15.79%  runtime.scanobject
      10ms  2.63% 71.05%       20ms  5.26%  k8s.io/client-go/rest.(*Request).tryThrottleWithInfo
      10ms  2.63% 73.68%       10ms  2.63%  runtime.(*itabTableType).find
      10ms  2.63% 76.32%       10ms  2.63%  runtime.(*mheap).allocSpan
      10ms  2.63% 78.95%       10ms  2.63%  runtime.(*mspan).heapBitsSmallForAddr
      10ms  2.63% 81.58%       10ms  2.63%  runtime.(*unwinder).resolveInternal
(pprof) exit

joelanford · 2025-01-28T15:29:48Z

test/e2e/benchmark_test.go

+}
+
+// GetRandomString generates a random string of the given length
+func getRandomString(length int) string {


k8s.io/apimachinery has a rand.String function that can be used instead of us maintaining a separate implementation. https://pkg.go.dev/k8s.io/apimachinery/pkg/util/rand#String

joelanford · 2025-01-28T15:36:02Z

test/e2e/benchmark_test.go

+	"time"
+)
+
+func BenchmarkCreateClusterCatalog(b *testing.B) {


I'm not sure this benchmark tells us a whole lot because it is mostly measuring:

client CPU/memory used to submit a CREATE request to the apiserver

client CPU/memory used to submit a DELETE request to the apiserver

But it notably isn't able to account for the resources or time spent by the controller to reconcile the ClusterCatalog, which I think is what we actually want to measure.

But regardless, in my mind the biggest open question is NOT "what machinery do we use to take measurements?". The real question is "where/how do we store/retrieve historical measurements that we can use as a baseline for comparison?"

> The real question is "where/how do we store/retrieve historical measurements that we can use as a baseline for comparison?"

~~The test results will be stored in a GCS bucket if we run these test cases in Prow CI. Then we can use BigQuery to retrieve historical measurements, similar to Sippy.~~
That approach is a bit complex, though. How about storing the baseline in benchmarks/baseline.txt? We could use the initial benchmark results as the baseline and update the file whenever a new version is released.

> But it notably isn't able to account for the resources or time spent by the controller to reconcile the ClusterCatalog, which I think is what we actually want to measure.

Yes. However, I was trying to cover the "real world" case rather than a unit-test-style case. Do we also need to benchmark individual functions, similar to unit tests?

> How about storing the baseline in benchmarks/baseline.txt?

Maybe? But if, for example, we wanted to build a histogram of response times of the catalogd web server over multiple commits and/or CI runs in order to get a smoothed-out baseline, that seems hard with a file committed to the repo, because we'd have to update it often.

Maybe another option would be to make use of GitHub's upload-artifact/download-artifact actions? If an artifact can be uploaded/downloaded and shared between GitHub Actions runs, then maybe we could do something like:

download a snapshotted prometheus DB

run prometheus with the DB, and configure scraping with a short interval to pull metrics from our components into the DB

verify via PromQL that we haven't exceeded our thresholds; if we have, fail the run

snapshot the updated prometheus DB

upload the snapshot

success!

Cool! If so, we don't need to use benchstat, right?

I'm trying out Prometheus.

controller-runtime (and catalogd) already expose metrics that seem relevant for performance, for example:

Reconciliation Time → controller_runtime_reconcile_time_seconds_sum

HTTP Request Duration → catalogd_http_request_duration_seconds_sum

Number of Reconciliations → controller_runtime_reconcile_total

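CPU Usage → process_cpu_seconds_total (go_cpu_classes_gc_total_cpu_seconds_total covers only the CPU time spent in garbage collection)

Memory Usage → a memory metric such as go_memstats_heap_alloc_bytes or process_resident_memory_bytes (go_cpu_classes_idle_cpu_seconds_total measures idle CPU time, not memory)

Failed reconciliations → controller_runtime_reconcile_errors_total

Webhook failures → controller_runtime_webhook_requests_total, split by its code label (controller_runtime_webhook_latency_seconds measures webhook latency)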
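A caveat on the `_sum` entries: `controller_runtime_reconcile_time_seconds` is a histogram, so its `_sum` series is a running total that only grows. An average duration over some window is the increase in `_sum` divided by the increase in `_count`, which is what the PromQL expression `rate(controller_runtime_reconcile_time_seconds_sum[5m]) / rate(controller_runtime_reconcile_time_seconds_count[5m])` computes. The Go sketch below does the same arithmetic on two scrapes taken before and after a test; the `histogramSample` type and the numbers in `main` are illustrative assumptions, not data from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// histogramSample holds the cumulative _sum and _count series of a Prometheus
// histogram (e.g. controller_runtime_reconcile_time_seconds) at one scrape.
// Hypothetical type, for illustration only.
type histogramSample struct {
	Sum   float64 // cumulative seconds observed
	Count float64 // cumulative number of observations
}

// meanBetween returns the mean observed duration between two scrapes:
// (sum_after - sum_before) / (count_after - count_before).
func meanBetween(before, after histogramSample) (float64, error) {
	dCount := after.Count - before.Count
	if dCount <= 0 {
		return 0, errors.New("no new observations between the two scrapes (or a counter reset)")
	}
	return (after.Sum - before.Sum) / dCount, nil
}

func main() {
	// Illustrative values, not measurements from this PR.
	before := histogramSample{Sum: 12.5, Count: 100}
	after := histogramSample{Sum: 14.0, Count: 110}
	mean, err := meanBetween(before, after)
	if err != nil {
		panic(err)
	}
	fmt.Printf("mean reconcile time: %.3fs\n", mean)
}
```

Because it works on deltas, this approach does not require the counters to start from zero for each test.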

Here is an example of all metrics captured for the tests in the e2e suite: https://github.com/operator-framework/operator-controller/actions/runs/13758503772/job/38469750785?pr=1856

So, shouldn't we use the metrics we already have, which are designed for operators, instead?

Do we need a snapshot for each test?
Also, will we need to check this for both operator-controller and catalogd?
If we store all possible data, and not only what is relevant to us, how hard will it be to analyze the use cases?

IMHO, we should just query what is relevant instead of using snapshots.
For example:

curl -G http://prometheus-server/api/v1/query \
  --data-urlencode 'query=rate(controller_runtime_reconcile_time_seconds_sum[5m])' \
  -o results/reconciliation_time.txt

Then, we store only what matters for us.

I tried to POC it: #1856
You can check the artefacts at: https://github.com/operator-framework/operator-controller/actions/runs/13761934158/job/38479819934?pr=1856

Note, though, that across tests these metrics are cumulative.
So, for performance tests, would we need to start a new cluster for each test?

We know how to collect those metrics, but where do we store them? Also, each CI (upstream based on Kind, downstream based on OCP) starts its own cluster; otherwise, how could we run the e2e test cases without a cluster?

jianzhangbjz · 2025-02-05T10:03:46Z

Tested on 4.19.0-0.nightly-2025-02-04-230011, GCP.

jiazha-mac:e2e jiazha$ export CATALOG_IMG=registry.redhat.io/redhat/redhat-operator-index:v4.18
jiazha-mac:e2e jiazha$ go test -run=^$ -bench=. -count=10 -memprofile=mem.out -cpuprofile=cpu.out
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
BenchmarkCreateClusterCatalog-10    	       1	1274109792 ns/op
BenchmarkCreateClusterCatalog-10    	      18	  85367590 ns/op
BenchmarkCreateClusterCatalog-10    	      21	  98587806 ns/op
BenchmarkCreateClusterCatalog-10    	      20	  80611660 ns/op
BenchmarkCreateClusterCatalog-10    	      15	  95789072 ns/op
BenchmarkCreateClusterCatalog-10    	      15	  87705625 ns/op
BenchmarkCreateClusterCatalog-10    	      14	 186946500 ns/op
BenchmarkCreateClusterCatalog-10    	      12	 166680833 ns/op
BenchmarkCreateClusterCatalog-10    	       9	 177902880 ns/op
BenchmarkCreateClusterCatalog-10    	       4	 447856864 ns/op
PASS
ok  	github.com/operator-framework/operator-controller/test/e2e	38.205s

joelanford · 2025-02-07T16:01:00Z

.github/workflows/benchmark.yaml

+      - name: Compare with baseline
+        run: |
+          go install golang.org/x/perf/cmd/benchstat@latest
+          benchstat benchmarks/baseline.txt new.txt


benchstat is nice for getting comparisons, but we'd have to be careful to generate and commit a baseline from GitHub CI's own machines to avoid issues like "Jian's laptop is beefier than Joe's, and both are beefier than a GHA VM"
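To make the hardware concern concrete, here is a deliberately simplified stand-in for the comparison step: take the median ns/op of a run and flag a regression when it exceeds the baseline median by more than a tolerance. This is not what benchstat does (benchstat reports medians with confidence intervals and applies a statistical significance test), and the `median`/`regressed` helpers and the 10% tolerance are assumptions for illustration. The inputs are the GitHub Actions ns/op values and the baseline median (85.11m sec/op) from the benchstat output later in this thread; that baseline was produced on an Apple M1 Pro, so the roughly 17% gap the sketch flags cannot be separated from the difference in machines.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the median of xs without modifying the caller's slice.
func median(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid]
	}
	return (s[mid-1] + s[mid]) / 2
}

// regressed reports whether newMedian exceeds baselineMedian by more than tol
// (e.g. tol = 0.10 for 10%). Simplified: no significance test.
func regressed(baselineMedian, newMedian, tol float64) bool {
	return newMedian > baselineMedian*(1+tol)
}

func main() {
	// ns/op values from new.txt (GitHub Actions, AMD EPYC 7763).
	gha := []float64{82695425, 99872913, 99852972, 99882173, 100024048,
		100098746, 100037742, 99984867, 99855796, 99946190}
	// Median reported for benchmarks/baseline.txt (Apple M1 Pro): 85.11m sec/op.
	baseline := 85.11e6

	m := median(gha)
	fmt.Printf("new median: %.0f ns/op (%+.1f%% vs baseline), regressed: %v\n",
		m, (m/baseline-1)*100, regressed(baseline, m, 0.10))
}
```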

joelanford · 2025-02-07T16:03:03Z

I really think we need to get a Brief and/or RFC written up to get agreement and consensus on the approaches before we merge stuff. But I also think the kind of prototyping and brainstorming that you're doing here and that @OchiengEd did will be necessary to get to the point that we are confident in our approach.

jianzhangbjz · 2025-02-08T08:13:58Z

> I really think we need to get a Brief and/or RFC written up to get agreement and consensus on the approaches before we merge stuff.

A Brief is being drafted here: [WIP]Brief: Benchmarking test OLMv1

jianzhangbjz · 2025-02-10T08:22:12Z

The benchmark test passed: https://github.com/operator-framework/operator-controller/actions/runs/13214623057/job/36892420269?pr=1651
Downloaded artifacts:

jiazha-mac:~ jiazha$ tree Downloads/benchmark-artifacts/
Downloads/benchmark-artifacts/
├── new.txt
└── output

1 directory, 2 files
jiazha-mac:~ jiazha$ cat Downloads/benchmark-artifacts/new.txt 
goos: linux
goarch: amd64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkCreateClusterCatalog
BenchmarkCreateClusterCatalog-4   	      81	  82695425 ns/op	   36570 B/op	     397 allocs/op
BenchmarkCreateClusterCatalog-4   	      12	  99872913 ns/op	   37266 B/op	     404 allocs/op
BenchmarkCreateClusterCatalog-4   	      12	  99852972 ns/op	   37327 B/op	     402 allocs/op
BenchmarkCreateClusterCatalog-4   	      12	  99882173 ns/op	   37409 B/op	     405 allocs/op
BenchmarkCreateClusterCatalog-4   	      12	 100024048 ns/op	   37350 B/op	     405 allocs/op
BenchmarkCreateClusterCatalog-4   	      12	 100098746 ns/op	   37568 B/op	     406 allocs/op
BenchmarkCreateClusterCatalog-4   	      12	 100037742 ns/op	   38134 B/op	     405 allocs/op
BenchmarkCreateClusterCatalog-4   	      12	  99984867 ns/op	   37121 B/op	     403 allocs/op
BenchmarkCreateClusterCatalog-4   	      12	  99855796 ns/op	   38886 B/op	     406 allocs/op
BenchmarkCreateClusterCatalog-4   	      12	  99946190 ns/op	   38851 B/op	     404 allocs/op
PASS
ok  	github.com/operator-framework/operator-controller/test/e2e	20.427s
jiazha-mac:~ jiazha$ cat Downloads/benchmark-artifacts/output 
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
                        │ benchmarks/baseline.txt │
                        │         sec/op          │
CreateClusterCatalog-10              85.11m ± 16%

                        │ benchmarks/baseline.txt │
                        │          B/op           │
CreateClusterCatalog-10              36.21Ki ± 6%

                        │ benchmarks/baseline.txt │
                        │        allocs/op        │
CreateClusterCatalog-10                394.5 ± 1%

goos: linux
goarch: amd64
cpu: AMD EPYC 7763 64-Core Processor                
                       │ /tmp/artifacts/new.txt │
                       │         sec/op         │
CreateClusterCatalog-4              99.91m ± 0%

                       │ /tmp/artifacts/new.txt │
                       │          B/op          │
CreateClusterCatalog-4             36.50Ki ± 4%

                       │ /tmp/artifacts/new.txt │
                       │       allocs/op        │
CreateClusterCatalog-4               404.5 ± 1%

jianzhangbjz · 2025-02-13T09:30:47Z

The benchmark pipeline logic is as follows:

run-benchmark (running log: https://github.com/operator-framework/operator-controller/actions/runs/13304141348/job/37151204944?pr=1651)

Run the benchmark test cases with go test -run=^$ -bench=. -benchmem -count=10 -v ./test/e2e/...
Convert the test results to Prometheus metrics
Upload the metrics file as a GitHub Actions artifact

run-prometheus (running log: https://github.com/operator-framework/operator-controller/actions/runs/13304141348/job/37151497828?pr=1651)

Download the Prometheus DB saved by the previous job (across jobs; TODO: make this work across repos by using the GitHub REST API)
Extract and Restore Prometheus Snapshot
Set Up Prometheus Config to scrape metrics from $HOST_IP:9000.

          cat << EOF > prometheus.yml
          global:
            scrape_interval: 5s
          scrape_configs:
            - job_name: 'benchmark_metrics'
              static_configs:
                - targets: ['$HOST_IP:9000']
          EOF

Run Prometheus in a container

          docker run -d --name prometheus -p 9090:9090 \
            --user=root \
            -v ${{ github.workspace }}/prometheus.yml:/etc/prometheus/prometheus.yml \
            -v ${{ github.workspace }}/prometheus-data:/prometheus \
            prom/prometheus --config.file=/etc/prometheus/prometheus.yml \
            --storage.tsdb.path=/prometheus \
            --storage.tsdb.retention.time=1h \
            --web.enable-admin-api

Start HTTP Server to Expose Metrics (Prometheus scrapes the metrics and stores them in its TSDB).
Check Benchmark Metrics Against Threshold.
The thresholds can be tuned to appropriate values after the pipeline has run for a few days.
More metric queries can be added as the number of benchmark test cases grows.

          MAX_TIME_NS=1200000000  # 1.2s
          MAX_ALLOCS=4000
          MAX_MEM_BYTES=450000

          # Query Prometheus Metrics, get the max value
          time_ns=$(curl -s "http://localhost:9090/api/v1/query?query=max(benchmark_createclustercatalog_ns)" | jq -r '.data.result[0].value[1]')
          allocs=$(curl -s "http://localhost:9090/api/v1/query?query=max(benchmark_createclustercatalog_allocs)" | jq -r '.data.result[0].value[1]')
          mem_bytes=$(curl -s "http://localhost:9090/api/v1/query?query=max(benchmark_createclustercatalog_mem_bytes)" | jq -r '.data.result[0].value[1]')
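The snippet stops after fetching the three values; the comparison itself could be sketched as follows (in Go, with hypothetical helper names). One detail worth handling explicitly: `jq -r '.data.result[0].value[1]'` prints `null` when the query matches no series, and that case should fail the check rather than pass silently. The limits mirror `MAX_TIME_NS`, `MAX_ALLOCS`, and `MAX_MEM_BYTES`, and the sample values in `main` are the per-run maxima from the GitHub Actions run earlier in this thread.

```go
package main

import (
	"fmt"
	"strconv"
)

// threshold pairs a metric's queried value (as printed by `jq -r`) with its limit.
// Hypothetical type, for illustration only.
type threshold struct {
	Name  string
	Raw   string // e.g. "100098746", or "null" when the query matched nothing
	Limit float64
}

// checkThresholds returns one message per metric that is missing, unparsable,
// or above its limit. An empty slice means every check passed.
func checkThresholds(ts []threshold) []string {
	var failures []string
	for _, t := range ts {
		if t.Raw == "" || t.Raw == "null" {
			failures = append(failures, fmt.Sprintf("%s: no data returned", t.Name))
			continue
		}
		v, err := strconv.ParseFloat(t.Raw, 64)
		if err != nil {
			failures = append(failures, fmt.Sprintf("%s: cannot parse %q", t.Name, t.Raw))
			continue
		}
		if v > t.Limit {
			failures = append(failures, fmt.Sprintf("%s: %g exceeds limit %g", t.Name, v, t.Limit))
		}
	}
	return failures
}

func main() {
	failures := checkThresholds([]threshold{
		{Name: "time_ns", Raw: "100098746", Limit: 1200000000}, // MAX_TIME_NS (1.2s)
		{Name: "allocs", Raw: "406", Limit: 4000},             // MAX_ALLOCS
		{Name: "mem_bytes", Raw: "38886", Limit: 450000},      // MAX_MEM_BYTES
	})
	if len(failures) == 0 {
		fmt.Println("all benchmark metrics are within thresholds")
		return
	}
	for _, f := range failures {
		fmt.Println("FAIL:", f)
	}
}
```

In the workflow, a non-empty failure list would map to a non-zero exit status so that the job fails.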

Find and Upload Prometheus Snapshot
Stop Prometheus
Upload Prometheus Snapshot
Done

Hi @joelanford, I have implemented the logic you suggested above. Could you take a look when you get a chance? Thanks!

jianzhangbjz · 2025-02-13T09:46:26Z

I downloaded the snapshot from https://github.com/operator-framework/operator-controller/actions/runs/13304141348
and checked it in Prometheus. It works as expected, as follows:

jiazha-mac:prometheus-3.1.0.darwin-arm64 jiazha$ tree data
data
├── 01JHSRNJT32YH9858HS49WKGNS
│   ├── chunks
│   │   └── 000001
│   ├── index
│   ├── meta.json
│   └── tombstones
├── 01JKZ7W6XKK9WQ4B9JM5GCNZMW
│   ├── chunks
│   │   └── 000001
│   ├── index
│   ├── meta.json
│   ├── tmp_dbro_sandbox3927239696
│   └── tombstones
├── 01JKZ9ZF06MSA9VCPFBRSRWB4J
│   ├── chunks
│   │   └── 000001
│   ├── index
│   ├── meta.json
│   └── tombstones
├── chunks_head
├── prometheus.tar
├── queries.active
└── wal
    ├── 00000000
    ├── 00000001
    └── 00000002

10 directories, 17 files

jiazha-mac:prometheus-3.1.0.darwin-arm64 jiazha$ ./prometheus
time=2025-02-13T09:38:09.214Z level=INFO source=main.go:636 msg="No time or size retention was set so using the default time retention" duration=15d
time=2025-02-13T09:38:09.214Z level=INFO source=main.go:683 msg="Starting Prometheus Server" mode=server version="(version=3.1.0, branch=HEAD, revision=7086161a93b262aa0949dbf2aba15a5a7b13e0a3)"
...

http://localhost:9090/query?g0.expr=benchmark_createclustercatalog_allocs&g0.show_tree=0&g0.tab=graph&g0.range_input=1h&g0.res_type=auto&g0.res_density=medium&g0.display_mode=lines&g0.show_exemplars=0

openshift-merge-robot · 2025-02-18T15:41:25Z

PR needs rebase.


jianzhangbjz requested a review from a team as a code owner January 27, 2025 06:21

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 27, 2025

joelanford reviewed Jan 28, 2025


joelanford reviewed Feb 7, 2025


jianzhangbjz force-pushed the benchmark branch 3 times, most recently from 3cde584 to e0e379d February 8, 2025 03:55

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2025

jianzhangbjz force-pushed the benchmark branch from e0e379d to 6ea7de2 February 8, 2025 03:59

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 8, 2025

jianzhangbjz force-pushed the benchmark branch 6 times, most recently from dd32769 to 8b1afa9 February 8, 2025 08:10

jianzhangbjz force-pushed the benchmark branch 2 times, most recently from ff8faec to 08a24d1 February 8, 2025 10:06

jianzhangbjz changed the title ~~[WIP] benchmark createClusterCatalog func~~ 🌱 [WIP] benchmark createClusterCatalog func Feb 10, 2025

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 10, 2025

jianzhangbjz force-pushed the benchmark branch 2 times, most recently from 521a49f to 9305638 February 10, 2025 10:01

jianzhangbjz force-pushed the benchmark branch from 88d2518 to dba035d February 11, 2025 09:18

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 11, 2025

jianzhangbjz force-pushed the benchmark branch from dba035d to 982fa8c February 11, 2025 09:21

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 11, 2025

jianzhangbjz changed the title ~~🌱 [WIP] benchmark createClusterCatalog func~~ 🌱 [WIP] add benchmark pipeline Feb 11, 2025

jianzhangbjz force-pushed the benchmark branch 15 times, most recently from a1d1b5a to 24b9450 February 13, 2025 08:41

jianzhangbjz added 4 commits February 13, 2025 17:07

benchmark create ClusterCatalog func

5f24a60

use rand.String

30c04ba

add benchmark pipeline

6e31e6d

use prometheus instead of benchstat

17c9d51

jianzhangbjz force-pushed the benchmark branch from 24b9450 to 17c9d51 February 13, 2025 09:07

jianzhangbjz changed the title ~~🌱 [WIP] add benchmark pipeline~~ 🌱 add benchmark pipeline Feb 13, 2025

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 18, 2025

jianzhangbjz commented Jan 27, 2025 (edited)
netlify bot commented Jan 27, 2025 (edited)
codecov bot commented Jan 27, 2025 (edited)
jianzhangbjz commented Jan 27, 2025 (edited)
joelanford Jan 28, 2025
joelanford Jan 28, 2025
jianzhangbjz Feb 5, 2025 (edited)
jianzhangbjz Feb 6, 2025
joelanford Feb 7, 2025 (edited)
jianzhangbjz Feb 8, 2025 (edited)
jianzhangbjz Feb 10, 2025
camilamacedo86 Mar 10, 2025 (edited)
camilamacedo86 Mar 10, 2025 (edited)
jianzhangbjz Mar 11, 2025
jianzhangbjz commented Feb 5, 2025
joelanford Feb 7, 2025
jianzhangbjz Feb 10, 2025
joelanford commented Feb 7, 2025
jianzhangbjz commented Feb 8, 2025
jianzhangbjz commented Feb 10, 2025
jianzhangbjz commented Feb 13, 2025
jianzhangbjz commented Feb 13, 2025
openshift-merge-robot commented Feb 18, 2025
