Improve intersect_by_rank performance #7744

Draft
robert3005 wants to merge 2 commits into develop from rk/intersect-by-rank

@robert3005
Contributor

We never spent time optimising this, and it is useful for merging selections and
filters in scans.

Signed-off-by: Robert Kruszewski <github@robertk.io>
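
For readers unfamiliar with the operation, here is a minimal reference sketch of what an intersect-by-rank over boolean masks could look like. It assumes the semantics suggested by the name: the second mask has one entry per set position of the first, and its i-th entry decides whether the i-th set position is kept. The function name `intersect_by_rank_naive` and the `Vec<bool>` representation are hypothetical and chosen for illustration; this is not the code in this PR and makes no attempt at the optimisations it benchmarks.

```rust
/// Hypothetical reference implementation (illustration only, not the PR's code).
///
/// `mask` must have exactly one entry per `true` in `base`. The i-th entry of
/// `mask` decides whether the i-th set position of `base` survives.
fn intersect_by_rank_naive(base: &[bool], mask: &[bool]) -> Vec<bool> {
    let set_count = base.iter().filter(|&&b| b).count();
    assert_eq!(set_count, mask.len(), "mask needs one entry per set bit in base");

    // Rank of the next set bit of `base`, i.e. how many set bits we have seen.
    let mut rank = 0;
    base.iter()
        .map(|&bit| {
            if bit {
                let keep = mask[rank];
                rank += 1;
                keep
            } else {
                // Positions not selected by `base` can never be selected.
                false
            }
        })
        .collect()
}
```

Under this reading, `base` plays the role of a row selection and `mask` the result of a filter evaluated only on the selected rows, which is one way to interpret "merging selections and filters in scans".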

@robert3005
Contributor Author

I will go over this tomorrow, @joseph-isaacs. I looked at #7098 and #7393, which each optimised a slightly different case of this function, and I tried to combine the two.

@robert3005 robert3005 added the changelog/performance (A performance improvement) label May 1, 2026
@codspeed-hq

codspeed-hq Bot commented May 1, 2026

Merging this PR will improve performance by ×19

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 25 improved benchmarks
✅ 1181 untouched benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 352.3 µs | 300.4 µs | +17.24% |
| Simulation | density_matrix[(0.05, 0.05, "self_sparse_mask_sparse")] | 83 µs | 47.6 µs | +74.43% |
| Simulation | density_matrix[(0.5, 0.05, "self_dense_mask_sparse")] | 482.5 µs | 53.2 µs | ×9.1 |
| Simulation | intersect_by_rank[(10000, "random")] | 103.6 µs | 10.3 µs | ×10 |
| Simulation | intersect_by_rank[(100000, "random")] | 979.4 µs | 53 µs | ×18 |
| Simulation | density_matrix[(0.05, 0.5, "self_sparse_mask_dense")] | 131.8 µs | 47.6 µs | ×2.8 |
| Simulation | density_matrix[(0.5, 0.5, "self_dense_mask_dense")] | 979 µs | 52.8 µs | ×19 |
| Simulation | intersect_by_rank[(10000, "runs")] | 103.6 µs | 10.1 µs | ×10 |
| Simulation | intersect_by_rank[(100000, "runs")] | 976.8 µs | 53 µs | ×18 |
| Simulation | rank_indices[(0.05, 0.05, "self_sparse_rank_sparse")] | 80.9 µs | 43 µs | +87.87% |
| Simulation | rank_indices[(0.5, 0.01, "self_dense_rank_very_sparse")] | 427.9 µs | 58.8 µs | ×7.3 |
| Simulation | rank_indices[(0.5, 0.5, "self_dense_rank_dense")] | 867.5 µs | 53.4 µs | ×16 |
| Simulation | sparse[(100000, 0.05, "sparse_5pct")] | 132.1 µs | 47.8 µs | ×2.8 |
| Simulation | sparse[(100000, 0.5, "dense_50pct")] | 979.7 µs | 53.2 µs | ×18 |
| Simulation | very_sparse_mask_cached[(0.5, 0.005, "self_dense_mask_0p5pct")] | 422.3 µs | 50.6 µs | ×8.4 |
| Simulation | very_sparse_mask_cached[(0.5, 0.02, "self_dense_mask_2pct")] | 435.5 µs | 73.7 µs | ×5.9 |
| Simulation | very_sparse_mask_uncached[(0.5, 0.005, "self_dense_mask_0p5pct")] | 432.3 µs | 59.5 µs | ×7.3 |
| Simulation | very_sparse_mask_uncached[(0.5, 0.02, "self_dense_mask_2pct")] | 449 µs | 82.1 µs | ×5.5 |
| Simulation | rank_indices[(0.05, 0.5, "self_sparse_rank_dense")] | 120.1 µs | 47.4 µs | ×2.5 |
| Simulation | rank_indices[(0.5, 0.05, "self_dense_rank_sparse")] | 462.6 µs | 58.6 µs | ×7.9 |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
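
As a reading aid, the Efficiency column looks consistent with the ratio BASE / HEAD, shown as a percentage gain for smaller ratios and as a multiplier for larger ones. That is an inference from the displayed rows, not CodSpeed's documented formula, and the helper names below are hypothetical. Because the table shows rounded timings, recomputed values can differ from the displayed ones in the last digits.

```rust
/// Hypothetical helper: how many times faster HEAD is than BASE.
/// Both arguments are timings in the same unit (microseconds in the table).
fn speedup(base_us: f64, head_us: f64) -> f64 {
    base_us / head_us
}

/// Hypothetical helper: the same ratio expressed as a percentage gain,
/// assuming gain = (BASE / HEAD - 1) * 100.
fn percent_gain(base_us: f64, head_us: f64) -> f64 {
    (speedup(base_us, head_us) - 1.0) * 100.0
}
```

For example, the `density_matrix` dense/dense row (979 µs to 52.8 µs) gives a ratio of about 18.5, which matches the displayed ×19 after rounding, and the sparse/sparse row (83 µs to 47.6 µs) gives a gain of about 74.4%.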


Comparing rk/intersect-by-rank (a8488fc) with develop (f307edc)

@robert3005 robert3005 force-pushed the rk/intersect-by-rank branch 4 times, most recently from 89f58dc to 18f8d68 Compare May 6, 2026 08:12
robert3005 added 2 commits May 6, 2026 13:25
Signed-off-by: Robert Kruszewski <github@robertk.io>
Signed-off-by: Robert Kruszewski <github@robertk.io>
@robert3005 robert3005 force-pushed the rk/intersect-by-rank branch from 18f8d68 to a8488fc Compare May 6, 2026 15:14