⚡️ Speed up function nms_supercells by 519% #45

Open · codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-nms_supercells-mkouh7ze
Conversation
The optimized code achieves a **518% speedup** by addressing the quadratic computational overhead in `nms_supercells` through two key optimizations:

## Primary Optimization: Early Overlap Detection (O(N²) → O(N²) with fast path)

**What changed:** The optimized code precomputes `row_sets` and `col_sets` as Python sets for all supercells upfront, then uses fast set intersection checks (`row_set2 & row_sets[supercell1_num]`) to skip pairs that cannot possibly overlap before calling the expensive `remove_supercell_overlap` function.

**Why this is faster:** In the original code, every pair of supercells (157,238 pairs in the profiler results) called `remove_supercell_overlap`, which immediately created sets from lists and computed intersections. The line profiler shows this function consumed 1.39s out of 1.52s total (91.6% of runtime). In the optimized version, the early check filters out 155,637 non-overlapping pairs with cheap set operations, reducing `remove_supercell_overlap` calls from 157,238 to just 1,601 (a **98% reduction**). The function's total time drops from 509ms to 17.9ms.

This explains why test cases with many non-overlapping supercells see dramatic improvements:

- `test_many_non_overlapping_supercells`: 1.24ms → 221μs (459% faster)
- `test_large_supercells_list_with_few_overlaps`: 19.1ms → 2.70ms (608% faster)
- `test_performance_with_moderate_scale`: 115ms → 17.5ms (562% faster)

**Trade-off:** Test cases with dense overlaps (where most pairs actually overlap) show modest slowdowns (10-17%) because they pay the overhead of set precomputation and the additional intersection check without gaining many skips. For example, `test_identical_supercells` is 10% slower because both supercells overlap completely, so the early check doesn't help.
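The early-exit pattern can be sketched as below. This is an illustrative reconstruction, not the actual PR diff: the supercell dict layout and the `remove_supercell_overlap` stub (reduced here to a call counter) are assumptions based on the names mentioned above.

```python
# Illustrative sketch of the early-exit pattern (not the actual PR code).
calls = {"pairs": 0, "expensive": 0}


def remove_supercell_overlap(sc1, sc2):
    # Stand-in for the expensive overlap-resolution helper named in the PR;
    # here it only records that it was reached.
    calls["expensive"] += 1


def nms_pairs_with_fast_path(supercells):
    # Precompute row/column sets once, instead of rebuilding them per pair.
    row_sets = [set(sc["row_numbers"]) for sc in supercells]
    col_sets = [set(sc["column_numbers"]) for sc in supercells]
    for j in range(1, len(supercells)):
        for i in range(j):
            calls["pairs"] += 1
            # Fast path: cheap set intersections skip pairs that share no
            # rows or no columns, so they cannot overlap.
            if not (row_sets[i] & row_sets[j]) or not (col_sets[i] & col_sets[j]):
                continue
            remove_supercell_overlap(supercells[i], supercells[j])


# Three spatially separated supercells plus one overlapping the first.
cells = [
    {"row_numbers": [0, 1], "column_numbers": [0, 1]},
    {"row_numbers": [5, 6], "column_numbers": [5, 6]},
    {"row_numbers": [9], "column_numbers": [9]},
    {"row_numbers": [1, 2], "column_numbers": [1, 2]},
]
nms_pairs_with_fast_path(cells)
print(calls)  # 6 pairs checked, only 1 reaches the expensive helper
```

With sparse data like this, five of the six pairs are rejected by the cheap set checks, which is exactly the 98%-of-pairs filtering effect described above at scale.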
## Secondary Optimization: Reduced Dictionary Lookups

**What changed:** Inside `remove_supercell_overlap`, the optimized code stores `supercell1/2["row_numbers"]` and `supercell1/2["column_numbers"]` in local variables (`rows1`, `rows2`, `cols1`, `cols2`) to avoid repeated dictionary key lookups in the while loop.

**Why this matters:** Dictionary lookups in Python have overhead, and the original code performed them on every loop iteration when checking lengths and calling `min()`/`max()`. By caching these references, the optimized version reduces per-iteration overhead, contributing to the ~28x speedup in `remove_supercell_overlap` (509ms → 17.9ms).

## Performance Characteristics

The optimization is **highly effective for workloads with sparse overlaps** (the common case in Non-Maximum Suppression for real table detection), where most supercell pairs don't overlap. It works well when:

- Many supercells are spatially separated (no shared rows/columns)
- The grid is large relative to supercell size
- Only a small fraction of pairs have actual overlap

It's **less beneficial** when:

- All supercells overlap the same region (dense overlaps)
- Working with very small supercell counts (<10) where setup overhead dominates
- Single-cell or tiny supercells where overlap checks are already fast

The optimization doesn't change algorithmic complexity but dramatically reduces the constant factor by avoiding expensive operations on non-overlapping pairs: a classic "early exit" pattern that proves very effective for this NMS use case.
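The local-variable caching is a standard Python micro-optimization: bind a dict value to a local once before a hot loop rather than re-looking it up each iteration. A minimal sketch, assuming a simplified trimming policy (the function body below is invented for illustration; only the `rows1`/`rows2` caching idea comes from the PR):

```python
def trim_shared_rows(supercell1, supercell2):
    # Cache dict values in locals once, mirroring the rows1/rows2 caching
    # described above (this is NOT the real remove_supercell_overlap body).
    rows1 = supercell1["row_numbers"]  # one lookup, reused every iteration
    rows2 = supercell2["row_numbers"]

    shared = set(rows1) & set(rows2)
    while shared and len(rows2) > 1:
        # Arbitrary policy for this sketch: drop the largest shared row
        # from the second supercell until no conflict remains.
        row = max(shared)
        rows2.remove(row)
        shared.discard(row)
    return supercell2
```

Note that `rows2` aliases the list stored inside `supercell2`, so mutations remain visible through the dict. Caching the *reference* (not a copy) is what lets this preserve behavior while skipping repeated key lookups in the loop condition and body.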
📄 519% (5.19x) speedup for `nms_supercells` in `unstructured_inference/models/table_postprocess.py`

⏱️ Runtime: 148 milliseconds → 23.9 milliseconds (best of 90 runs)
✅ Correctness verification report:
To edit these changes, run `git checkout codeflash/optimize-nms_supercells-mkouh7ze` and push.