The optimized code achieves a **155% speedup** (1.43ms → 558μs) by eliminating object allocations and reducing function call overhead in the `overlaps` function, the primary performance bottleneck.

## Key Optimizations

**1. Inlined Intersection Logic in `overlaps`**
- **Original**: Created two `Rect` objects, called `get_area()` twice, and `intersect()` once per invocation
- **Optimized**: Computes bbox areas and intersection area using direct arithmetic on list elements
- **Impact**: Eliminates ~3 object allocations and ~4 method calls per `overlaps()` invocation
- **Why faster**: Python object creation and attribute access (`self.x_min`, etc.) are expensive compared to local variable arithmetic. The line profiler shows the original `overlaps` spent 69% of its time in `rect1.intersect(...)` alone.

**2. Streamlined `Rect.get_area()`**
- **Original**: Computed `area = (x_max - x_min) * (y_max - y_min)`, then checked `area > 0`
- **Optimized**: Computes dimensions first (`dx`, `dy`), checks both `> 0` before multiplying
- **Why faster**: Avoids multiplication when dimensions are non-positive, and the short-circuit evaluation (`dx > 0 and dy > 0`) exits early for degenerate rectangles

**3. Optimized `Rect.intersect()` Logic**
- **Original**: Called `get_area()` twice (lines 25 and 34 in profiler), used `max()`/`min()` built-ins
- **Optimized**: Pre-computes dimensions once, uses ternary comparisons (`a if a >= b else b`) instead of `max()`/`min()`
- **Why faster**: Avoids repeated attribute access in `get_area()` and replaces function calls with faster inline comparisons

## Performance Evidence

From annotated tests, the optimization excels at:
- **High-frequency scenarios**: The `get_bbox_span_subset` reference shows `overlaps()` called in a loop over spans, making per-call savings compound significantly
- **Typical overlap checks**: Tests with normal bboxes show 119-158% speedups (e.g., `test_identical_bboxes_full_overlap_default_threshold`: 7.13μs → 2.95μs)
- **Edge cases**: Even degenerate cases (zero-area bboxes) benefit from early exits (e.g., `test_zero_area_bbox1_returns_false`: 3.15μs → 1.84μs, 72% faster)

## Impact on Workloads

Given the `get_bbox_span_subset` reference, this function operates in a **hot path** where it filters spans against bounding boxes. The optimization is particularly valuable when:
- Processing tables with many text spans (each span tested for overlap)
- High `threshold` values that reject most candidates (early arithmetic checks avoid object creation overhead)
- Dense layouts with frequent partial overlaps (where intersection area calculation dominates)

The test suite shows 100-175% speedups across typical scenarios, with degenerate inputs such as the zero-area case above gaining less (72%), indicating the optimization is robust for diverse input patterns.
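The inlining described under Key Optimizations can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the name `overlaps_inlined`, the `[x_min, y_min, x_max, y_max]` bbox layout, and the `0.5` default threshold are assumptions made for the example.

```python
def overlaps_inlined(bbox1, bbox2, threshold=0.5):
    """Return True if at least `threshold` of bbox1's area lies inside bbox2.

    Hypothetical sketch: bboxes are assumed to be [x_min, y_min, x_max, y_max].
    No Rect objects are created; everything is local-variable arithmetic.
    """
    ax0, ay0, ax1, ay1 = bbox1
    dx = ax1 - ax0
    dy = ay1 - ay0
    # Early exit for degenerate (zero or negative area) boxes,
    # checked before any multiplication happens.
    if not (dx > 0 and dy > 0):
        return False
    area1 = dx * dy

    bx0, by0, bx1, by1 = bbox2
    # Intersection corners via inline comparisons instead of max()/min().
    ix0 = ax0 if ax0 >= bx0 else bx0
    iy0 = ay0 if ay0 >= by0 else by0
    ix1 = ax1 if ax1 <= bx1 else bx1
    iy1 = ay1 if ay1 <= by1 else by1

    idx = ix1 - ix0
    idy = iy1 - iy0
    # An empty intersection contributes zero area.
    inter_area = idx * idy if (idx > 0 and idy > 0) else 0
    return inter_area / area1 >= threshold


# Example calls (inputs chosen for illustration only).
same = overlaps_inlined([0, 0, 2, 2], [0, 0, 2, 2])
disjoint = overlaps_inlined([0, 0, 1, 1], [5, 5, 6, 6])
degenerate = overlaps_inlined([1, 1, 1, 3], [0, 0, 4, 4])
```

The design choice is to trade a little readability for fewer allocations and calls per invocation, which matters mainly because the function is called once per span inside a loop.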
📄 155% (1.55x) speedup for `overlaps` in `unstructured_inference/models/table_postprocess.py`

⏱️ Runtime: 1.43 milliseconds → 558 microseconds (best of 170 runs)
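For readers checking the headline figure, the percentage appears to be the relative time saved per unit of optimized time, i.e. `original / optimized - 1`. That formula is an assumption about how the number was derived, and the helper name `speedup_percent` is hypothetical. With the rounded timings quoted here it yields a value slightly above the reported 155%, which would be consistent with the report using unrounded measurements.

```python
def speedup_percent(original_s, optimized_s):
    """Percent speedup, assuming the convention original / optimized - 1."""
    return (original_s / optimized_s - 1.0) * 100.0


# Rounded timings quoted in the report: 1.43 ms and 558 microseconds.
pct = speedup_percent(1.43e-3, 558e-6)
print(f"{pct:.1f}% faster")
```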
To edit these changes, `git checkout codeflash/optimize-overlaps-mkosz796` and push.