The optimized code achieves a **305% speedup** by eliminating expensive object allocations and method calls in the performance-critical nested loop of the `nms` function.

## Key Optimizations

**1. Precomputed Bounding Boxes and Areas**

Instead of constructing `Rect` objects repeatedly in the inner loop (which happened 171,577 times), the optimized version precomputes all bounding boxes and areas once upfront:

```python
bboxes = [obj["bbox"] for obj in objects]
areas = [computed areas...]
```

This eliminates ~350,000 `Rect.__init__` calls and ~900,000 `get_area()` calls based on the profiler data.

**2. Inline Intersection Calculation**

The original code's `Rect.intersect()` method involved:
- Multiple attribute assignments
- Multiple `get_area()` calls
- Object state mutations

The optimized version computes intersection area directly using tuple unpacking and simple arithmetic:

```python
inter_w = min(x1_max, x2_max) - max(x1_min, x2_min)
inter_h = min(y1_max, y2_max) - max(y1_min, y2_min)
```

This cuts intersection overhead from ~1.76 seconds to effectively zero as separate operations.

**3. Early Exit on Width Check**

The optimized code checks `inter_w <= 0` before computing height, allowing early termination when boxes don't overlap horizontally. This optimization particularly benefits test cases with non-overlapping or grid-layout objects.

## Impact Analysis

Based on `function_references`, `nms` is called in **hot paths** during table structure refinement:
- `refine_rows()`: Called when processing detected table rows
- `refine_columns()`: Called when processing detected table columns

Both functions call `nms` as a fallback when token-based refinement isn't applicable, making this optimization critical for table extraction pipelines that process many documents.
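The three optimizations listed under Key Optimizations can be combined roughly as follows. This is a minimal sketch, not the code from the PR: the function name `nms_sketch`, the `score` key, the `(x_min, y_min, x_max, y_max)` box layout, and the suppression rule (intersection area divided by the lower-scored box's area) are assumptions made for illustration.

```python
def nms_sketch(objects, match_threshold=0.05):
    """Greedy NMS over dicts with "bbox" and "score" keys (hypothetical layout)."""
    # Highest score first, so earlier objects can suppress later ones.
    objects = sorted(objects, key=lambda obj: obj["score"], reverse=True)

    # Optimization 1: precompute boxes and areas once, outside the loops.
    bboxes = [tuple(obj["bbox"]) for obj in objects]
    areas = [max(0.0, x1 - x0) * max(0.0, y1 - y0) for x0, y0, x1, y1 in bboxes]

    suppressed = [False] * len(objects)
    for j in range(1, len(objects)):
        x2_min, y2_min, x2_max, y2_max = bboxes[j]
        area2 = areas[j]
        if area2 <= 0:
            continue  # degenerate box: overlap fraction undefined, keep it
        for i in range(j):
            if suppressed[i]:
                continue
            x1_min, y1_min, x1_max, y1_max = bboxes[i]
            # Optimization 2: inline intersection with plain arithmetic.
            inter_w = min(x1_max, x2_max) - max(x1_min, x2_min)
            if inter_w <= 0:
                continue  # Optimization 3: no horizontal overlap, skip height
            inter_h = min(y1_max, y2_max) - max(y1_min, y2_min)
            if inter_h <= 0:
                continue
            if inter_w * inter_h / area2 >= match_threshold:
                suppressed[j] = True
                break
    return [obj for obj, is_suppressed in zip(objects, suppressed) if not is_suppressed]
```

Because boxes are unpacked into local variables and areas are read from a list, the inner loop allocates no objects; whether this matches the PR's exact control flow is not confirmed by the description.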
## Test Case Performance

The optimization shows consistent gains across all scenarios:
- **Small inputs** (2-3 objects): 23-40% faster due to reduced overhead
- **Medium inputs** (100 objects): 248-325% faster as the O(n²) loop benefits compound
- **Large inputs** (500 objects): 309% faster, with runtime dropping from 295ms to 72ms
- **Dense clusters**: 238% faster as intersection calculations dominate

The speedup scales particularly well for workloads with many objects or high overlap density, which are common in table detection scenarios where multiple overlapping bounding boxes are typical.
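To relate the percentages above to the raw runtimes, the sketch below assumes the speedup is computed as `(old - new) / new`. That formula is an assumption (the description does not state it), but it is consistent with the reported figures; the helper name `speedup_percent` is hypothetical.

```python
def speedup_percent(old_ms: float, new_ms: float) -> float:
    """Relative speedup in percent, assuming (old - new) / new."""
    if new_ms <= 0:
        raise ValueError("new_ms must be positive")
    return (old_ms - new_ms) / new_ms * 100.0


# Large-input case from the list above: 295 ms down to 72 ms.
large_case = speedup_percent(295, 72)

# Overall runtime reported for the PR: 410 ms down to 101 ms.
overall = speedup_percent(410, 101)

print(f"large inputs: {large_case:.1f}%  overall: {overall:.1f}%")
```

Under this assumption the large-input case comes out just under 310% and the overall figure just under 306%; small differences from the quoted numbers would be expected because the millisecond values are rounded.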
📄 306% (3.06x) speedup for `nms` in `unstructured_inference/models/table_postprocess.py`

⏱️ Runtime: 410 milliseconds → 101 milliseconds (best of 49 runs)
To edit these changes, `git checkout codeflash/optimize-nms-mkotyo17` and push.