⚡️ Speed up function slot_into_containers by 631% #35
Open
codeflash-ai[bot] wants to merge 1 commit into main
The optimized code achieves a **630% speedup** (475ms → 65.1ms) by eliminating expensive object allocations and redundant operations in the hot path of `slot_into_containers`.

## Key Optimizations

### 1. **Eliminated Rect Object Creation** (Major Impact)

The original code created 3 `Rect` objects per package-container pair:

- One for the package (`Rect(package["bbox"])`)
- One for the container (`Rect(container["bbox"])`)
- One temporary for intersection calculation (`Rect(package["bbox"])` again)

With 171,432 container-package pairs evaluated, this meant **514,296 object allocations**. The line profiler shows `Rect.__init__` taking 745ms and `Rect.intersect()` taking 1.89s, together accounting for ~35% of total runtime.

The optimized version uses direct tuple unpacking and arithmetic, avoiding all these allocations.

### 2. **Removed Intermediate List Building and Sorting**

The original code built a `match_scores` list for each package containing all container candidates, then sorted it to find the best match. With 1,378 packages and ~124 containers each, this meant:

- 171,430 dictionary allocations for match score entries
- 1,376 calls to `sort_objects_by_score()` (78.7ms total)

The optimized version uses a **single-pass search** to track the best score directly, eliminating both list construction and sorting overhead.

### 3. **Pre-extracted Container Bboxes**

By extracting `container["bbox"]` once upfront into `container_bboxes`, the optimization avoids 171,432 dictionary lookups in the inner loop. It is a simple change that reduces overhead from repeated dict access.

### 4. **Inlined Area Calculations**

Instead of calling `get_area()` multiple times (which involves method call overhead), areas are computed inline using direct arithmetic: `(x_max - x_min) * (y_max - y_min)`.

### 5. **Preserved Zero-Area Container Logic**

The optimization correctly handles the edge case where containers with zero area should match fully with packages (score 1.0), maintaining behavioral compatibility with the original `Rect.intersect()` logic.

## Test Results Analysis

The speedup is most dramatic for:

- **Large-scale scenarios** (`test_large_scale_many_containers_many_packages`: 666% faster) where the N×M loop dominates
- **Medium workloads** (`test_multiple_packages_and_containers_large_scale`: 567% faster)
- Even **small inputs** benefit (`test_single_container_full_overlap`: 147% faster) due to eliminated overhead

## Impact Assessment

Based on `function_references`, `slot_into_containers` is called from `nms_by_containment`, which performs non-maxima suppression for table detection. This is likely in a **hot path** for document processing pipelines that extract tables from images. The 6-7x speedup directly translates to faster document processing throughput, especially when processing documents with many table candidates or complex layouts.
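## Illustrative Sketch

The optimizations described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `overlap_fraction` and `best_container_per_package` are hypothetical, bboxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples, and the handling of zero-area packages and of tied scores are assumptions made for the example. The real change reportedly inlines the arithmetic in the loop; a helper function is used here only for readability.

```python
def overlap_fraction(package_bbox, container_bbox):
    """Fraction of the package's area covered by the container.

    Uses plain tuple unpacking and arithmetic instead of Rect objects.
    """
    px0, py0, px1, py1 = package_bbox
    cx0, cy0, cx1, cy1 = container_bbox

    package_area = (px1 - px0) * (py1 - py0)
    if package_area <= 0:
        # Assumption for this sketch: degenerate packages score zero.
        return 0.0

    container_area = (cx1 - cx0) * (cy1 - cy0)
    if container_area == 0:
        # Zero-area containers count as a full match (score 1.0),
        # following the edge case described in this PR.
        return 1.0

    inter_w = min(px1, cx1) - max(px0, cx0)
    inter_h = min(py1, cy1) - max(py0, cy0)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    return (inter_w * inter_h) / package_area


def best_container_per_package(containers, packages):
    """Return (container_index, score) for each package in one pass."""
    # Extract bboxes once, so the inner loop avoids repeated dict lookups.
    container_bboxes = [container["bbox"] for container in containers]

    results = []
    for package in packages:
        package_bbox = package["bbox"]
        best_num, best_score = None, -1.0
        for num, container_bbox in enumerate(container_bboxes):
            score = overlap_fraction(package_bbox, container_bbox)
            # Track the running best instead of building and sorting a list.
            # Assumption: on ties, the earliest container wins.
            if score > best_score:
                best_num, best_score = num, score
        results.append((best_num, best_score))
    return results
```

In this sketch the inner loop allocates no per-pair objects or score dictionaries, which is the property the profile figures above attribute most of the speedup to.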
📄 631% (6.31x) speedup for `slot_into_containers` in `unstructured_inference/models/table_postprocess.py`

⏱️ Runtime: 475 milliseconds → 65.1 milliseconds (best of 53 runs)
To edit these changes, run `git checkout codeflash/optimize-slot_into_containers-mkos4jpd` and push.