⚡️ Speed up function `align_supercells` by 177% #44
Open
codeflash-ai[bot] wants to merge 1 commit into `main` from `codeflash/optimize-align_supercells-mkou96v9`
Conversation
The optimized code achieves a **177% speedup** (7.48ms → 2.70ms) by eliminating expensive object allocations and method calls in hot loops through three key strategies:

## What Was Optimized

1. **Precomputed Row/Column Metrics**: The original code repeatedly accessed the `row["bbox"]` and `col["bbox"]` dictionaries inside nested loops (3,309 row iterations × 3,117 column iterations per supercell). The optimized version extracts these once into tuples `(ymin, ymax, height, row_dict)` and `(xmin, xmax, width, col_dict)`, avoiding ~10,000+ dictionary lookups per function call.
2. **Eliminated Rect Object Creation**: The original code created 1,263 `Rect` objects and called their methods (955 `include_rect` calls, 106 `intersect` calls). The optimized version replaces these with direct numeric aggregation using simple min/max comparisons on primitive values, removing all object-allocation overhead.
3. **Replaced `min()`/`max()` Calls with Inline Ternaries**: Changed `max(a, b)` and `min(a, b)` calls to inline conditional expressions like `a if a > b else b`, which execute faster in Python's bytecode.

## Why This Is Faster

- **Memory Allocation**: Creating objects in Python involves significant overhead (memory allocation, initialization, method dispatch). Eliminating 1,263 `Rect` allocations removes this entirely.
- **Method-Call Overhead**: Each `include_rect()` call involved multiple attribute accesses and comparisons. Direct numeric operations on local variables are much faster.
- **Cache Locality**: Tuples of primitives are more cache-friendly than dictionary lookups and object attributes.
- **Reduced Bytecode**: Conditional expressions generate fewer bytecode operations than function calls.
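The two data-layout changes above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code: the bbox ordering `[xmin, ymin, xmax, ymax]` and the helper names `precompute_row_metrics` and `enclosing_bbox` are assumptions.

```python
# Illustrative sketch of the patterns described above. The bbox layout
# [xmin, ymin, xmax, ymax] and all names here are assumptions; the real
# align_supercells code may differ.

def precompute_row_metrics(rows):
    """Pull bbox bounds out of each row dict once, before the hot loop."""
    metrics = []
    for row in rows:
        ymin, ymax = row["bbox"][1], row["bbox"][3]
        metrics.append((ymin, ymax, ymax - ymin, row))  # (ymin, ymax, height, row_dict)
    return metrics

def enclosing_bbox(bboxes):
    """Aggregate an enclosing box with plain comparisons on primitives,
    instead of building Rect objects and calling include_rect()."""
    xmin, ymin, xmax, ymax = bboxes[0]
    for x0, y0, x1, y1 in bboxes[1:]:
        # Inline ternaries in place of min()/max() calls.
        xmin = x0 if x0 < xmin else xmin
        ymin = y0 if y0 < ymin else ymin
        xmax = x1 if x1 > xmax else xmax
        ymax = y1 if y1 > ymax else ymax
    return xmin, ymin, xmax, ymax

rows = [{"bbox": [0, 0, 100, 10]}, {"bbox": [0, 10, 100, 22]}]
print(precompute_row_metrics(rows)[1][:3])                 # → (10, 22, 12)
print(enclosing_bbox([(10, 5, 40, 20), (0, 8, 60, 30)]))   # → (0, 5, 60, 30)
```

The inner loop then unpacks a tuple per row instead of indexing into a dictionary on every pass, which is where the bulk of the saving comes from.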
## Performance by Test Case

The optimization excels when:

- **Many supercells with large row/column grids**: `test_large_scale_multiple_supercells_efficiency_and_correctness` shows an **86.4% speedup** (112μs → 60.5μs) because precomputation amortizes across all iterations.
- **Supercells spanning multiple rows/columns**: Tests like `test_single_supercell_single_row_single_column` (41.7% faster) and `test_supercell_spanning_two_rows` (38.8% faster) benefit from the eliminated `Rect` operations in the aggregation phase.

The optimization is slightly slower (~6–13%) on trivial cases with no intersecting rows/columns (early exits), where the upfront precomputation cost exceeds the savings. However, **real workloads** typically involve substantial overlap detection, making this trade-off worthwhile.

## Impact on Workloads

This function processes table-structure extraction results. If it is called frequently during document parsing (e.g., when processing multiple tables per document), the 2.77× speedup directly reduces end-to-end latency. The optimization is most beneficial when:

- Documents contain complex tables with many rows/columns
- Multiple supercells need alignment per table
- High-throughput batch processing is required
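The `min()`/`max()`-vs-ternary claim is easy to check locally with the standard library's `timeit`; this is a generic micro-benchmark sketch, not taken from the PR, and absolute numbers will vary by interpreter and machine.

```python
# Micro-benchmark of a builtin call vs. an inline conditional expression.
# Timings are machine-dependent; the point is that the ternary avoids a
# function-call dispatch per comparison.
import timeit

call = timeit.timeit("max(a, b)", setup="a, b = 3, 7", number=1_000_000)
tern = timeit.timeit("a if a > b else b", setup="a, b = 3, 7", number=1_000_000)
print(f"max(a, b):      {call:.3f}s")
print(f"inline ternary: {tern:.3f}s")

# Both expressions agree on the result, of course:
a, b = 3, 7
assert max(a, b) == (a if a > b else b) == 7
```

On CPython this kind of substitution only pays off inside genuinely hot loops, which matches the PR's observation that trivial early-exit cases see no benefit.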
📄 **177% (1.77x) speedup** for `align_supercells` in `unstructured_inference/models/table_postprocess.py`

⏱️ **Runtime**: 7.48 milliseconds → 2.70 milliseconds (best of 88 runs)

📝 Explanation and details
✅ Correctness verification report:
To edit these changes, run `git checkout codeflash/optimize-align_supercells-mkou96v9` and push.