⚡️ Speed up function `nms_by_containment` by 741% by codeflash-ai[bot] · Pull Request #34 · codeflash-ai/unstructured-inference

codeflash-ai · 2026-01-22T01:28:16Z

📄 741% (7.41x) speedup for `nms_by_containment` in `unstructured_inference/models/table_postprocess.py`

⏱️ Runtime : 232 milliseconds → 27.6 milliseconds (best of 51 runs)
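The percentages in this report match a time-saved-relative-to-new-runtime convention, (old - new) / new. For example, 37.1μs → 17.9μs is reported as 107% faster. The short calculation below applies that formula to the headline numbers. The variable names are illustrative only.

```python
baseline_ms = 232.0   # original runtime (best of 51 runs)
optimized_ms = 27.6   # optimized runtime (best of 51 runs)

# How many times longer the old code takes than the new code
ratio = baseline_ms / optimized_ms

# Time saved, expressed relative to the new runtime
percent_faster = (baseline_ms - optimized_ms) / optimized_ms * 100

print(f"{ratio:.2f}x runtime ratio, {percent_faster:.0f}% faster")
# prints: 8.41x runtime ratio, 741% faster
```

So the old/new runtime ratio is about 8.41. The "(7.41x)" in the heading corresponds to the gain (741% ÷ 100), not to that ratio.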

📝 Explanation and details

The optimized code achieves a 741% speedup by eliminating expensive object allocations and algorithmic inefficiencies in two critical functions used for table structure refinement.

Key Optimizations

1. Eliminated Rect Object Overhead in `slot_into_containers` (58.6% → 0% of total time)

The original code created two `Rect` objects for every container-package pair, plus one per package (about 80K allocations for the 200×200 test alone). The optimized version:

- Pre-extracts container bboxes once: `container_bboxes = [container["bbox"] for container in container_objects]`
- Computes intersections with inline arithmetic instead of `Rect.intersect()` calls
- Calculates areas directly: `pkg_w * pkg_h` instead of `package_rect.get_area()`

This change alone removes the hotspot to which the line profiler attributed 58.6% of total time.
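Below is a minimal sketch of the inline computation. It assumes bboxes are `[x_min, y_min, x_max, y_max]` lists, as in the tests below. The helper name `intersection_area` is hypothetical, and the PR's actual code may be structured differently. The sketch shows that plain arithmetic can reproduce the `Rect.intersect()` semantics without allocating any objects. This includes the special case in which a zero-area container adopts the package's bbox.

```python
def intersection_area(container_bbox, package_bbox):
    """Area of container ∩ package, mirroring
    Rect(container).intersect(Rect(package)).get_area() from the test module.

    Bboxes are [x_min, y_min, x_max, y_max] sequences.
    """
    cx0, cy0, cx1, cy1 = container_bbox
    px0, py0, px1, py1 = package_bbox

    # Rect.intersect() quirk: a rect whose area is <= 0 adopts the other
    # rect's bbox, so the "intersection" is the whole package.
    if (cx1 - cx0) * (cy1 - cy0) <= 0:
        area = (px1 - px0) * (py1 - py0)
        return area if area > 0 else 0.0

    # Ordinary case: overlap width and height of the two boxes.
    w = min(cx1, px1) - max(cx0, px0)
    h = min(cy1, py1) - max(cy0, py0)

    # Disjoint or merely touching boxes. Checking w and h separately matters:
    # if both are negative, their product would be positive.
    if w <= 0 or h <= 0:
        return 0.0
    return w * h
```

For example, `intersection_area([0, 0, 0, 0], [1, 1, 2, 2])` returns `1`. This matches the behavior exercised by `test_container_with_zero_area_can_be_assigned_packages`.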

2. Removed O(N log N) Sort Per Package (3.5% → eliminated)

The original code sorted all match scores for every package to find the best match. The optimized version:

- Tracks the best container with a single linear max search during iteration
- Builds no intermediate list of match dictionaries and makes no `sort_objects_by_score()` calls
- Compares directly: `if overlap_fraction > best_score`

This removes ~630 sorting operations in typical workloads.
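The sort-free selection can be sketched as a single pass. `best_match` and `best_match_by_sort` are hypothetical names used only for this comparison. A strict `>` keeps the first container among tied scores. Python's `sorted(..., reverse=True)` is stable, so its element 0 is also the first of the tied containers, and the two approaches therefore pick the same one.

```python
def best_match_by_sort(scores):
    """Original approach: build match dicts, sort descending, take the first."""
    if not scores:
        return None
    matches = [{"container_num": i, "score": s} for i, s in enumerate(scores)]
    best = sorted(matches, key=lambda k: k["score"], reverse=True)[0]
    return best["container_num"], best["score"]


def best_match(scores):
    """Single O(N) pass; strict '>' keeps the earliest index among ties."""
    best_num, best_score = None, float("-inf")
    for num, score in enumerate(scores):
        if score > best_score:
            best_num, best_score = num, score
    if best_num is None:
        return None
    return best_num, best_score


overlaps = [0.2, 0.9, 0.9, 0.0]
print(best_match(overlaps), best_match_by_sort(overlaps))  # (1, 0.9) (1, 0.9)
```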

3. Set Construction Optimization in `nms_by_containment`

The original code constructed sets repeatedly in nested loops:

object2_packages = set(packages_by_container[object2_num])  # in outer loop
object1_packages = set(packages_by_container[object1_num])  # in inner loop

The optimized version precomputes all sets once:

package_sets_by_container = [set(pkgs) for pkgs in packages_by_container]

This reduces ~26K set constructions to ~700 in large-scale tests (97% reduction).
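A quick worst-case count shows why the reduction is so large. This sketch assumes the loop shape quoted above, with one `set()` per outer iteration and at most one per inner iteration and no early exits. Under that assumption, the original builds a quadratic number of sets, while the precomputed version builds one per container. The function names are hypothetical, and the first function gives an upper bound, not an exact count.

```python
def max_set_builds_original(n_containers):
    """Upper bound on set() calls in the original nested loop.

    Outer loop: one set for each object2_num in 1..n-1.
    Inner loop: at most one set for each object1_num < object2_num.
    """
    outer = max(n_containers - 1, 0)
    inner = n_containers * (n_containers - 1) // 2
    return outer + inner


def set_builds_precomputed(n_containers):
    """One set per container, built once up front."""
    return n_containers


for n in (50, 100, 200):
    print(n, max_set_builds_original(n), set_builds_precomputed(n))
# 50 1274 50
# 100 5049 100
# 200 20099 200
```

The quadratic term dominates quickly. At 200 containers, a single call can already reach about 20K set constructions.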

4. Early Exit with `break` and `continue`

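- Added `break` after finding an intersection in the inner loop (once a container is suppressed, the remaining comparisons cannot change the outcome)
- Added `continue` after marking a container with no packages as suppressed (an empty package set cannot intersect any other)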

These micro-optimizations reduce unnecessary iterations in dense scenarios.
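The sketch below combines the precomputed sets with the early exits in a single suppression pass. It assumes that `packages_by_container[i]` lists the package indices slotted into container `i` and that containers are already sorted by descending score. It also assumes that only unsuppressed earlier containers are compared, as in the upstream table-transformer loop. `suppress_by_shared_packages` is a hypothetical name. The sketch uses `set.isdisjoint()` in place of a `len(a.intersection(b)) > 0` check because it answers the same question without building a new set. The PR itself may not make this swap.

```python
def suppress_by_shared_packages(packages_by_container):
    """Return one suppression flag per container (index 0 = highest score)."""
    # Build each container's package set once, instead of inside the loops.
    package_sets = [set(pkgs) for pkgs in packages_by_container]
    suppression = [False] * len(package_sets)

    for object2_num in range(1, len(package_sets)):
        object2_packages = package_sets[object2_num]
        if not object2_packages:
            suppression[object2_num] = True
            continue  # an empty set cannot intersect anything; skip the inner loop
        for object1_num in range(object2_num):
            if suppression[object1_num]:
                continue  # only unsuppressed, higher-scored containers can suppress
            if not object2_packages.isdisjoint(package_sets[object1_num]):
                suppression[object2_num] = True
                break  # already suppressed; further comparisons cannot undo it
    return suppression


print(suppress_by_shared_packages([[0, 1], [1], [], [2]]))
# [False, True, True, False]
```

The retained containers are then `[c for c, s in zip(containers, suppression) if not s]`.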

Performance Impact by Test Case

The optimization excels in scenarios with:

- Many containers × many packages (795% speedup in the 100×100 test): the intersection-calculation improvements dominate
- Dense overlapping containers (367-706% speedup): set precomputation and early exits pay off
- One-to-one mappings at scale (763% speedup in the 200×200 test): removing the per-package sort is critical

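Even the smallest non-empty inputs (one to three containers) see roughly 60-110% speedups, due to reduced object-creation overhead. Calls with no containers or no packages, where there is nothing to intersect, are roughly unchanged: their timings range from 2.8% slower to 16% faster.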

Production Context

Based on `function_references`, this optimization is in the hot path for table structure detection:

- Called by `refine_rows()` and `refine_columns()` for every table processed
- Operates on detected rows/columns and tokens (typically 50-200 objects per table)
- Runs on every document page with tables

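This function-level speedup should make document processing pipelines faster. The gain should be largest for documents with complex table structures, or for batch workloads in which this function is called hundreds of times. The end-to-end gain depends on how much of the pipeline's total time is spent in this function.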

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 51 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	93.3%

🌀 Generated Regression Tests

from collections import defaultdict

# imports
import pytest  # used for our unit tests
from unstructured_inference.models.table_postprocess import nms_by_containment

# function to test
# file: unstructured_inference/models/table_postprocess.py
# https://github.com/microsoft/table-transformer/blob/main/src/postprocess.py

class Rect:
    def __init__(self, bbox=None):
        if bbox is None:
            self.x_min = 0
            self.y_min = 0
            self.x_max = 0
            self.y_max = 0
        else:
            self.x_min = bbox[0]
            self.y_min = bbox[1]
            self.x_max = bbox[2]
            self.y_max = bbox[3]

    def get_area(self):
        """Calculates the area of the rectangle"""
        area = (self.x_max - self.x_min) * (self.y_max - self.y_min)
        return area if area > 0 else 0.0

    def intersect(self, other):
        """Calculates the intersection with another rectangle"""
        if self.get_area() == 0:
            self.x_min = other.x_min
            self.y_min = other.y_min
            self.x_max = other.x_max
            self.y_max = other.y_max
        else:
            self.x_min = max(self.x_min, other.x_min)
            self.y_min = max(self.y_min, other.y_min)
            self.x_max = min(self.x_max, other.x_max)
            self.y_max = min(self.y_max, other.y_max)

            if self.x_min > self.x_max or self.y_min > self.y_max or self.get_area() == 0:
                self.x_min = 0
                self.y_min = 0
                self.x_max = 0
                self.y_max = 0

        return self

def slot_into_containers(
    container_objects,
    package_objects,
    overlap_threshold=0.5,
    forced_assignment=False,
):
    """
    Slot a collection of objects into the container they occupy most (the container which holds the
    largest fraction of the object).
    """
    best_match_scores = []

    container_assignments = [[] for container in container_objects]
    package_assignments = [[] for package in package_objects]

    if len(container_objects) == 0 or len(package_objects) == 0:
        return container_assignments, package_assignments, best_match_scores

    match_scores = defaultdict(dict)
    for package_num, package in enumerate(package_objects):
        match_scores = []
        package_rect = Rect(package["bbox"])
        package_area = package_rect.get_area()
        for container_num, container in enumerate(container_objects):
            container_rect = Rect(container["bbox"])
            intersect_area = container_rect.intersect(Rect(package["bbox"])).get_area()

            if package_area > 0:
                overlap_fraction = intersect_area / package_area

                match_scores.append(
                    {
                        "container": container,
                        "container_num": container_num,
                        "score": overlap_fraction,
                    },
                )

        if len(match_scores) > 0:
            sorted_match_scores = sort_objects_by_score(match_scores)

            best_match_score = sorted_match_scores[0]
            best_match_scores.append(best_match_score["score"])
            if forced_assignment or best_match_score["score"] >= overlap_threshold:
                container_assignments[best_match_score["container_num"]].append(package_num)
                package_assignments[package_num].append(best_match_score["container_num"])

    return container_assignments, package_assignments, best_match_scores

def sort_objects_by_score(objects, reverse=True):
    """
    Put any set of objects in order from high score to low score.
    """
    return sorted(objects, key=lambda k: k["score"], reverse=reverse)

def test_basic_non_overlapping_unique_packages():
    # Two non-overlapping containers with unique packages in each -> both should be kept.
    container_A = {"bbox": [0, 0, 10, 10], "score": 0.8}   # higher score
    container_B = {"bbox": [20, 0, 30, 10], "score": 0.5}  # lower score

    # Each package is fully inside exactly one container
    package_0 = {"bbox": [1, 1, 2, 2]}   # inside A
    package_1 = {"bbox": [21, 1, 22, 2]} # inside B

    containers = [container_B, container_A]  # intentionally out-of-score-order
    packages = [package_0, package_1]

    # Run nms_by_containment; sorting by score should ensure A is first then B and both are kept
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); results = codeflash_output # 37.1μs -> 17.9μs (107% faster)

def test_identical_containers_with_shared_package_results_in_lower_being_suppressed():
    # Two containers with identical bboxes. One package inside that bbox.
    # The package will be assigned to the highest-scored container (tie-breaker by order),
    # leaving the other container with zero packages -> should be suppressed (since it's not the first).
    container_high = {"bbox": [0, 0, 10, 10], "score": 0.9}
    container_low = {"bbox": [0, 0, 10, 10], "score": 0.5}
    package = {"bbox": [2, 2, 3, 3]}

    containers = [container_low, container_high]  # unsorted input
    packages = [package]

    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); final = codeflash_output # 27.1μs -> 15.1μs (79.0% faster)

def test_empty_inputs_return_expected_results():
    # No containers -> result must be empty list quickly
    codeflash_output = nms_by_containment([], [], overlap_threshold=0.5) # 7.04μs -> 7.24μs (2.76% slower)

    # Some containers but no packages:
    # The implementation suppresses any container with no packages except the first one (highest scored).
    c1 = {"bbox": [0, 0, 10, 10], "score": 0.9}
    c2 = {"bbox": [10, 10, 20, 20], "score": 0.8}
    containers = [c1, c2]
    packages = []  # no packages -> only the highest-scored container should remain
    codeflash_output = nms_by_containment(containers, packages); result = codeflash_output # 8.15μs -> 7.02μs (16.1% faster)

def test_package_with_zero_area_is_ignored_causing_suppression_of_later_containers():
    # A package with zero area should not be assigned to any container (package_area == 0), which should
    # cause all containers except the first to be suppressed (they have no packages).
    c1 = {"bbox": [0, 0, 10, 10], "score": 0.7}
    c2 = {"bbox": [20, 0, 30, 10], "score": 0.6}
    zero_area_package = {"bbox": [5, 5, 5, 5]}  # zero width and height -> area 0

    codeflash_output = nms_by_containment([c2, c1], [zero_area_package]); result = codeflash_output # 24.7μs -> 11.8μs (109% faster)

def test_container_with_zero_area_can_be_assigned_packages():
    # A container with zero area will have its Rect.get_area==0 and thus intersect() will set it to the package bbox,
    # which results in an intersection_area equal to the package area and therefore packages can be assigned to it.
    # Confirm that a zero-area container receives package assignments and is therefore not suppressed.
    # container_zero has zero area bbox but will be treated by intersect() as adopting package bbox
    container_zero = {"bbox": [0, 0, 0, 0], "score": 0.85}  # zero-area container
    container_other = {"bbox": [100, 100, 110, 110], "score": 0.1}
    package = {"bbox": [1, 1, 2, 2]}

    # Put container_zero first to confirm it stays and gets the package
    containers = [container_zero, container_other]
    packages = [package]

    # The zero-area container should get the package assignment (due to intersect logic),
    # and since it is first and has packages, both checks should keep it. container_other has no packages so gets suppressed.
    codeflash_output = nms_by_containment(containers, packages); final = codeflash_output # 26.8μs -> 13.0μs (106% faster)

def test_large_scale_many_one_to_one_mappings():
    # Large-scale test with many containers and many packages (one-to-one, each package inside distinct container).
    # Keep within limits: create 200 containers and 200 packages.
    n = 200
    containers = []
    packages = []
    # Create n containers each covering a disjoint x-range and put one package inside each.
    for i in range(n):
        left = i * 10
        right = left + 9
        containers.append({"bbox": [left, 0, right, 9], "score": float(n - i)})  # decreasing scores so container[0] highest
        # place a package well inside the container so assignment is unambiguous
        packages.append({"bbox": [left + 1, 1, left + 2, 2]})

    # Shuffle input order to ensure sorting occurs; keep deterministic by reversing list
    containers_input = list(reversed(containers))
    packages_input = packages[:]  # keep order

    codeflash_output = nms_by_containment(containers_input, packages_input, overlap_threshold=0.5); final = codeflash_output # 119ms -> 13.9ms (763% faster)

def test_large_scale_many_in_single_container_results_in_single_remaining_container():
    # Large-scale scenario where many packages are all inside a single (highest-scored) container.
    # All other containers should be suppressed because they get no packages.
    n_containers = 150
    n_packages = 150
    containers = []
    packages = []

    # container 0 (highest score) will cover a large area and contain all packages
    containers.append({"bbox": [0, 0, 1000, 1000], "score": 100.0})
    # create many other containers that do not overlap the package region
    for i in range(1, n_containers):
        left = 2000 + i * 10
        containers.append({"bbox": [left, 0, left + 9, 9], "score": float(n_containers - i)})

    # create many packages all inside the first container
    for j in range(n_packages):
        packages.append({"bbox": [1 + j * 0.1, 1, 1 + j * 0.1 + 0.05, 1.05]})

    # Run NMS - expected that only the first (highest-scored) container remains
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); final = codeflash_output # 65.7ms -> 7.78ms (744% faster)

def test_slot_into_containers_returns_empty_assignments_on_empty_inputs():
    # Directly assert slot_into_containers behavior for empty inputs
    empty_containers = []
    empty_packages = []
    assignments = slot_into_containers(empty_containers, empty_packages)
    c_assigns, p_assigns, best_scores = assignments
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest
from unstructured_inference.models.table_postprocess import (
    Rect, nms_by_containment, slot_into_containers, sort_objects_by_score)

def test_nms_by_containment_single_container_no_packages():
    """Test NMS with a single container and no packages - container should be retained."""
    containers = [{"bbox": [0, 0, 100, 100], "score": 0.9}]
    packages = []
    codeflash_output = nms_by_containment(containers, packages); result = codeflash_output # 9.22μs -> 8.89μs (3.69% faster)

def test_nms_by_containment_no_containers():
    """Test NMS with no containers - should return empty list."""
    containers = []
    packages = [{"bbox": [10, 10, 50, 50], "score": 0.8}]
    codeflash_output = nms_by_containment(containers, packages); result = codeflash_output # 7.36μs -> 7.29μs (1.00% faster)

def test_nms_by_containment_single_container_single_package():
    """Test NMS with one container and one package that overlaps significantly."""
    containers = [{"bbox": [0, 0, 100, 100], "score": 0.9}]
    packages = [{"bbox": [20, 20, 80, 80], "score": 0.7}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 21.2μs -> 12.8μs (65.5% faster)

def test_nms_by_containment_two_containers_shared_package():
    """Test NMS with two containers sharing a package - higher score container retained."""
    containers = [
        {"bbox": [0, 0, 100, 100], "score": 0.95},
        {"bbox": [10, 10, 90, 90], "score": 0.85},
    ]
    packages = [{"bbox": [30, 30, 70, 70], "score": 0.8}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 28.3μs -> 15.6μs (81.6% faster)

def test_nms_by_containment_two_containers_different_packages():
    """Test NMS with two containers with different packages - both should be retained."""
    containers = [
        {"bbox": [0, 0, 50, 100], "score": 0.9},
        {"bbox": [50, 0, 100, 100], "score": 0.85},
    ]
    packages = [
        {"bbox": [10, 20, 40, 80], "score": 0.7},
        {"bbox": [60, 20, 90, 80], "score": 0.7},
    ]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 38.5μs -> 18.6μs (108% faster)

def test_nms_by_containment_container_no_packages():
    """Test NMS with container that has no assigned packages - should be suppressed."""
    containers = [
        {"bbox": [0, 0, 100, 100], "score": 0.9},
        {"bbox": [200, 200, 300, 300], "score": 0.8},
    ]
    packages = [{"bbox": [20, 20, 80, 80], "score": 0.7}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 28.4μs -> 15.1μs (88.1% faster)

def test_nms_by_containment_low_overlap_threshold():
    """Test NMS with very low overlap threshold - more packages get assigned."""
    containers = [
        {"bbox": [0, 0, 100, 100], "score": 0.9},
        {"bbox": [50, 50, 150, 150], "score": 0.8},
    ]
    packages = [{"bbox": [40, 40, 60, 60], "score": 0.7}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.01); result = codeflash_output # 27.8μs -> 15.0μs (85.1% faster)

def test_nms_by_containment_high_overlap_threshold():
    """Test NMS with very high overlap threshold - packages may not be assigned."""
    containers = [{"bbox": [0, 0, 100, 100], "score": 0.9}]
    packages = [{"bbox": [70, 70, 90, 90], "score": 0.7}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.99); result = codeflash_output # 20.4μs -> 12.3μs (65.5% faster)

def test_sort_objects_by_score_ascending():
    """Test sorting objects by score in ascending order."""
    objects = [
        {"score": 0.5},
        {"score": 0.9},
        {"score": 0.7},
    ]
    result = sort_objects_by_score(objects, reverse=False)

def test_sort_objects_by_score_descending():
    """Test sorting objects by score in descending order (default)."""
    objects = [
        {"score": 0.5},
        {"score": 0.9},
        {"score": 0.7},
    ]
    result = sort_objects_by_score(objects)

def test_rect_basic_area_calculation():
    """Test basic rectangle area calculation."""
    rect = Rect([0, 0, 10, 10])

def test_rect_zero_area():
    """Test rectangle with zero area."""
    rect = Rect([5, 5, 5, 5])

def test_rect_intersection_complete_overlap():
    """Test rectangle intersection with complete overlap."""
    rect1 = Rect([0, 0, 10, 10])
    rect2 = Rect([0, 0, 10, 10])
    rect1.intersect(rect2)

def test_rect_intersection_partial_overlap():
    """Test rectangle intersection with partial overlap."""
    rect1 = Rect([0, 0, 10, 10])
    rect2 = Rect([5, 5, 15, 15])
    rect1.intersect(rect2)

def test_rect_intersection_no_overlap():
    """Test rectangle intersection with no overlap."""
    rect1 = Rect([0, 0, 10, 10])
    rect2 = Rect([20, 20, 30, 30])
    rect1.intersect(rect2)

def test_nms_by_containment_identical_containers():
    """Test NMS with identical containers - lower score should be suppressed."""
    containers = [
        {"bbox": [0, 0, 100, 100], "score": 0.9},
        {"bbox": [0, 0, 100, 100], "score": 0.8},
    ]
    packages = [{"bbox": [20, 20, 80, 80], "score": 0.7}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 28.1μs -> 16.0μs (75.6% faster)

def test_nms_by_containment_nested_containers():
    """Test NMS with containers nested inside each other."""
    containers = [
        {"bbox": [0, 0, 200, 200], "score": 0.9},
        {"bbox": [20, 20, 180, 180], "score": 0.85},
        {"bbox": [40, 40, 160, 160], "score": 0.8},
    ]
    packages = [{"bbox": [50, 50, 150, 150], "score": 0.7}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 32.8μs -> 16.9μs (94.7% faster)

def test_nms_by_containment_multiple_packages_one_container():
    """Test NMS with multiple packages in one container."""
    containers = [{"bbox": [0, 0, 100, 100], "score": 0.9}]
    packages = [
        {"bbox": [10, 10, 30, 30], "score": 0.8},
        {"bbox": [40, 40, 60, 60], "score": 0.7},
        {"bbox": [70, 70, 90, 90], "score": 0.6},
    ]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 32.6μs -> 16.3μs (100% faster)

def test_nms_by_containment_package_touches_container_edge():
    """Test package that touches but barely overlaps container edge."""
    containers = [
        {"bbox": [0, 0, 100, 100], "score": 0.9},
        {"bbox": [100, 0, 200, 100], "score": 0.8},
    ]
    packages = [{"bbox": [95, 40, 105, 60], "score": 0.7}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 27.5μs -> 15.4μs (78.7% faster)

def test_nms_by_containment_very_small_package():
    """Test with very small package in large container."""
    containers = [{"bbox": [0, 0, 1000, 1000], "score": 0.9}]
    packages = [{"bbox": [500, 500, 501, 501], "score": 0.8}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 20.7μs -> 12.5μs (66.0% faster)

def test_nms_by_containment_package_larger_than_container():
    """Test package that is larger than container."""
    containers = [{"bbox": [40, 40, 60, 60], "score": 0.9}]
    packages = [{"bbox": [0, 0, 100, 100], "score": 0.8}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 19.8μs -> 12.1μs (63.8% faster)

def test_nms_by_containment_zero_area_container():
    """Test with zero-area container (line or point)."""
    containers = [{"bbox": [50, 50, 50, 50], "score": 0.9}]
    packages = [{"bbox": [40, 40, 60, 60], "score": 0.8}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 18.8μs -> 11.2μs (68.1% faster)

def test_nms_by_containment_zero_area_package():
    """Test with zero-area package (line or point)."""
    containers = [{"bbox": [0, 0, 100, 100], "score": 0.9}]
    packages = [{"bbox": [50, 50, 50, 50], "score": 0.8}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 18.5μs -> 10.7μs (73.0% faster)

def test_nms_by_containment_negative_coordinates():
    """Test with negative coordinates."""
    containers = [
        {"bbox": [-100, -100, 0, 0], "score": 0.9},
        {"bbox": [-50, -50, 50, 50], "score": 0.8},
    ]
    packages = [{"bbox": [-30, -30, -10, -10], "score": 0.7}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 27.9μs -> 15.5μs (79.6% faster)

def test_nms_by_containment_threshold_exactly_at_boundary():
    """Test with overlap threshold exactly at calculated overlap fraction."""
    containers = [{"bbox": [0, 0, 100, 100], "score": 0.9}]
    packages = [{"bbox": [0, 0, 50, 50], "score": 0.8}]
    # Package area is 2500, overlap area is 2500, so fraction is 1.0
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=1.0); result = codeflash_output # 20.5μs -> 12.6μs (62.7% faster)

def test_nms_by_containment_all_packages_rejected_by_threshold():
    """Test where all packages fall below overlap threshold."""
    containers = [
        {"bbox": [0, 0, 100, 100], "score": 0.9},
        {"bbox": [200, 200, 300, 300], "score": 0.8},
    ]
    packages = [
        {"bbox": [10, 10, 20, 20], "score": 0.7},
        {"bbox": [210, 210, 220, 220], "score": 0.6},
    ]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.99); result = codeflash_output # 38.1μs -> 18.7μs (104% faster)

def test_nms_by_containment_many_containers_few_packages():
    """Test with many containers but few packages."""
    containers = [
        {"bbox": [i*50, i*50, i*50+40, i*50+40], "score": 1.0 - i*0.01}
        for i in range(10)
    ]
    packages = [{"bbox": [100, 100, 110, 110], "score": 0.8}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 64.6μs -> 22.6μs (187% faster)

def test_nms_by_containment_equal_scores():
    """Test containers with equal scores."""
    containers = [
        {"bbox": [0, 0, 100, 100], "score": 0.9},
        {"bbox": [50, 50, 150, 150], "score": 0.9},
    ]
    packages = [{"bbox": [60, 60, 90, 90], "score": 0.8}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 27.6μs -> 15.0μs (83.9% faster)

def test_rect_intersection_with_zero_area_rect():
    """Test intersection with zero-area rectangle."""
    rect1 = Rect([0, 0, 10, 10])
    rect2 = Rect([5, 5, 5, 5])
    result = rect1.intersect(rect2)

def test_slot_into_containers_no_containers():
    """Test slot_into_containers with no containers."""
    containers = []
    packages = [{"bbox": [10, 10, 50, 50], "score": 0.8}]
    container_assign, package_assign, scores = slot_into_containers(containers, packages)

def test_slot_into_containers_no_packages():
    """Test slot_into_containers with no packages."""
    containers = [{"bbox": [0, 0, 100, 100], "score": 0.9}]
    packages = []
    container_assign, package_assign, scores = slot_into_containers(containers, packages)

def test_sort_objects_by_score_single_object():
    """Test sorting with single object."""
    objects = [{"score": 0.5}]
    result = sort_objects_by_score(objects)

def test_sort_objects_by_score_equal_scores():
    """Test sorting objects with equal scores."""
    objects = [
        {"score": 0.5, "id": 1},
        {"score": 0.5, "id": 2},
        {"score": 0.5, "id": 3},
    ]
    result = sort_objects_by_score(objects)

def test_nms_by_containment_many_containers_many_packages():
    """Test NMS with 100 containers and 100 packages."""
    containers = [
        {
            "bbox": [i % 10 * 50, i // 10 * 50, i % 10 * 50 + 40, i // 10 * 50 + 40],
            "score": 1.0 - i * 0.001,
        }
        for i in range(100)
    ]
    packages = [
        {
            "bbox": [i % 10 * 55 + 5, i // 10 * 55 + 5, i % 10 * 55 + 25, i // 10 * 55 + 25],
            "score": 0.8 - i * 0.001,
        }
        for i in range(100)
    ]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 29.7ms -> 3.31ms (795% faster)

def test_nms_by_containment_wide_spatial_distribution():
    """Test NMS with containers spread across wide spatial range."""
    containers = [
        {
            "bbox": [i * 200, i * 200, i * 200 + 100, i * 200 + 100],
            "score": 0.9 - i * 0.001,
        }
        for i in range(50)
    ]
    packages = [
        {
            "bbox": [i * 200 + 20, i * 200 + 20, i * 200 + 80, i * 200 + 80],
            "score": 0.7,
        }
        for i in range(50)
    ]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 7.59ms -> 1.12ms (578% faster)

def test_nms_by_containment_dense_overlapping():
    """Test NMS with many densely overlapping containers."""
    containers = [
        {
            "bbox": [i, i, 100 + i, 100 + i],
            "score": 0.9 - i * 0.001,
        }
        for i in range(50)
    ]
    packages = [{"bbox": [40, 40, 60, 60], "score": 0.8}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 255μs -> 54.7μs (367% faster)

def test_nms_by_containment_high_threshold_filter():
    """Test NMS with high overlap threshold filtering many containers."""
    containers = [
        {
            "bbox": [0, 0, 100 + i, 100 + i],
            "score": 0.9 - i * 0.001,
        }
        for i in range(50)
    ]
    packages = [{"bbox": [10, 10, 20, 20], "score": 0.8}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.9); result = codeflash_output # 250μs -> 51.1μs (390% faster)

def test_nms_by_containment_low_threshold_many_assignments():
    """Test NMS with low threshold allowing many package assignments."""
    containers = [
        {
            "bbox": [i * 30, 0, i * 30 + 50, 50],
            "score": 0.9 - i * 0.001,
        }
        for i in range(50)
    ]
    packages = [
        {
            "bbox": [i * 30 + 10, 10, i * 30 + 40, 40],
            "score": 0.7,
        }
        for i in range(50)
    ]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.1); result = codeflash_output # 7.59ms -> 941μs (706% faster)

def test_sort_objects_by_score_large_list():
    """Test sorting large list of objects."""
    objects = [
        {"score": 1.0 - i * 0.001}
        for i in range(500)
    ]
    result = sort_objects_by_score(objects)
    # Verify descending order
    for i in range(len(result) - 1):
        assert result[i]["score"] >= result[i + 1]["score"]

def test_slot_into_containers_large_scale():
    """Test slot_into_containers with many objects."""
    containers = [
        {
            "bbox": [i * 100, i * 100, i * 100 + 80, i * 100 + 80],
            "score": 0.9 - i * 0.001,
        }
        for i in range(50)
    ]
    packages = [
        {
            "bbox": [i * 100 + 10, i * 100 + 10, i * 100 + 70, i * 100 + 70],
            "score": 0.7,
        }
        for i in range(50)
    ]
    container_assign, package_assign, scores = slot_into_containers(
        containers, packages, overlap_threshold=0.5
    )

def test_rect_operations_large_number():
    """Test rectangle operations with large coordinate values."""
    rect1 = Rect([0, 0, 1000000, 1000000])
    rect2 = Rect([100000, 100000, 900000, 900000])
    rect1.intersect(rect2)
    # Verify intersection is correct
    expected_area = (900000 - 100000) ** 2
    assert rect1.get_area() == expected_area

def test_nms_by_containment_varied_container_sizes():
    """Test NMS with containers of vastly different sizes."""
    containers = [
        {"bbox": [0, 0, 10 ** i, 10 ** i], "score": 0.9}
        for i in range(1, 5)
    ] + [
        {"bbox": [100, 100, 110, 110], "score": 0.8}
    ]
    packages = [{"bbox": [105, 105, 108, 108], "score": 0.7}]
    codeflash_output = nms_by_containment(containers, packages, overlap_threshold=0.5); result = codeflash_output # 42.6μs -> 19.2μs (121% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-nms_by_containment-mkorw2k4` and push.

codeflash-ai bot requested a review from aseembits93 January 22, 2026 01:28

codeflash-ai bot added the labels `⚡️ codeflash` (Optimization PR opened by Codeflash AI) and `🎯 Quality: High` (Optimization Quality according to Codeflash) on Jan 22, 2026

codeflash-ai[bot] wants to merge 1 commit into `main` from `codeflash/optimize-nms_by_containment-mkorw2k4`
