⚡️ Speed up function safe_division by 17% #47

Open
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-safe_division-mkouw6ho
Conversation


@codeflash-ai codeflash-ai bot commented Jan 22, 2026

📄 17% (0.17x) speedup for safe_division in unstructured_inference/math.py

⏱️ Runtime: 469 microseconds → 401 microseconds (best of 138 runs)

📝 Explanation and details

The optimization replaces max(b, FLOAT_EPSILON) with a conditional expression a / FLOAT_EPSILON if b <= FLOAT_EPSILON else a / b. This provides a 17% speedup by eliminating the overhead of Python's max() built-in function call.

Key Performance Improvements:

  1. Eliminates Function Call Overhead: The max() function involves Python's function call machinery (argument unpacking, dispatch), which is costly for such a simple operation. The conditional expression evaluates directly without this overhead.

  2. Branch Prediction Benefits: The if-else construct allows the CPU's branch predictor to optimize the common case. Looking at the test results, when b is a normal value (>> FLOAT_EPSILON), the else branch is taken and executes efficiently. Test cases with normal denominators show 10-15% speedups, while edge cases with tiny denominators show even better improvements (25-38%).

  3. Micro-optimization Impact: Per-hit time improved from 919.2ns to 811.1ns (~12% per call), which compounds significantly when called repeatedly.
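For concreteness, the change can be sketched as below. This is a minimal stand-alone sketch: the real function in unstructured_inference/math.py carries a numba decorator and imports that are omitted here, so the names `safe_division_before`/`safe_division_after` are illustrative only.

```python
import numpy as np

FLOAT_EPSILON = np.finfo(np.float64).eps

def safe_division_before(a, b):
    # original form: clamp the denominator via a max() call
    return a / max(b, FLOAT_EPSILON)

def safe_division_after(a, b):
    # optimized form: a conditional expression avoids the max() call overhead
    return a / FLOAT_EPSILON if b <= FLOAT_EPSILON else a / b

print(safe_division_after(10, 2))                              # 5.0
print(safe_division_after(1.0, 0.0) == 1.0 / FLOAT_EPSILON)    # True
```

Both variants pick the same denominator (`b` when it exceeds the epsilon, the epsilon otherwise); only the dispatch mechanics differ.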

Why This Matters Based on Function References:

The function is used in hot paths for geometric computations in unstructured_inference/inference/elements.py:

  • intersection_over_union() - Called for comparing rectangle similarity
  • intersection_over_minimum() - Used for subset detection
  • is_almost_subregion_of() - Performs subregion checks

These operations are likely executed in tight loops during document layout analysis, where comparing many bounding boxes is common. The 17% speedup means faster document processing pipelines.
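A hypothetical sketch of how such a caller might look — the actual `intersection_over_union()` in elements.py operates on Rectangle objects, so the tuple-based signature below is an assumption for illustration:

```python
import numpy as np

FLOAT_EPSILON = np.finfo(np.float64).eps

def safe_division(a, b):
    # optimized form from this PR
    return a / FLOAT_EPSILON if b <= FLOAT_EPSILON else a / b

def intersection_over_union(box1, box2):
    # boxes as (x1, y1, x2, y2); simplified IoU for two axis-aligned rectangles
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    # safe_division guards against a zero-area union (e.g. two degenerate boxes)
    return safe_division(inter, area1 + area2 - inter)

print(intersection_over_union((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.142857
```

Each pairwise box comparison is one `safe_division` call, which is why the per-call saving compounds across a layout with many detected elements.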

Test Case Performance Patterns:

  • Normal cases (denominator >> FLOAT_EPSILON): 10-16% faster - the common path benefits from avoiding max()
  • Edge cases (denominator ≤ FLOAT_EPSILON): 20-38% faster - these benefit from short-circuit evaluation taking the first branch immediately
  • Batch operations: 20-31% faster when processing multiple rectangles, compounding the per-call savings

The optimization maintains identical behavior while delivering consistent performance gains across all workload types.
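The per-call saving can be reproduced with a quick `timeit` sketch using pure-Python stand-ins for the two variants (absolute timings will vary by machine and will not match the numba-compiled numbers above):

```python
import timeit

import numpy as np

FLOAT_EPSILON = np.finfo(np.float64).eps

def div_max(a, b):
    # original: max() clamp
    return a / max(b, FLOAT_EPSILON)

def div_cond(a, b):
    # optimized: conditional expression
    return a / FLOAT_EPSILON if b <= FLOAT_EPSILON else a / b

n = 100_000
t_max = timeit.timeit(lambda: div_max(10.0, 2.0), number=n)
t_cond = timeit.timeit(lambda: div_cond(10.0, 2.0), number=n)
print(f"max():       {t_max:.4f}s")
print(f"conditional: {t_cond:.4f}s")
```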

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 4 Passed
🌀 Generated Regression Tests 815 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
⚙️ Click to see Existing Unit Tests
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
test_math.py::test_safe_division 11.9μs 10.5μs 13.2%✅
🌀 Click to see Generated Regression Tests
import math  # used for isnan/isinf checks
import random  # used to generate deterministic pseudo-random test cases

import numba as nb
# function to test
# (Preserve the original function exactly as provided, including imports and decorator)
import numpy as np
# imports
import pytest  # used for our unit tests
from unstructured_inference.math import safe_division

FLOAT_EPSILON = np.finfo(np.float64).eps

# unit tests

# Helper: pure-Python reference implementation that mirrors the original logic.
# This is used to compute expected values for a variety of inputs.
def _expected_safe_division(a, b):
    # mirror the same steps: cast to float, compare to FLOAT_EPSILON, then divide
    da = float(a)
    db = float(b)
    if db >= FLOAT_EPSILON:
        denom = db
    else:
        denom = FLOAT_EPSILON
    return da / denom

def test_basic_integer_division():
    # Basic case: simple integers where denominator is well above epsilon
    codeflash_output = safe_division(10, 2); result = codeflash_output # 2.39μs -> 2.09μs (14.2% faster)
    assert result == 5.0

def test_basic_negative_and_float_division():
    # Negative numerator and float denominator
    codeflash_output = safe_division(-9, 3) # 2.32μs -> 2.12μs (9.64% faster)
    assert codeflash_output == -3.0
    codeflash_output = safe_division(1.5, 0.5) # 675ns -> 635ns (6.30% faster)
    assert codeflash_output == 3.0

def test_zero_numerator_and_zero_denominator():
    # Edge: both numerator and denominator zero -> numerator 0.0 divided by FLOAT_EPSILON => 0.0
    codeflash_output = safe_division(0, 0); result = codeflash_output # 3.51μs -> 3.15μs (11.4% faster)
    assert result == 0.0

def test_division_by_zero_uses_epsilon():
    # When denominator is exactly zero, the function must use FLOAT_EPSILON,
    # therefore the result should be a / FLOAT_EPSILON.
    a = 5.0
    codeflash_output = safe_division(a, 0.0); result = codeflash_output # 3.05μs -> 2.71μs (12.5% faster)
    expected = a / FLOAT_EPSILON
    assert result == expected

def test_below_epsilon_and_negative_denominators():
    # If denominator is positive but smaller than EPS, behavior should use EPS, not the tiny b.
    a = 2.0
    tiny_b = FLOAT_EPSILON / 10.0  # smaller than EPS
    codeflash_output = safe_division(a, tiny_b); res_tiny = codeflash_output # 1.82μs -> 1.45μs (25.7% faster)
    expected_tiny = a / FLOAT_EPSILON  # should use EPS
    assert res_tiny == expected_tiny

    # If denominator is negative, db >= FLOAT_EPSILON is False -> use EPS (positive),
    # therefore the result should be positive a / EPS (not a negative result).
    neg_b = -1.0
    codeflash_output = safe_division(a, neg_b); res_neg = codeflash_output # 688ns -> 692ns (0.578% slower)
    expected_neg = a / FLOAT_EPSILON
    assert res_neg == expected_neg

def test_b_equals_epsilon_boundary():
    # When b == FLOAT_EPSILON, the branch condition db >= FLOAT_EPSILON should be True,
    # therefore denominator should be exactly b (FLOAT_EPSILON) and result equal to a / EPS.
    a = 3.0
    b = FLOAT_EPSILON
    codeflash_output = safe_division(a, b); res = codeflash_output # 2.95μs -> 2.40μs (22.9% faster)
    expected = a / FLOAT_EPSILON
    assert res == expected

def test_infinite_and_nan_denominator_behavior():
    # If denominator is +inf, then a/inf -> 0.0
    codeflash_output = safe_division(1.0, float('inf')); res_inf = codeflash_output # 2.13μs -> 1.80μs (18.2% faster)
    assert res_inf == 0.0

    # If denominator is NaN, comparisons with NaN are False -> branch should pick FLOAT_EPSILON
    # so result should be a / FLOAT_EPSILON
    a = 1.0
    codeflash_output = safe_division(a, float('nan')); res_nan = codeflash_output # 563ns -> 462ns (21.9% faster)
    expected_nan = a / FLOAT_EPSILON
    assert res_nan == expected_nan

def test_large_values_and_overflow_behavior():
    # Very large numerator with tiny denominator below EPS may overflow to inf;
    # ensure function returns the same as the reference logic.
    a = 1e308
    tiny_b = FLOAT_EPSILON / 1e6  # definitely smaller than EPS
    codeflash_output = safe_division(a, tiny_b); res = codeflash_output # 16.7μs -> 17.5μs (4.80% slower)
    expected = _expected_safe_division(a, tiny_b)
    # If expected is infinite, check for infinity; otherwise compare numerically.
    if math.isinf(expected):
        assert math.isinf(res)
    else:
        assert res == pytest.approx(expected)

def test_many_random_samples_against_reference():
    # Large scale test: deterministic pseudo-random inputs (kept <= 1000 iterations).
    # We include a variety of values: large, small, negative, near-zero, normal ranges.
    rnd = random.Random(0)  # deterministic seed for reproducibility
    samples = 500  # within provided constraint (< 1000)
    for _ in range(samples):
        # choose number types from a curated set to increase chance of edge values:
        choice = rnd.randint(0, 6)
        if choice == 0:
            a = rnd.uniform(-1e5, 1e5)
            b = rnd.uniform(-1e5, 1e5)
        elif choice == 1:
            # very small positive denominators
            a = rnd.uniform(-1e3, 1e3)
            b = FLOAT_EPSILON * rnd.uniform(0.0, 0.9)  # often below EPS
        elif choice == 2:
            # tiny numerators and normal denominators
            a = rnd.uniform(-1e-12, 1e-12)
            b = rnd.uniform(1e-6, 1e3)
        elif choice == 3:
            # include exact EPS, zero, and near-zero negatives
            a = rnd.uniform(-10.0, 10.0)
            b = rnd.choice([0.0, FLOAT_EPSILON, -FLOAT_EPSILON/2.0])
        elif choice == 4:
            # large magnitudes
            a = rnd.uniform(-1e308, 1e308)
            b = rnd.uniform(1e-2, 1e2)
        elif choice == 5:
            # infinite and nan cases occasionally
            a = rnd.uniform(-100.0, 100.0)
            b = rnd.choice([float('inf'), -float('inf'), float('nan')])
        else:
            # simple random small ints
            a = rnd.randint(-100, 100)
            b = rnd.randint(-10, 10)

        # compute both results
        codeflash_output = safe_division(a, b); result = codeflash_output # 227μs -> 190μs (19.6% faster)
        expected = _expected_safe_division(a, b)

        # If expected is NaN, the function's logic should not return NaN (it forces EPS),
        # so expected will not be NaN under reference implementation; but account for inf.
        if math.isnan(expected):
            # Our reference implementation should never produce NaN because it replaces
            # small/NaN/negative denominators with FLOAT_EPSILON, thus this branch is for safety.
            pytest.skip("Reference produced NaN unexpectedly; skipping this sample")
        elif math.isinf(expected):
            assert math.isinf(result)
        else:
            assert result == pytest.approx(expected)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import numba as nb
import numpy as np
import pytest

# Import the function to test
from unstructured_inference.math import safe_division

FLOAT_EPSILON = np.finfo(np.float64).eps

def test_basic_division_positive_numbers():
    """Test safe_division with positive integers."""
    codeflash_output = safe_division(10, 2); result = codeflash_output # 2.36μs -> 2.12μs (11.3% faster)
    assert result == 5.0

def test_basic_division_positive_floats():
    """Test safe_division with positive floating-point numbers."""
    codeflash_output = safe_division(10.5, 2.5); result = codeflash_output # 2.17μs -> 1.87μs (15.9% faster)
    assert result == 4.2

def test_basic_division_negative_numerator():
    """Test safe_division when numerator is negative."""
    codeflash_output = safe_division(-10, 2); result = codeflash_output # 2.34μs -> 2.06μs (13.6% faster)
    assert result == -5.0

def test_basic_division_negative_denominator():
    """Test safe_division when denominator is negative - should fall back to FLOAT_EPSILON."""
    codeflash_output = safe_division(10, -2); result = codeflash_output # 3.51μs -> 3.20μs (9.81% faster)
    assert result == 10.0 / FLOAT_EPSILON

def test_basic_division_both_negative():
    """Test safe_division when both numerator and denominator are negative."""
    codeflash_output = safe_division(-10, -2); result = codeflash_output # 3.28μs -> 2.88μs (14.0% faster)
    assert result == -10.0 / FLOAT_EPSILON

def test_basic_division_zero_numerator():
    """Test safe_division when numerator is zero."""
    codeflash_output = safe_division(0, 5); result = codeflash_output # 2.31μs -> 2.06μs (12.7% faster)
    assert result == 0.0

def test_basic_division_one_as_denominator():
    """Test safe_division with 1 as denominator."""
    codeflash_output = safe_division(7, 1); result = codeflash_output # 2.32μs -> 2.05μs (13.4% faster)
    assert result == 7.0

def test_basic_division_fractional_result():
    """Test safe_division producing a fractional result."""
    codeflash_output = safe_division(1, 3); result = codeflash_output # 2.24μs -> 2.05μs (9.01% faster)
    assert result == 1.0 / 3.0

def test_division_by_zero():
    """Test safe_division with zero denominator - should use FLOAT_EPSILON."""
    codeflash_output = safe_division(10, 0); result = codeflash_output # 3.32μs -> 2.94μs (13.0% faster)
    expected = 10.0 / FLOAT_EPSILON
    assert result == expected

def test_division_by_zero_with_negative_numerator():
    """Test safe_division with zero denominator and negative numerator."""
    codeflash_output = safe_division(-10, 0); result = codeflash_output # 3.54μs -> 3.01μs (17.6% faster)
    expected = -10.0 / FLOAT_EPSILON
    assert result == expected

def test_division_by_zero_with_zero_numerator():
    """Test safe_division with both numerator and denominator as zero."""
    codeflash_output = safe_division(0, 0); result = codeflash_output # 3.28μs -> 2.82μs (16.1% faster)
    expected = 0.0 / FLOAT_EPSILON
    assert result == expected

def test_division_by_very_small_positive_number():
    """Test safe_division with very small positive denominator less than FLOAT_EPSILON."""
    small_num = FLOAT_EPSILON / 2
    codeflash_output = safe_division(10, small_num); result = codeflash_output # 2.03μs -> 1.60μs (27.1% faster)
    # Should be treated as if denominator is FLOAT_EPSILON
    expected = 10.0 / FLOAT_EPSILON
    assert result == expected

def test_division_by_negative_small_number():
    """Test safe_division with small negative denominator less than FLOAT_EPSILON in magnitude."""
    small_num = -FLOAT_EPSILON / 2
    codeflash_output = safe_division(10, small_num); result = codeflash_output # 1.96μs -> 1.42μs (37.8% faster)
    # Should use FLOAT_EPSILON as denominator (positive)
    expected = 10.0 / FLOAT_EPSILON
    assert result == expected

def test_division_by_exactly_float_epsilon():
    """Test safe_division with denominator exactly equal to FLOAT_EPSILON."""
    codeflash_output = safe_division(10, FLOAT_EPSILON); result = codeflash_output # 3.02μs -> 2.74μs (10.2% faster)
    expected = 10.0 / FLOAT_EPSILON
    assert result == expected

def test_division_by_slightly_greater_than_epsilon():
    """Test safe_division with denominator slightly greater than FLOAT_EPSILON."""
    denominator = FLOAT_EPSILON * 1.1
    codeflash_output = safe_division(10, denominator); result = codeflash_output # 2.27μs -> 1.82μs (24.9% faster)
    expected = 10.0 / denominator
    assert result == expected

def test_very_large_numerator():
    """Test safe_division with very large numerator."""
    large_num = 1e308
    codeflash_output = safe_division(large_num, 2); result = codeflash_output # 2.40μs -> 2.10μs (14.4% faster)
    expected = large_num / 2.0
    assert result == expected

def test_very_small_numerator():
    """Test safe_division with very small positive numerator."""
    small_num = 1e-308
    codeflash_output = safe_division(small_num, 2); result = codeflash_output # 2.43μs -> 2.21μs (10.1% faster)
    expected = small_num / 2.0
    assert result == expected

def test_integer_inputs():
    """Test safe_division with integer inputs."""
    codeflash_output = safe_division(7, 2); result = codeflash_output # 2.36μs -> 2.10μs (12.3% faster)
    expected = 7.0 / 2.0
    assert result == expected

def test_mixed_int_float_inputs():
    """Test safe_division with mixed integer and float inputs."""
    codeflash_output = safe_division(10, 2.5); result = codeflash_output # 2.38μs -> 2.16μs (10.1% faster)
    expected = 10.0 / 2.5
    assert result == expected

def test_float_int_inputs():
    """Test safe_division with float numerator and int denominator."""
    codeflash_output = safe_division(10.5, 2); result = codeflash_output # 2.36μs -> 2.07μs (14.3% faster)
    expected = 10.5 / 2.0
    assert result == expected

def test_numpy_float64_inputs():
    """Test safe_division with numpy float64 inputs."""
    codeflash_output = safe_division(np.float64(10.0), np.float64(2.0)); result = codeflash_output # 2.53μs -> 2.04μs (23.8% faster)
    expected = 5.0
    assert result == expected

def test_numpy_int_inputs():
    """Test safe_division with numpy int inputs."""
    codeflash_output = safe_division(np.int64(10), np.int64(2)); result = codeflash_output # 3.53μs -> 3.83μs (7.63% slower)
    expected = 5.0
    assert result == expected

def test_negative_zero_denominator():
    """Test safe_division with negative zero as denominator."""
    codeflash_output = safe_division(10, -0.0); result = codeflash_output # 3.43μs -> 3.10μs (10.5% faster)
    expected = 10.0 / FLOAT_EPSILON
    assert result == expected

def test_large_batch_divisions():
    """Test safe_division with many different values in a batch."""
    numerators = np.array([10, 20, 30, 40, 50, -10, -20, 0, 100, 1000], dtype=np.float64)
    denominators = np.array([2, 4, 5, 8, 10, 2, 4, 5, 10, 100], dtype=np.float64)

    # Verify results for a representative sample
    for i in range(len(numerators)):
        codeflash_output = safe_division(numerators[i], denominators[i]); result = codeflash_output # 6.21μs -> 5.07μs (22.6% faster)
        expected = numerators[i] / denominators[i]
        assert result == expected

def test_large_batch_with_small_denominators():
    """Test safe_division with many small denominators near FLOAT_EPSILON."""
    numerators = np.linspace(1, 100, 50, dtype=np.float64)
    small_denom = FLOAT_EPSILON / 2

    # All results should use FLOAT_EPSILON as the denominator
    for num in numerators:
        codeflash_output = safe_division(num, small_denom); result = codeflash_output # 21.1μs -> 17.6μs (19.8% faster)
        expected = num / FLOAT_EPSILON
        assert result == expected

def test_large_batch_with_zeros():
    """Test safe_division with batch containing zeros in both positions."""
    test_cases = [
        (0, 0),
        (0, 1),
        (0, FLOAT_EPSILON / 2),
        (10, 0),
        (-10, 0),
    ]

    for num, denom in test_cases:
        codeflash_output = safe_division(num, denom); result = codeflash_output # 4.50μs -> 3.94μs (14.2% faster)
        expected = float(num) / max(float(denom), FLOAT_EPSILON)
        assert result == expected

def test_large_range_values():
    """Test safe_division with a wide range of values."""
    test_values = [
        (1e-100, 2),
        (1e100, 2),
        (1e50, 1e-50),
        (1e-50, 1e50),
        (1e-200, 1e-200),
        (-1e100, -1e100),
    ]

    for num, denom in test_values:
        codeflash_output = safe_division(num, denom); result = codeflash_output # 5.86μs -> 5.26μs (11.4% faster)
        expected = float(num) / max(float(denom), FLOAT_EPSILON)
        assert result == expected

def test_consistency_across_repeated_calls():
    """Test that safe_division produces consistent results across multiple calls."""
    num, denom = 42.0, 7.0
    results = [safe_division(num, denom) for _ in range(100)]

    # All results should be identical
    first_result = results[0]
    for result in results[1:]:
        assert result == first_result

def test_batch_with_alternating_signs():
    """Test safe_division with alternating positive and negative values."""
    for i in range(100):
        num = 10.0 if i % 2 == 0 else -10.0
        denom = 2.0 if i % 3 == 0 else 3.0
        codeflash_output = safe_division(num, denom); result = codeflash_output # 39.2μs -> 30.0μs (30.7% faster)
        expected = num / denom
        assert result == expected

def test_precision_with_large_scale_inputs():
    """Test precision of safe_division with large numbers."""
    # Test cases where precision might be lost with naive implementation
    large_base = 1e15
    test_cases = [
        (large_base, large_base),  # Should be 1.0
        (large_base + 1, large_base),  # Should be slightly > 1.0
        (large_base, large_base + 1),  # Should be slightly < 1.0
    ]

    for num, denom in test_cases:
        codeflash_output = safe_division(num, denom); result = codeflash_output # 3.01μs -> 2.56μs (17.7% faster)
        expected = num / denom
        assert result == expected

def test_many_operations_with_zero_denominator():
    """Test many operations with zero denominator to ensure stability."""
    for i in range(100):
        num = float(i) - 50.0
        codeflash_output = safe_division(num, 0); result = codeflash_output # 46.9μs -> 41.9μs (12.0% faster)
        expected = num / FLOAT_EPSILON
        assert result == expected

def test_batch_operations_mixed_edge_cases():
    """Test batch operations with mix of normal and edge cases."""
    test_cases = [
        (10, 2),           # Normal case
        (10, 0),           # Division by zero
        (0, 5),            # Zero numerator
        (10, FLOAT_EPSILON / 2),  # Very small denominator
        (-5, -5),          # Both negative
        (1e-100, 1e-100),  # Very small numbers
    ]

    for num, denom in test_cases:
        codeflash_output = safe_division(num, denom); result = codeflash_output # 4.97μs -> 4.48μs (10.8% faster)
        expected = float(num) / max(float(denom), FLOAT_EPSILON)
        assert result == expected
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-safe_division-mkouw6ho` and push.


@codeflash-ai codeflash-ai bot requested a review from aseembits93 January 22, 2026 02:52
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Jan 22, 2026