Skip to content

⚡️ Speed up method InferenceConfig._get_int by 10%#52

Open
codeflash-ai[bot] wants to merge 1 commit intomainfrom
codeflash/optimize-InferenceConfig._get_int-mkowkf1p
Open

⚡️ Speed up method InferenceConfig._get_int by 10%#52
codeflash-ai[bot] wants to merge 1 commit intomainfrom
codeflash/optimize-InferenceConfig._get_int-mkowkf1p

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Jan 22, 2026

📄 10% (0.10x) speedup for InferenceConfig._get_int in unstructured_inference/config.py

⏱️ Runtime : 1.08 milliseconds 976 microseconds (best of 72 runs)

📝 Explanation and details

The optimization eliminates an unnecessary intermediate function call in _get_int().

What changed:
The original code calls self._get_string(var) which internally calls os.environ.get(var). The optimized version directly calls os.environ.get(var) instead, bypassing the _get_string() wrapper.

Why it's faster:

  1. Removes function call overhead: Each function call in Python has overhead (frame creation, argument passing, return value handling). By eliminating the intermediate _get_string() call, we save approximately 4,000-5,000 nanoseconds per invocation based on the line profiler results.

  2. Single dictionary lookup: The original code effectively performs the same os.environ.get() operation but wrapped in an extra function layer. The optimization removes this indirection, resulting in a more direct path to the environment variable lookup.

Performance impact:

  • Line profiler shows the walrus assignment in _get_int() dropped from ~6.6ms to ~3.4ms total time (48% faster on that line)
  • Overall function execution improved by ~10% (1.08ms → 976μs)
  • All test cases show consistent speedups ranging from 2-13%, with the largest gains in cases that successfully retrieve environment variables

Test case patterns:
The optimization benefits all test scenarios uniformly:

  • Normal integer retrieval: 5-13% faster
  • Missing variables (returns default): 6-10% faster
  • Error cases (invalid values): 2-4% faster
  • Large-scale test (500 iterations): 11.6% faster, demonstrating the cumulative benefit in hot paths

Behavioral preservation:
The optimization maintains identical functionality - both versions handle empty strings, whitespace, missing variables, and invalid inputs the same way. The _get_string() method is still available for other use cases that may need string-specific handling.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 764 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 4 Passed
📊 Tests Coverage 100.0%
🌀 Click to see Generated Regression Tests
import os
from dataclasses import dataclass

import pytest  # used for our unit tests
from unstructured_inference.config import InferenceConfig

def test_basic_integer_value(monkeypatch):
    # Basic: env var set to a normal integer string should parse correctly.
    var = "TEST_BASIC_INT"
    monkeypatch.setenv(var, "42")  # set env to "42"
    cfg = InferenceConfig()
    codeflash_output = cfg._get_int(var, 10) # 2.82μs -> 2.51μs (12.2% faster)

def test_zero_value_and_type(monkeypatch):
    # Basic: "0" should be accepted and returned as integer 0 (and type int).
    var = "TEST_ZERO_INT"
    monkeypatch.setenv(var, "0")
    cfg = InferenceConfig()
    codeflash_output = cfg._get_int(var, 5); result = codeflash_output # 2.84μs -> 2.54μs (11.8% faster)

def test_negative_integer(monkeypatch):
    # Basic: negative integer strings should be parsed to negative ints.
    var = "TEST_NEG_INT"
    monkeypatch.setenv(var, "-7")
    cfg = InferenceConfig()
    codeflash_output = cfg._get_int(var, 0) # 2.74μs -> 2.60μs (5.30% faster)

def test_with_surrounding_spaces(monkeypatch):
    # Edge: integer string with leading/trailing whitespace should parse fine.
    var = "TEST_SPACED_INT"
    monkeypatch.setenv(var, "   123  \n")  # whitespace around the number
    cfg = InferenceConfig()
    codeflash_output = cfg._get_int(var, 0) # 2.97μs -> 2.72μs (9.38% faster)

def test_leading_plus_sign(monkeypatch):
    # Edge: strings like "+5" are valid for int() and should return 5.
    var = "TEST_PLUS_INT"
    monkeypatch.setenv(var, "+5")
    cfg = InferenceConfig()
    codeflash_output = cfg._get_int(var, -1) # 2.79μs -> 2.60μs (7.03% faster)

def test_double_zero_string(monkeypatch):
    # Edge: "00" should be parsed as integer 0 (no octal behavior in int()).
    var = "TEST_DOUBLE_ZERO"
    monkeypatch.setenv(var, "00")
    cfg = InferenceConfig()
    codeflash_output = cfg._get_int(var, 1) # 2.88μs -> 2.54μs (13.3% faster)

def test_missing_env_returns_default(monkeypatch):
    # Edge: when env var is not present, the function should return the provided default.
    var = "TEST_MISSING_INT"
    monkeypatch.delenv(var, raising=False)  # ensure not set
    cfg = InferenceConfig()
    codeflash_output = cfg._get_int(var, 77) # 2.85μs -> 2.60μs (9.42% faster)

def test_empty_string_env_returns_default(monkeypatch):
    # Edge: when env var is set to empty string, _get_string returns "" which is falsey,
    # so _get_int should return the default value.
    var = "TEST_EMPTY_INT"
    monkeypatch.setenv(var, "")  # explicitly empty string
    cfg = InferenceConfig()
    codeflash_output = cfg._get_int(var, -999) # 2.33μs -> 2.19μs (6.68% faster)

def test_space_only_string_raises_value_error(monkeypatch):
    # Edge: if env var contains only whitespace (truthy), walrus condition causes int() to be called,
    # but int("   ") is invalid and should raise ValueError.
    var = "TEST_SPACE_ONLY"
    monkeypatch.setenv(var, "   ")  # only spaces -> truthy, but invalid int
    cfg = InferenceConfig()
    with pytest.raises(ValueError):
        cfg._get_int(var, 0) # 7.29μs -> 7.01μs (3.95% faster)

def test_non_integer_string_raises_value_error(monkeypatch):
    # Edge: clearly non-numeric strings should raise ValueError when parsed with int().
    var = "TEST_NON_INT"
    monkeypatch.setenv(var, "not_a_number")
    cfg = InferenceConfig()
    with pytest.raises(ValueError):
        cfg._get_int(var, 123) # 7.09μs -> 6.94μs (2.19% faster)

def test_float_string_raises_value_error(monkeypatch):
    # Edge: float-like strings (e.g., "3.14") are invalid for int() and should raise ValueError.
    var = "TEST_FLOAT_STRING"
    monkeypatch.setenv(var, "3.14")
    cfg = InferenceConfig()
    with pytest.raises(ValueError):
        cfg._get_int(var, 0) # 6.95μs -> 6.72μs (3.42% faster)

def test_large_integer_value(monkeypatch):
    # Edge / Large: very large integers should be handled by Python's arbitrary-precision ints.
    var = "TEST_LARGE_INT"
    large = 10 ** 18  # large but within Python's capabilities
    monkeypatch.setenv(var, str(large))
    cfg = InferenceConfig()
    codeflash_output = cfg._get_int(var, 0) # 3.03μs -> 2.84μs (6.62% faster)

def test_default_is_used_when_env_value_is_empty_string_vs_zero(monkeypatch):
    # This test highlights behavior difference between "" (empty string) and "0":
    # empty string -> default returned; "0" -> int(0) returned.
    var_empty = "TEST_EMPTY_VS_ZERO_EMPTY"
    var_zero = "TEST_EMPTY_VS_ZERO_ZERO"
    monkeypatch.setenv(var_empty, "")
    monkeypatch.setenv(var_zero, "0")
    cfg = InferenceConfig()
    codeflash_output = cfg._get_int(var_empty, 999) # 2.39μs -> 2.17μs (9.94% faster)
    codeflash_output = cfg._get_int(var_zero, 999) # 1.85μs -> 1.69μs (9.79% faster)

def test_large_scale_many_vars(monkeypatch):
    # Large Scale: ensure function is stable and correct when called many times with many env vars.
    # We keep count <= 500 to respect the guideline to avoid overly large loops/data structures.
    count = 500
    cfg = InferenceConfig()
    # Set up many environment variables with predictable integer values.
    for i in range(count):
        var = f"TEST_SCALE_INT_{i}"
        # Use a simple multiple so we can verify exact values.
        monkeypatch.setenv(var, str(i * 3))
    # Verify retrieving each value yields the expected integer.
    for i in range(count):
        var = f"TEST_SCALE_INT_{i}"
        expected = i * 3
        codeflash_output = cfg._get_int(var, -1) # 654μs -> 586μs (11.6% faster)

def test_using_negative_default_when_missing(monkeypatch):
    # Edge: ensure negative defaults are preserved when an env var is missing.
    var = "TEST_NEG_DEFAULT"
    monkeypatch.delenv(var, raising=False)
    cfg = InferenceConfig()
    codeflash_output = cfg._get_int(var, -314159) # 2.75μs -> 2.59μs (5.94% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import os

import pytest
from unstructured_inference.config import InferenceConfig

class TestInferenceConfigGetInt:
    """Test suite for InferenceConfig._get_int method"""

    # ============================================================================
    # BASIC TEST CASES - Verify fundamental functionality under normal conditions
    # ============================================================================

    def test_get_int_returns_default_when_env_var_not_set(self):
        """Test that _get_int returns default_value when environment variable is not set"""
        config = InferenceConfig()
        # Ensure the variable is not in the environment
        env_var = "NONEXISTENT_VAR_12345"
        if env_var in os.environ:
            del os.environ[env_var]
        
        # Should return the default value when env var doesn't exist
        codeflash_output = config._get_int(env_var, default_value=42); result = codeflash_output # 3.29μs -> 3.07μs (7.19% faster)

    def test_get_int_returns_int_from_env_var(self):
        """Test that _get_int returns the integer value from environment variable"""
        config = InferenceConfig()
        env_var = "TEST_INT_VAR_BASIC"
        
        # Set environment variable to a valid integer string
        os.environ[env_var] = "123"
        try:
            codeflash_output = config._get_int(env_var, default_value=999); result = codeflash_output
        finally:
            # Clean up environment
            if env_var in os.environ:
                del os.environ[env_var]

    def test_get_int_with_positive_integer(self):
        """Test _get_int with positive integer values"""
        config = InferenceConfig()
        env_var = "TEST_POSITIVE_INT"
        
        os.environ[env_var] = "100"
        try:
            codeflash_output = config._get_int(env_var, default_value=0); result = codeflash_output
        finally:
            if env_var in os.environ:
                del os.environ[env_var]

    def test_get_int_with_negative_integer(self):
        """Test _get_int with negative integer values"""
        config = InferenceConfig()
        env_var = "TEST_NEGATIVE_INT"
        
        os.environ[env_var] = "-50"
        try:
            codeflash_output = config._get_int(env_var, default_value=0); result = codeflash_output
        finally:
            if env_var in os.environ:
                del os.environ[env_var]

    def test_get_int_with_zero(self):
        """Test _get_int with zero value"""
        config = InferenceConfig()
        env_var = "TEST_ZERO_INT"
        
        os.environ[env_var] = "0"
        try:
            codeflash_output = config._get_int(env_var, default_value=999); result = codeflash_output
        finally:
            if env_var in os.environ:
                del os.environ[env_var]

    def test_get_int_with_default_zero(self):
        """Test that default_value can be zero"""
        config = InferenceConfig()
        env_var = "NONEXISTENT_VAR_ZERO_DEFAULT"
        if env_var in os.environ:
            del os.environ[env_var]
        
        codeflash_output = config._get_int(env_var, default_value=0); result = codeflash_output # 3.34μs -> 3.10μs (7.78% faster)

    def test_get_int_with_large_positive_default(self):
        """Test _get_int with a large positive default value"""
        config = InferenceConfig()
        env_var = "NONEXISTENT_VAR_LARGE_DEFAULT"
        if env_var in os.environ:
            del os.environ[env_var]
        
        codeflash_output = config._get_int(env_var, default_value=1000000); result = codeflash_output # 3.23μs -> 3.00μs (7.64% faster)

    # ============================================================================
    # EDGE CASE TEST CASES - Evaluate behavior under extreme or unusual conditions
    # ============================================================================

    def test_get_int_with_empty_string_env_var(self):
        """Test _get_int when environment variable is set to empty string"""
        config = InferenceConfig()
        env_var = "TEST_EMPTY_STRING"
        
        os.environ[env_var] = ""
        try:
            # Empty string should be falsy, so default should be returned
            codeflash_output = config._get_int(env_var, default_value=77); result = codeflash_output
        finally:
            if env_var in os.environ:
                del os.environ[env_var]

    def test_get_int_with_whitespace_only_env_var(self):
        """Test _get_int when environment variable contains only whitespace"""
        config = InferenceConfig()
        env_var = "TEST_WHITESPACE"
        
        os.environ[env_var] = "   "
        try:
            # Whitespace string will be converted to int, which should raise ValueError
            with pytest.raises(ValueError):
                config._get_int(env_var, default_value=0)
        finally:
            if env_var in os.environ:
                del os.environ[env_var]

    def test_get_int_with_leading_whitespace(self):
        """Test _get_int when environment variable has leading whitespace"""
        config = InferenceConfig()
        env_var = "TEST_LEADING_WHITESPACE"
        
        os.environ[env_var] = "  42"
        try:
            # Python's int() function handles leading whitespace
            codeflash_output = config._get_int(env_var, default_value=0); result = codeflash_output
        finally:
            if env_var in os.environ:
                del os.environ[env_var]

    def test_get_int_with_trailing_whitespace(self):
        """Test _get_int when environment variable has trailing whitespace"""
        config = InferenceConfig()
        env_var = "TEST_TRAILING_WHITESPACE"
        
        os.environ[env_var] = "42  "
        try:
            # Python's int() function handles trailing whitespace
            codeflash_output = config._get_int(env_var, default_value=0); result = codeflash_output
        finally:
            if env_var in os.environ:
                del os.environ[env_var]

    def test_get_int_with_invalid_string_raises_error(self):
        """Test that _get_int raises ValueError for non-integer string"""
        config = InferenceConfig()
        env_var = "TEST_INVALID_STRING"
        
        os.environ[env_var] = "not_an_integer"
        try:
            with pytest.raises(ValueError):
                config._get_int(env_var, default_value=0)
        finally:
            if env_var in os.environ:
                del os.environ[env_var]

    def test_get_int_with_float_string_raises_error(self):
        """Test that _get_int raises ValueError for float string"""
        config = InferenceConfig()
        env_var = "TEST_FLOAT_STRING"
        
        os.environ[env_var] = "3.14"
        try:
            with pytest.raises(ValueError):
                config._get_int(env_var, default_value=0)
        finally:
            if env_var in os.environ:
                del os.environ[env_var]

    def test_get_int_with_negative_default(self):
        """Test _get_int with negative default value"""
        config = InferenceConfig()
        env_var = "NONEXISTENT_VAR_NEGATIVE_DEFAULT"
        if env_var in os.environ:
            del os.environ[env_var]
        
        codeflash_output = config._get_int(env_var, default_value=-100); result = codeflash_output # 3.23μs -> 3.13μs (3.39% faster)

    def test_get_int_with_very_large_integer(self):
        """Test _get_int with very large integer value"""
        config = InferenceConfig()
        env_var = "TEST_VERY_LARGE_INT"
        large_int = 999999999999999999
        
        os.environ[env_var] = str(large_int)
        try:
            codeflash_output = config._get_int(env_var, default_value=0); result = codeflash_output
        finally:
            if env_var in os.environ:
                del os.environ[env_var]

    def test_get_int_with_very_large_negative_integer(self):
        """Test _get_int with very large negative integer value"""
        config = InferenceConfig()
        env_var = "TEST_VERY_LARGE_NEG_INT"
        large_neg_int = -999999999999999999
        
        os.environ[env_var] = str(large_neg_int)
        try:
            codeflash_output = config._get_int(env_var, default_value=0); result = codeflash_output
        finally:
            if env_var in os.environ:
                del os.environ[env_var]

    
from unstructured_inference.config import InferenceConfig
import pytest

def test_InferenceConfig__get_int():
    with pytest.raises(ValueError, match="invalid\\ literal\\ for\\ int\\(\\)\\ with\\ base\\ 10:\\ '/home/aseem/cf\\-unstr/unstructured\\-inference/\\.venv/bin/codeflash'"):
        InferenceConfig._get_int(InferenceConfig(), '_', 0)

def test_InferenceConfig__get_int_2():
    InferenceConfig._get_int(InferenceConfig(), '', 0)
🔎 Click to see Concolic Coverage Tests
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
codeflash_concolic_toh405kj/tmps62ev7c8/test_concolic_coverage.py::test_InferenceConfig__get_int 7.44μs 7.47μs -0.402%⚠️
codeflash_concolic_toh405kj/tmps62ev7c8/test_concolic_coverage.py::test_InferenceConfig__get_int_2 3.72μs 3.59μs 3.71%✅

To edit these changes git checkout codeflash/optimize-InferenceConfig._get_int-mkowkf1p and push.

Codeflash Static Badge

The optimization eliminates an unnecessary intermediate function call in `_get_int()`. 

**What changed:**
The original code calls `self._get_string(var)` which internally calls `os.environ.get(var)`. The optimized version directly calls `os.environ.get(var)` instead, bypassing the `_get_string()` wrapper.

**Why it's faster:**
1. **Removes function call overhead**: Each function call in Python has overhead (frame creation, argument passing, return value handling). By eliminating the intermediate `_get_string()` call, we save approximately 4,000-5,000 nanoseconds per invocation based on the line profiler results.

2. **Single dictionary lookup**: The original code effectively performs the same `os.environ.get()` operation but wrapped in an extra function layer. The optimization removes this indirection, resulting in a more direct path to the environment variable lookup.

**Performance impact:**
- Line profiler shows the walrus assignment in `_get_int()` dropped from ~6.6ms to ~3.4ms total time (48% faster on that line)
- Overall function execution improved by ~10% (1.08ms → 976μs)
- All test cases show consistent speedups ranging from 2-13%, with the largest gains in cases that successfully retrieve environment variables

**Test case patterns:**
The optimization benefits all test scenarios uniformly:
- Normal integer retrieval: 5-13% faster
- Missing variables (returns default): 6-10% faster  
- Error cases (invalid values): 2-4% faster
- Large-scale test (500 iterations): 11.6% faster, demonstrating the cumulative benefit in hot paths

**Behavioral preservation:**
The optimization maintains identical functionality - both versions handle empty strings, whitespace, missing variables, and invalid inputs the same way. The `_get_string()` method is still available for other use cases that may need string-specific handling.
@codeflash-ai codeflash-ai bot requested a review from aseembits93 January 22, 2026 03:39
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: Medium Optimization Quality according to Codeflash labels Jan 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: Medium Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants