⚡️ Speed up function translate_log_level by 121% #46

Open
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-translate_log_level-mkoup6xq

codeflash-ai bot commented on Jan 22, 2026

📄 121% (1.21x) speedup for translate_log_level in unstructured_inference/logger.py

⏱️ Runtime: 690 microseconds → 313 microseconds (best of 115 runs)
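
The percentages in this report are consistent with the speedup being computed as (original - optimized) / optimized, although the report does not spell out the formula. A minimal sketch under that assumption; `percent_faster` is a hypothetical helper, not part of Codeflash:

```python
def percent_faster(original: float, optimized: float) -> float:
    """Relative speedup in percent; both timings must use the same unit.

    Assumed formula: (original - optimized) / optimized * 100.
    """
    return (original - optimized) / optimized * 100.0


# Rounded timings quoted in this report (both in microseconds).
print(f"overall runtime:    {percent_faster(690, 313):.1f}%")
print(f"existing unit test: {percent_faster(79.7, 37.2):.1f}%")
```

With the rounded figures, the overall runtime works out to about 120% and the existing unit test to about 114%. The 121% headline presumably comes from unrounded timings.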

📝 Explanation and details

The optimization makes translate_log_level roughly 2.2x faster (the 121% speedup reported above) by replacing expensive string-based lookups with a direct integer-to-integer dictionary mapping.

Key Changes

Original approach:

  1. Calls logging.getLevelName(level) - a function that converts numeric levels to strings (56.2% of runtime)
  2. Performs two list membership checks with string comparisons (in ["NOTSET", "DEBUG", ...])
  3. Initializes a variable and conditionally assigns values

Optimized approach:

  1. Pre-computes a static dictionary _LOG_LEVEL_TO_ONNX mapping integer log levels directly to ONNX levels
  2. Performs a single dict.get() operation with default fallback
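
The two approaches can be sketched side by side. This is a reconstruction from the description above and the mapping exercised by the tests, not the actual diff; the function names `translate_log_level_original` and `translate_log_level_optimized` are hypothetical, and only `_LOG_LEVEL_TO_ONNX` is named in the PR:

```python
import logging


def translate_log_level_original(level: int) -> int:
    """Name-based translation, reconstructed from the PR description."""
    level_name = logging.getLevelName(level)  # int -> str conversion
    onnx_level = 0  # default for anything unrecognized
    if level_name in ["NOTSET", "DEBUG", "INFO", "WARNING"]:
        onnx_level = 4
    elif level_name in ["ERROR", "CRITICAL"]:
        onnx_level = 3
    return onnx_level


# Static mapping built once at import time (values taken from the tests).
_LOG_LEVEL_TO_ONNX = {
    logging.NOTSET: 4,
    logging.DEBUG: 4,
    logging.INFO: 4,
    logging.WARNING: 4,
    logging.ERROR: 3,
    logging.CRITICAL: 3,
}


def translate_log_level_optimized(level: int) -> int:
    """Integer-keyed translation: a single dict lookup with a default of 0."""
    return _LOG_LEVEL_TO_ONNX.get(level, 0)
```

Under default logging configuration both sketches return 4 for NOTSET through WARNING, 3 for ERROR and CRITICAL, and 0 for any other integer.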

Why This Is Faster

  1. Eliminates expensive getLevelName() call: The original spent >56% of its time converting integers to strings. The optimization removes this entirely by working directly with integer keys.

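  2. Dictionary lookup instead of list scanning: A dictionary lookup is a single constant-time hash operation, while each list membership check scans its list linearly and compares strings. The lists here are short (four and two entries), so the gain is a smaller constant factor rather than a change in scaling. The line profiler shows 12.2% + 6.8% of time spent on these checks.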

  3. Reduced operations: The optimized version executes one dictionary lookup vs 4-5 operations (function call, variable initialization, two conditional checks with list scans).
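
A rough way to reproduce the comparison locally is a `timeit` micro-benchmark. The sketch below assumes the two variants look like the descriptions above; `by_name`, `by_int`, and `time_calls` are hypothetical names, and absolute timings will differ from the figures in this report:

```python
import logging
import timeit

_LOG_LEVEL_TO_ONNX = {
    logging.NOTSET: 4,
    logging.DEBUG: 4,
    logging.INFO: 4,
    logging.WARNING: 4,
    logging.ERROR: 3,
    logging.CRITICAL: 3,
}


def by_name(level: int) -> int:
    """Name-based variant: getLevelName() plus two list membership checks."""
    name = logging.getLevelName(level)
    if name in ["NOTSET", "DEBUG", "INFO", "WARNING"]:
        return 4
    if name in ["ERROR", "CRITICAL"]:
        return 3
    return 0


def by_int(level: int) -> int:
    """Dict-based variant: one lookup with a default of 0."""
    return _LOG_LEVEL_TO_ONNX.get(level, 0)


def time_calls(func, levels, number: int = 20_000) -> float:
    """Total seconds for `number` passes over `levels` using `func`."""
    return timeit.timeit(lambda: [func(lvl) for lvl in levels], number=number)


if __name__ == "__main__":
    levels = [logging.DEBUG, logging.ERROR, 25]  # two standard levels, one unknown
    print(f"by_name: {time_calls(by_name, levels):.4f}s")
    print(f"by_int:  {time_calls(by_int, levels):.4f}s")
```

The list comprehension and lambda add the same overhead to both measurements, so the ratio understates the per-call difference slightly.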

Test Results Analysis

  • Standard levels (NOTSET, DEBUG, INFO, etc.): 67-150% faster, as these are the primary use case
  • Unknown/custom levels: 200-400% faster due to avoiding the getLevelName() fallback behavior for non-standard integers
  • Bulk operations: The sequential-call performance test (504 calls) shows a 97% improvement, so the per-call saving holds up across repeated calls
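
The larger gain on unknown levels follows from what `logging.getLevelName()` does when it has no name for an integer: it formats a new "Level N" string, which then fails both membership checks. A quick illustration using only the standard library, assuming nothing has registered a custom name for level 25:

```python
import logging

# A registered level resolves to its name.
known_name = logging.getLevelName(logging.INFO)

# An unregistered integer falls back to a freshly formatted "Level N" string.
unknown_name = logging.getLevelName(25)

print(known_name)
print(unknown_name)
```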

Impact Considerations

Since translate_log_level is a logging utility function, it's likely called frequently during application runtime, especially in hot paths where logging decisions are made. The optimization is particularly beneficial for:

  • High-throughput applications with frequent log level translations
  • Scenarios with many custom/unknown log levels (where the original code would construct "Level N" strings)
  • Initialization and configuration code that processes multiple log levels
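
One case worth checking when custom levels are involved: a name-based lookup follows whatever name is currently registered for a level, while an integer-keyed lookup does not. The sketch below assumes the two variants look like the descriptions above (`by_name`, `by_int`, and `compare_after_rename` are hypothetical names) and uses the standard `logging.addLevelName()` to rename a level temporarily. Whether any caller of this function renames standard levels is not something the PR states.

```python
import logging

_LOG_LEVEL_TO_ONNX = {
    logging.NOTSET: 4,
    logging.DEBUG: 4,
    logging.INFO: 4,
    logging.WARNING: 4,
    logging.ERROR: 3,
    logging.CRITICAL: 3,
}


def by_name(level: int) -> int:
    """Name-based variant, as described for the original code."""
    name = logging.getLevelName(level)
    if name in ["NOTSET", "DEBUG", "INFO", "WARNING"]:
        return 4
    if name in ["ERROR", "CRITICAL"]:
        return 3
    return 0


def by_int(level: int) -> int:
    """Integer-keyed variant, as described for the optimized code."""
    return _LOG_LEVEL_TO_ONNX.get(level, 0)


def compare_after_rename(level: int, new_name: str):
    """Temporarily rename `level`, then return (by_name, by_int) results."""
    old_name = logging.getLevelName(level)
    logging.addLevelName(level, new_name)
    try:
        return by_name(level), by_int(level)
    finally:
        logging.addLevelName(level, old_name)  # restore the previous name


print(compare_after_rename(logging.WARNING, "WARN"))
```

With WARNING renamed to "WARN", the name-based variant no longer recognizes the level and returns 0, while the integer-keyed variant still returns 4. Under default logging configuration the two agree.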

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 50 Passed |
| 🌀 Generated Regression Tests | 964 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 2 Passed |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| test_logger.py::test_translate_log_level | 79.7μs | 37.2μs | 114%✅ |

🌀 Generated Regression Tests

import logging
import random
from typing import List

# imports
import pytest  # used for our unit tests
from unstructured_inference.logger import translate_log_level

# unit tests

# Helper used in multiple tests to compute expected result using same decision logic.
# Using this helper in tests is acceptable because it mirrors the documented mapping rules
# rather than duplicating internal implementation details.
def _expected_onnx_level_from_logging_name(level) -> int:
    # Determine level name via the public logging API (same call the function under test makes).
    level_name = logging.getLevelName(level)
    if level_name in ["NOTSET", "DEBUG", "INFO", "WARNING"]:
        return 4
    if level_name in ["ERROR", "CRITICAL"]:
        return 3
    return 0

def test_basic_standard_levels_map_to_expected_values():
    # Basic functionality checks for the canonical logging level constants.
    # These are the most common/expected inputs and should map deterministically.
    codeflash_output = translate_log_level(logging.NOTSET) # 1.39μs -> 778ns (78.4% faster)
    codeflash_output = translate_log_level(logging.DEBUG) # 713ns -> 425ns (67.8% faster)
    codeflash_output = translate_log_level(logging.INFO) # 592ns -> 288ns (106% faster)
    codeflash_output = translate_log_level(logging.WARNING) # 566ns -> 273ns (107% faster)
    codeflash_output = translate_log_level(logging.ERROR) # 687ns -> 275ns (150% faster)
    codeflash_output = translate_log_level(logging.CRITICAL) # 629ns -> 293ns (115% faster)

def test_explicit_values_for_numeric_level_names_and_unknown_integers():
    # Confirm behavior for integers that are not one of the standard constants.
    # 15 is a common example of a custom level (between DEBUG(10) and INFO(20))
    # It should produce 0 because its derived name is not in the mapped lists.
    custom_level = 15
    codeflash_output = translate_log_level(custom_level) # 2.78μs -> 732ns (279% faster)

    # Large integer that is not a known level should also fall back to 0.
    huge_level = 999999
    codeflash_output = translate_log_level(huge_level) # 1.24μs -> 386ns (220% faster)

    # Negative integers are allowed by Python call signature; treat them as unknown.
    negative_level = -1
    codeflash_output = translate_log_level(negative_level) # 1.13μs -> 372ns (204% faster)

def test_type_handling_non_integer_inputs_do_not_raise_and_follow_getLevelName_behavior():
    codeflash_output = translate_log_level("DEBUG") # 1.89μs -> 684ns (176% faster)

    # Passing a float is also allowed at runtime. Confirm stable behavior (no exception).
    # 20.0 compares and hashes equal to 20, so logging.getLevelName(20.0) resolves to "INFO"
    # and a dict keyed by integer levels finds the same entry; both paths should agree.
    float_level = 20.0
    codeflash_output = translate_log_level(float_level) # 1.72μs -> 1.51μs (13.6% faster)

def test_round_trip_consistency_with_logging_api_for_various_known_names():
    # Validate that for each known standard level name, mapping the numeric value produces
    # the same result as computing expected value via the logging.getLevelName based helper.
    levels = [logging.NOTSET, logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]
    for lvl in levels:
        expected = _expected_onnx_level_from_logging_name(lvl)
        codeflash_output = translate_log_level(lvl) # 4.01μs -> 2.26μs (76.9% faster)
        assert codeflash_output == expected

def test_large_scale_mixed_levels_performance_and_correctness():
    # Large-scale test: generate up to 500 level values (within the allowed limit)
    # mixing canonical levels and many unknown/custom integers to simulate heavy usage.
    # We keep the sample deterministic by seeding random.
    rng = random.Random(0)  # deterministic sequence
    sample_size = 500  # under the 1000 step limit
    # Compose a list that includes many standard levels plus random other integers.
    base_levels = [logging.NOTSET, logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]
    mixed_levels: List[int] = []
    # Interleave a standard level every few entries to ensure presence in the sample.
    for i in range(sample_size):
        if i % 10 == 0:
            mixed_levels.append(base_levels[i % len(base_levels)])
        else:
            # random integer in a range that includes negative, small custom, and large values
            mixed_levels.append(rng.randint(-50, 500))

    # Compute expected results using the public logging API + documented mapping logic,
    # then compare elementwise to the function output.
    expected_results = [_expected_onnx_level_from_logging_name(lvl) for lvl in mixed_levels]
    actual_results = [translate_log_level(lvl) for lvl in mixed_levels]
    assert actual_results == expected_results

def test_boundary_behavior_for_notset_and_similar_named_values():
    codeflash_output = translate_log_level(0) # 1.36μs -> 736ns (84.5% faster)

    # If a custom level name were to be registered in logging (not done in this test),
    # the function behavior depends solely on the textual name returned by logging.getLevelName.
    # We verify that the function only checks textual membership; for a fabricated name it returns 0.
    # Use a value whose name is likely to be 'Level X' (unknown).
    unknown = 12345
    codeflash_output = translate_log_level(unknown) # 2.21μs -> 436ns (408% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import logging

import pytest
from unstructured_inference.logger import translate_log_level

def test_notset_level():
    """Test that NOTSET level (0) returns ONNX level 4"""
    codeflash_output = translate_log_level(logging.NOTSET); result = codeflash_output # 1.36μs -> 751ns (81.2% faster)

def test_debug_level():
    """Test that DEBUG level (10) returns ONNX level 4"""
    codeflash_output = translate_log_level(logging.DEBUG); result = codeflash_output # 1.37μs -> 732ns (86.7% faster)

def test_info_level():
    """Test that INFO level (20) returns ONNX level 4"""
    codeflash_output = translate_log_level(logging.INFO); result = codeflash_output # 1.41μs -> 747ns (88.5% faster)

def test_warning_level():
    """Test that WARNING level (30) returns ONNX level 4"""
    codeflash_output = translate_log_level(logging.WARNING); result = codeflash_output # 1.41μs -> 754ns (86.9% faster)

def test_error_level():
    """Test that ERROR level (40) returns ONNX level 3"""
    codeflash_output = translate_log_level(logging.ERROR); result = codeflash_output # 1.60μs -> 750ns (113% faster)

def test_critical_level():
    """Test that CRITICAL level (50) returns ONNX level 3"""
    codeflash_output = translate_log_level(logging.CRITICAL); result = codeflash_output # 1.50μs -> 729ns (106% faster)

def test_unknown_level_returns_default():
    """Test that an unknown/undefined level returns default ONNX level 0"""
    # Using a level that doesn't correspond to a named level (e.g., 25)
    codeflash_output = translate_log_level(25); result = codeflash_output # 2.86μs -> 764ns (274% faster)

def test_negative_level():
    """Test that a negative level value returns default ONNX level 0"""
    codeflash_output = translate_log_level(-1); result = codeflash_output # 2.96μs -> 765ns (287% faster)

def test_large_undefined_level():
    """Test that a very large undefined level returns default ONNX level 0"""
    codeflash_output = translate_log_level(999); result = codeflash_output # 2.84μs -> 713ns (298% faster)

def test_level_between_standard_levels():
    """Test that a level between standard logging levels returns default ONNX level 0"""
    # Level 35 is between WARNING (30) and ERROR (40)
    codeflash_output = translate_log_level(35); result = codeflash_output # 2.87μs -> 691ns (315% faster)

def test_level_zero():
    """Test that level 0 (NOTSET) returns ONNX level 4"""
    codeflash_output = translate_log_level(0); result = codeflash_output # 1.31μs -> 738ns (77.2% faster)

def test_level_boundary_warning_error():
    """Test boundary between WARNING and ERROR levels"""
    # WARNING = 30, ERROR = 40, test 31-39 range
    codeflash_output = translate_log_level(31); result_31 = codeflash_output # 2.88μs -> 731ns (294% faster)
    codeflash_output = translate_log_level(39); result_39 = codeflash_output # 1.26μs -> 363ns (247% faster)

def test_level_boundary_error_critical():
    """Test boundary between ERROR and CRITICAL levels"""
    # ERROR = 40, CRITICAL = 50, test 41-49 range
    codeflash_output = translate_log_level(41); result_41 = codeflash_output # 2.88μs -> 728ns (296% faster)
    codeflash_output = translate_log_level(49); result_49 = codeflash_output # 1.18μs -> 372ns (217% faster)

def test_level_just_above_critical():
    """Test level just above CRITICAL (50)"""
    codeflash_output = translate_log_level(51); result = codeflash_output # 2.82μs -> 740ns (281% faster)

def test_return_type_is_integer():
    """Test that the return value is always an integer"""
    codeflash_output = translate_log_level(logging.INFO); result = codeflash_output # 1.35μs -> 709ns (90.0% faster)

def test_return_type_for_error():
    """Test that the return value for error is an integer"""
    codeflash_output = translate_log_level(logging.ERROR); result = codeflash_output # 1.55μs -> 738ns (109% faster)

def test_all_standard_logging_levels():
    """Test all standard Python logging levels to ensure correct mapping"""
    standard_levels = {
        logging.NOTSET: 4,
        logging.DEBUG: 4,
        logging.INFO: 4,
        logging.WARNING: 4,
        logging.ERROR: 3,
        logging.CRITICAL: 3,
    }
    
    for level, expected_onnx_level in standard_levels.items():
        codeflash_output = translate_log_level(level); result = codeflash_output # 4.56μs -> 2.27μs (101% faster)

def test_range_of_invalid_levels():
    """Test a range of invalid levels to ensure they all return 0"""
    # Test 31 invalid levels scattered across the range
    invalid_levels = [
        5, 8, 11, 15, 25, 35, 45, 55, 100, 150, 200, 500, 1000,
        -5, -10, -50, -100,
        2, 3, 7, 12, 18, 22, 27, 32, 37, 42, 47, 60, 75, 90
    ]
    
    for level in invalid_levels:
        codeflash_output = translate_log_level(level); result = codeflash_output # 29.1μs -> 9.29μs (214% faster)

def test_consistency_across_repeated_calls():
    """Test that the function returns consistent results across multiple calls"""
    # Test each standard level 10 times to ensure consistency
    standard_levels = {
        logging.DEBUG: 4,
        logging.INFO: 4,
        logging.WARNING: 4,
        logging.ERROR: 3,
        logging.CRITICAL: 3,
    }
    
    for level, expected in standard_levels.items():
        for _ in range(10):
            codeflash_output = translate_log_level(level); result = codeflash_output

def test_invalid_levels_consistency():
    """Test that invalid levels consistently return 0"""
    invalid_levels = [15, 35, 45, 75, 101, 255, 500]
    
    # Call each invalid level multiple times
    for level in invalid_levels:
        for _ in range(5):
            codeflash_output = translate_log_level(level); result = codeflash_output

def test_performance_with_many_sequential_calls():
    """Test performance with many sequential calls to ensure efficiency"""
    # Make 504 calls (6 levels x 84 repetitions) with various levels
    levels_sequence = [
        logging.DEBUG, logging.INFO, logging.WARNING,
        logging.ERROR, logging.CRITICAL, logging.NOTSET
    ] * 84  # 504 total calls
    
    results = []
    for level in levels_sequence:
        codeflash_output = translate_log_level(level); result = codeflash_output # 267μs -> 135μs (97.3% faster)
        results.append(result)

def test_no_side_effects():
    """Test that calling the function doesn't have side effects"""
    import logging as logging_module

    # Get initial logging level names
    initial_notset = logging_module.getLevelName(logging_module.NOTSET)
    initial_debug = logging_module.getLevelName(logging_module.DEBUG)
    initial_error = logging_module.getLevelName(logging_module.ERROR)
    
    # Call the function multiple times
    for _ in range(100):
        translate_log_level(logging_module.DEBUG) # 51.4μs -> 27.0μs (90.5% faster)
        translate_log_level(logging_module.ERROR) # 56.2μs -> 26.4μs (113% faster)
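        translate_log_level(25) # 83.5μs -> 27.3μs (206% faster)

    # Level names must be unchanged after the calls
    assert logging_module.getLevelName(logging_module.NOTSET) == initial_notset
    assert logging_module.getLevelName(logging_module.DEBUG) == initial_debug
    assert logging_module.getLevelName(logging_module.ERROR) == initial_error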
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from unstructured_inference.logger import translate_log_level

def test_translate_log_level():
    translate_log_level(40)

def test_translate_log_level_2():
    translate_log_level(0)

🔎 Concolic Coverage Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| codeflash_concolic_toh405kj/tmp7fg2j1bt/test_concolic_coverage.py::test_translate_log_level | 1.63μs | 739ns | 121%✅ |
| codeflash_concolic_toh405kj/tmp7fg2j1bt/test_concolic_coverage.py::test_translate_log_level_2 | 1.45μs | 723ns | 100%✅ |

To edit these changes, git checkout codeflash/optimize-translate_log_level-mkoup6xq and push.

codeflash-ai bot requested a review from aseembits93 on January 22, 2026 02:46
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels on Jan 22, 2026