
⚡️ Speed up function filter_bad_extensions by 19% #45

Open
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-filter_bad_extensions-mgzh7vjn

codeflash-ai bot commented Oct 20, 2025

📄 19% (0.19x) speedup for filter_bad_extensions in pr_agent/algo/language_handler.py

⏱️ Runtime: 22.8 milliseconds → 19.3 milliseconds (best of 234 runs)
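The per-test percentages in the generated tests appear to be consistent with speedup = (old - new) / new, for example 900μs → 514μs giving roughly 75%. This formula is inferred from the reported numbers, not taken from Codeflash documentation; the helper name `speedup_percent` is hypothetical. A quick check:

```python
def speedup_percent(old: float, new: float) -> float:
    """Relative speedup (old - new) / new, in percent.

    Assumption: this formula is inferred from the per-test numbers in this
    report; it is not taken from Codeflash documentation.
    """
    if new <= 0:
        raise ValueError("new runtime must be positive")
    return (old - new) / new * 100.0


# (old, new) runtimes copied from this PR; units cancel out in the ratio.
samples = {
    "overall (ms)": (22.8, 19.3),
    "test_large_scale_all_good (μs)": (900, 514),
    "test_large_scale_all_bad (μs)": (752, 509),
}

for name, (old, new) in samples.items():
    print(f"{name}: {speedup_percent(old, new):.1f}% faster")
```

The recomputed values (18.1%, 75.1%, 47.7%) are close to, but not identical with, the reported 19%, 74.9%, and 47.6%, presumably because the displayed runtimes are rounded or aggregated differently.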

📝 Explanation and details

Explanation of Optimizations:

  • Redundant `get_settings()` calls eliminated: settings are retrieved once per function call and reused instead of calling `get_settings()` repeatedly, which reduces overhead.
  • In-place list concatenation removed: the original code used `+=` to append the `.extra` bad extensions to the default list, mutating that list on every call. The optimized code computes the complete set of bad extensions once and reuses it, which avoids the side effect and speeds up membership checks.
  • Faster extension membership test: bad extensions are stored in a `set`, so `filename.split('.')[-1] not in bad_extensions` becomes an average O(1) hash lookup instead of an O(n) scan over n listed extensions.
  • Single auto-generated file check: all forbidden filenames are passed to `str.endswith()` as one tuple, so a single call replaces one call per filename, which is faster and clearer.

These changes preserve behavior while reducing runtime. In the generated tests, the difference is negligible for small inputs (within about ±1%), while inputs of roughly 1,000 files run 28% to 81% faster.
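The four optimizations can be illustrated with a minimal, self-contained sketch. It is not the pr_agent code: `DEFAULT_BAD_EXTENSIONS`, `EXTRA_BAD_EXTENSIONS`, `build_bad_extensions`, `filter_bad_extensions_sketch`, and `append_in_place` are hypothetical names, and the extension lists are made-up stand-ins for the values pr_agent reads through `get_settings()`. Only the lock-file names come from this PR.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence

# Hypothetical stand-ins for the settings values; the real lists come from
# pr_agent's get_settings() and are not shown in this PR.
DEFAULT_BAD_EXTENSIONS = ["exe", "zip", "tar"]
EXTRA_BAD_EXTENSIONS = ["jpg", "png"]

# The auto-generated lock files named in this PR, as one tuple for str.endswith().
AUTO_GENERATED_FILES = (
    "package-lock.json",
    "yarn.lock",
    "composer.lock",
    "Gemfile.lock",
    "poetry.lock",
)


@dataclass
class FileObj:
    filename: Optional[str]


def build_bad_extensions(default: Sequence[str], extra: Sequence[str], use_extra: bool) -> frozenset:
    """Combine the lists into a new set; neither input list is modified."""
    combined = set(default)
    if use_extra:
        combined.update(extra)
    return frozenset(combined)


def is_valid_file(filename: Optional[str], bad_extensions: frozenset) -> bool:
    if not filename:  # rejects None and ""
        return False
    # One endswith() call checks every auto-generated suffix in the tuple.
    if filename.endswith(AUTO_GENERATED_FILES):
        return False
    # Average O(1) set lookup on the text after the last dot.
    return filename.split(".")[-1] not in bad_extensions


def filter_bad_extensions_sketch(files: Iterable[FileObj], use_extra: bool = False) -> List[FileObj]:
    # Build the set once per call rather than re-reading settings per file.
    bad = build_bad_extensions(DEFAULT_BAD_EXTENSIONS, EXTRA_BAD_EXTENSIONS, use_extra)
    return [f for f in files if is_valid_file(f.filename, bad)]


def append_in_place(default: list, extra: list) -> list:
    """The `+=` pattern the PR removes: `bad` aliases `default`, so it grows."""
    bad = default
    bad += extra  # list.__iadd__ extends the same list object
    return bad


files = [FileObj("main.py"), FileObj("evil.exe"), FileObj("yarn.lock"), FileObj(None), FileObj("image.jpg")]
print([f.filename for f in filter_bad_extensions_sketch(files)])                  # ['main.py', 'image.jpg']
print([f.filename for f in filter_bad_extensions_sketch(files, use_extra=True)])  # ['main.py']

shared = ["exe"]
append_in_place(shared, ["jpg"])
append_in_place(shared, ["jpg"])
print(shared)  # ['exe', 'jpg', 'jpg']
```

The last three calls show why the in-place `+=` was a problem: when the same list object is reused across calls, each call appends the extra entries again, whereas `build_bad_extensions` always returns a fresh set and leaves its inputs untouched.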


Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 53 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 75.0% |
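The generated tests capture `codeflash_output` without asserting on it; according to the note in the test file, Codeflash uses those values to check that the original and optimized code return the same output. A minimal sketch of that kind of differential check with explicit assertions is shown here. Both filter functions, `check_equivalent`, and the bad-extension list are hypothetical stand-ins written for illustration, not the pr_agent code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical configuration; the real lists come from pr_agent settings.
BAD_EXTENSIONS = ["exe", "zip"]
AUTO_GENERATED = ["package-lock.json", "yarn.lock", "composer.lock", "Gemfile.lock", "poetry.lock"]


@dataclass
class FileObj:
    filename: Optional[str]


def reference_filter(files: List[FileObj]) -> List[FileObj]:
    """Loop-based stand-in for the original: list lookup, one endswith() per name."""
    kept = []
    for f in files:
        name = f.filename
        if not name:
            continue
        if any(name.endswith(auto) for auto in AUTO_GENERATED):
            continue
        if name.split(".")[-1] in BAD_EXTENSIONS:
            continue
        kept.append(f)
    return kept


def optimized_filter(files: List[FileObj]) -> List[FileObj]:
    """Stand-in for the optimized version: set lookup and a tuple for endswith()."""
    bad = set(BAD_EXTENSIONS)
    auto = tuple(AUTO_GENERATED)
    return [
        f for f in files
        if f.filename
        and not f.filename.endswith(auto)
        and f.filename.split(".")[-1] not in bad
    ]


def check_equivalent(cases: List[List[FileObj]]) -> int:
    """Fail if the two versions keep different objects or a different order."""
    for files in cases:
        expected = reference_filter(files)
        actual = optimized_filter(files)
        assert [id(f) for f in actual] == [id(f) for f in expected], [f.filename for f in files]
    return len(cases)


cases = [
    [],
    [FileObj("main.py"), FileObj("evil.exe"), FileObj("archive.zip")],
    [FileObj(None), FileObj(""), FileObj("src/yarn.lock"), FileObj("README")],
    [FileObj("archive.zip.py"), FileObj("archive.py.zip"), FileObj("bad.exe ")],
    [FileObj(f"file{i}.{'py' if i % 2 else 'exe'}") for i in range(1000)],
]
print(check_equivalent(cases), "input lists produced identical results")
```

Comparing object identities, rather than just filenames, also catches an optimization that accidentally reorders or copies the kept entries.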
🌀 Generated Regression Tests and Runtime
import pytest
from pr_agent.algo.language_handler import filter_bad_extensions


# Helper class for testing
class FileObj:
    def __init__(self, filename):
        self.filename = filename

# -------------------- UNIT TESTS --------------------

# BASIC TEST CASES

def test_basic_valid_files_pass():
    # Files with extensions not in bad list should be returned
    files = [FileObj('main.py'), FileObj('index.html'), FileObj('README.md')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 329μs -> 327μs (0.556% faster)

def test_basic_bad_extensions_filtered():
    # Files with extensions in bad list should be excluded
    files = [FileObj('evil.exe'), FileObj('main.py'), FileObj('archive.zip')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 328μs -> 329μs (0.204% slower)

def test_basic_auto_generated_files_filtered():
    # Auto-generated files should be excluded regardless of extension
    files = [FileObj('package-lock.json'), FileObj('yarn.lock'), FileObj('composer.lock'), FileObj('Gemfile.lock'), FileObj('poetry.lock'), FileObj('main.py')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 327μs -> 327μs (0.228% faster)

def test_basic_mixed_files():
    # Mixture of valid, bad, auto-generated, and no extension
    files = [
        FileObj('good.py'), FileObj('bad.exe'), FileObj('Gemfile.lock'),
        FileObj('noext'), FileObj('archive.tar'), FileObj('README.md')
    ]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 328μs -> 328μs (0.192% faster)

# EDGE TEST CASES

def test_empty_file_list():
    # Empty input should return empty output
    files = []
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 321μs -> 322μs (0.521% slower)

def test_none_filename():
    # File object with filename=None should be excluded
    files = [FileObj(None), FileObj('main.py')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 322μs -> 321μs (0.279% faster)

def test_empty_string_filename():
    # File object with filename='' should be excluded
    files = [FileObj(''), FileObj('main.py')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 326μs -> 325μs (0.547% faster)

def test_dotfile_no_extension():
    # Dotfiles like '.gitignore' should be accepted if not in bad list
    files = [FileObj('.gitignore'), FileObj('.env'), FileObj('bad.exe')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 327μs -> 327μs (0.072% slower)

def test_file_with_multiple_dots():
    # Only last extension should be checked
    files = [FileObj('archive.tar.gz'), FileObj('my.module.py'), FileObj('data.backup.tar')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 328μs -> 326μs (0.551% faster)

def test_file_with_uppercase_extension():
    # Extension check should be case-sensitive (so 'EXE' is not filtered, only 'exe')
    files = [FileObj('evil.EXE'), FileObj('bad.exe'), FileObj('good.PY')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 324μs (0.140% faster)

def test_file_with_leading_trailing_spaces():
    # Spaces are not stripped: 'bad.exe ' yields the extension 'exe ', which does not match 'exe'
    files = [FileObj(' main.py '), FileObj('bad.exe '), FileObj('archive.zip')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 328μs -> 327μs (0.461% faster)

def test_file_with_no_extension():
    # Files with no dot should be accepted
    files = [FileObj('README'), FileObj('LICENSE'), FileObj('bad')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 326μs -> 326μs (0.065% slower)

def test_file_with_only_dot():
    # File named '.' should be accepted (not filtered)
    files = [FileObj('.'), FileObj('bad.exe')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 324μs -> 327μs (0.717% slower)

def test_file_with_multiple_auto_generated_suffixes():
    # Auto-generated names are matched as suffixes via str.endswith(), so 'foo.package-lock.json' is filtered too
    files = [FileObj('foo.package-lock.json'), FileObj('bar.yarn.lock'), FileObj('composer.lock'), FileObj('Gemfile.lock')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 324μs (0.299% faster)

def test_file_with_dot_in_folder_name():
    # Extension check should only be on filename, not folder
    files = [FileObj('folder.with.dot/main.py'), FileObj('folder.with.dot/bad.exe')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 327μs -> 326μs (0.380% faster)

def test_file_with_multiple_extensions():
    # Only last extension is checked
    files = [FileObj('foo.tar.gz'), FileObj('bar.backup.zip'), FileObj('baz.txt')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 326μs (0.414% slower)

def test_file_with_extension_in_middle_of_name():
    # Only last extension is checked
    files = [FileObj('foo.exe.txt'), FileObj('bar.zip.md'), FileObj('baz.exe')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 324μs (0.277% faster)

def test_file_with_extension_in_name_but_not_as_extension():
    # Only extension matters, not substring in name
    files = [FileObj('notanexe.py'), FileObj('archivezip.md'), FileObj('good.exe.txt')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 326μs -> 325μs (0.546% faster)

def test_file_with_extension_just_before_auto_generated_suffix():
    # Auto-generated suffix must be exact match
    files = [FileObj('foo.lock.json'), FileObj('bar.lock'), FileObj('Gemfile.lock')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 323μs -> 322μs (0.255% faster)

def test_file_with_extension_only_in_extra_bad_list():
    # By default, extra bad extensions are NOT filtered
    files = [FileObj('image.jpg'), FileObj('photo.png'), FileObj('main.py')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 329μs -> 328μs (0.202% faster)

def test_file_with_extension_in_extra_bad_list_enabled(monkeypatch):
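    # Intended to enable extra bad extensions. NOTE: the extra list is presumably selected via
    # get_settings(), so setting this function attribute likely has no effect on the result.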
    filter_bad_extensions._use_extra_bad_extensions = True
    files = [FileObj('image.jpg'), FileObj('photo.png'), FileObj('main.py')]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 331μs -> 330μs (0.478% faster)
    filter_bad_extensions._use_extra_bad_extensions = False  # reset

# LARGE SCALE TEST CASES

def test_large_scale_all_good():
    # 1000 files, all with good extensions
    files = [FileObj(f'file{i}.py') for i in range(1000)]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 900μs -> 514μs (74.9% faster)

def test_large_scale_all_bad():
    # 1000 files, all with bad extensions
    files = [FileObj(f'file{i}.exe') for i in range(1000)]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 752μs -> 509μs (47.6% faster)

def test_large_scale_mixed():
    # 500 good, 500 bad
    files = [FileObj(f'good{i}.py') for i in range(500)] + [FileObj(f'bad{i}.exe') for i in range(500)]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 823μs -> 513μs (60.4% faster)

def test_large_scale_auto_generated():
    # 1000 files, all auto-generated
    auto_files = ['package-lock.json', 'yarn.lock', 'composer.lock', 'Gemfile.lock', 'poetry.lock']
    files = [FileObj(auto_files[i % 5]) for i in range(1000)]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 529μs -> 413μs (28.1% faster)

def test_large_scale_no_extension():
    # 1000 files, none have extension
    files = [FileObj(f'file{i}') for i in range(1000)]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 859μs -> 490μs (75.3% faster)

def test_large_scale_some_none_filenames():
    # 1000 files, 100 are None, rest are good
    files = [FileObj(None) for _ in range(100)] + [FileObj(f'file{i}.py') for i in range(900)]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 835μs -> 494μs (68.7% faster)

def test_large_scale_extra_bad_extensions_enabled():
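    # 500 files with extra bad extensions, 500 good
    # NOTE: the extra list is presumably selected via get_settings(), so the function attribute
    # set below likely has no effect on the result.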
    filter_bad_extensions._use_extra_bad_extensions = True
    files = [FileObj(f'image{i}.jpg') for i in range(500)] + [FileObj(f'good{i}.py') for i in range(500)]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 847μs -> 515μs (64.3% faster)
    filter_bad_extensions._use_extra_bad_extensions = False  # reset
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest
from pr_agent.algo.language_handler import filter_bad_extensions


# Helper class for test cases
class FileObj:
    def __init__(self, filename):
        self.filename = filename

# --------------- UNIT TESTS ----------------

# Basic Test Cases
def test_basic_valid_files():
    # Should return all files if none have bad extensions
    files = [FileObj("main.py"), FileObj("test.js"), FileObj("index.html")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 329μs -> 330μs (0.139% slower)

def test_basic_bad_extensions():
    # Should filter out files with bad extensions
    files = [FileObj("main.exe"), FileObj("readme.md"), FileObj("image.png")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 326μs -> 326μs (0.156% faster)

def test_basic_mixed_files():
    # Some files have bad extensions, some do not
    files = [FileObj("main.py"), FileObj("main.exe"), FileObj("archive.zip"), FileObj("notes.txt")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 328μs -> 326μs (0.751% faster)

def test_basic_no_files():
    # Empty list should return empty list
    files = []
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 320μs -> 324μs (1.06% slower)

def test_basic_none_filename():
    # Files with None as filename should be filtered out
    files = [FileObj(None), FileObj("main.py")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 323μs (0.553% faster)

# Edge Test Cases
def test_edge_no_extension():
    # Files with no extension should be considered valid
    files = [FileObj("Makefile"), FileObj("LICENSE")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 322μs (0.872% faster)

def test_edge_hidden_files():
    # Hidden files (starting with dot) with bad extension should be filtered
    files = [FileObj(".env"), FileObj(".DS_Store"), FileObj(".gitignore")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 324μs (0.226% faster)

def test_edge_multiple_dots():
    # Files with multiple dots should check only the last extension
    files = [FileObj("archive.tar.gz"), FileObj("data.backup.bak"), FileObj("main.test.py")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 326μs (0.324% slower)

def test_edge_uppercase_extension():
    # Uppercase extensions should be treated as different (case-sensitive)
    files = [FileObj("photo.JPG"), FileObj("photo.jpg")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 324μs -> 325μs (0.246% slower)

def test_edge_empty_filename():
    # Empty string as filename should be filtered out
    files = [FileObj(""), FileObj("main.py")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 325μs (0.110% faster)

def test_edge_auto_generated_files():
    # Auto-generated files should always be filtered out
    files = [FileObj("package-lock.json"), FileObj("yarn.lock"), FileObj("Gemfile.lock"), FileObj("poetry.lock"), FileObj("composer.lock"), FileObj("main.py")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 326μs -> 327μs (0.361% slower)

def test_edge_dotfile_no_extension():
    # Dotfiles with no extension should be valid
    files = [FileObj(".bashrc"), FileObj(".profile")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 327μs -> 325μs (0.587% faster)

def test_edge_extension_in_middle_of_name():
    # Only the last extension matters
    files = [FileObj("archive.tar.gz"), FileObj("archive.zip.backup"), FileObj("main.py.old")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 326μs (0.190% slower)

def test_edge_bad_extension_in_middle():
    # If bad extension is not the last, file is valid
    files = [FileObj("main.exe.py"), FileObj("main.lock.js")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 327μs -> 327μs (0.185% slower)

def test_edge_extension_not_in_bad_list():
    # Extensions not in bad list should be allowed
    files = [FileObj("main.customext"), FileObj("main.anotherext")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 324μs -> 325μs (0.246% slower)

def test_edge_extension_case_sensitive():
    # Should be case sensitive
    files = [FileObj("main.PY"), FileObj("main.py")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 326μs -> 327μs (0.039% slower)

# Large Scale Test Cases
def test_large_scale_many_files():
    # Test with 1000 files, half with bad extensions
    files = []
    for i in range(500):
        files.append(FileObj(f"file_{i}.py"))  # valid
        files.append(FileObj(f"file_{i}.exe"))  # bad
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 837μs -> 520μs (61.0% faster)

def test_large_scale_all_bad():
    # All files have bad extensions
    files = [FileObj(f"file_{i}.zip") for i in range(1000)]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 929μs -> 514μs (80.6% faster)

def test_large_scale_all_good():
    # All files have good extensions
    files = [FileObj(f"file_{i}.py") for i in range(1000)]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 897μs -> 515μs (74.2% faster)

def test_large_scale_mixed_extensions():
    # Mixed extensions, some good, some bad, some auto-generated
    files = []
    for i in range(333):
        files.append(FileObj(f"file_{i}.py"))    # valid
        files.append(FileObj(f"file_{i}.exe"))   # bad
        files.append(FileObj("package-lock.json"))  # auto-generated
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 709μs -> 482μs (47.0% faster)

def test_large_scale_none_and_empty_filenames():
    # Mix of valid, None, and empty filenames
    files = [FileObj(None) if i % 3 == 0 else FileObj("") if i % 3 == 1 else FileObj(f"file_{i}.py") for i in range(999)]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 538μs -> 408μs (31.9% faster)

# Mutation-sensitive test: changing extension logic should fail this
def test_mutation_sensitive_last_extension_only():
    # If code checks first extension, this will fail
    files = [FileObj("archive.zip.py"), FileObj("archive.py.zip")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 330μs -> 327μs (1.12% faster)

# Mutation-sensitive test: auto-generated files must always be filtered
def test_mutation_sensitive_auto_generated_files():
    files = [FileObj("src/package-lock.json"), FileObj("src/yarn.lock"), FileObj("src/main.py")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 324μs (0.197% faster)

# Mutation-sensitive test: None filename must always be filtered
def test_mutation_sensitive_none_filename():
    files = [FileObj(None), FileObj("main.py")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 325μs -> 325μs (0.060% slower)

# Mutation-sensitive test: empty filename must always be filtered
def test_mutation_sensitive_empty_filename():
    files = [FileObj(""), FileObj("main.py")]
    codeflash_output = filter_bad_extensions(files); result = codeflash_output # 323μs -> 325μs (0.410% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-filter_bad_extensions-mgzh7vjn` and push.

codeflash-ai bot requested a review from mashraf-222 on October 20, 2025, 18:36
codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) on October 20, 2025