fix: remove dead code from utils and adaptive_crawler #2042

Merged
ntohidi merged 2 commits into unclecode:develop from RajanChavada:bugfix/remove-dead-code
Jul 1, 2026

Conversation

@RajanChavada

Summary

Please include a summary of the change and/or which issues are fixed.

Removes unreachable and abandoned code that accumulated over time. No behaviour change.

List of files changed and why

  • crawl4ai/adaptive_crawler copy.py: editor artifact committed by mistake; byte-for-byte duplicate of
    adaptive_crawler.py, not imported anywhere. Deleted.
  • crawl4ai/utils.py: two dead normalize_url variants removed:
    • The first normalize_url definition was silently shadowed by the extended definition ~20 lines below it.
      In Python the later binding of a name wins, so the first definition was never callable.
    • normalize_url_tmp had zero callers outside utils.py itself and reimplemented what urllib.parse.urljoin already
      does correctly.
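
For readers unfamiliar with the shadowing described above, here is a minimal sketch of the effect. The two functions are hypothetical stand-ins, not the actual crawl4ai definitions, and the `drop_fragment` keyword is invented purely for illustration.

```python
from urllib.parse import urljoin


# Hypothetical stand-in for the first (shadowed) definition.
def normalize_url(href, base_url):
    """First definition: a plain urljoin wrapper."""
    return urljoin(base_url, href)


# Hypothetical stand-in for the extended definition further down the module.
# Defining it rebinds the module-level name, so the first function object
# can no longer be reached through `normalize_url`.
def normalize_url(href, base_url, *, drop_fragment=True):
    """Second definition: extended variant with a keyword argument."""
    url = urljoin(base_url, href)
    return url.split("#", 1)[0] if drop_fragment else url


# Only the second definition is reachable by name.
print(normalize_url.__doc__)
print(normalize_url("/docs#intro", "https://example.com/a/b"))
```

Python raises no error or warning for the redefinition, which is why this kind of dead code tends to go unnoticed until a linter or a reviewer flags it.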

How Has This Been Tested?

Existing test suite passes (pytest). A grep across the full codebase before removal confirmed that the removed code has no callers. extract_xml_data_legacy (also "legacy"-named) was left in place because tests/regression/test_reg_utils.py uses it.
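
The grep check can be approximated in Python as follows. This is a rough sketch, not the command that was actually run for this PR: `find_references` is a hypothetical helper, it only scans `.py` files, and it matches on text, so it also reports comments and strings that mention the name.

```python
import re
from pathlib import Path


def find_references(root, name):
    """Return (path, line_number, line) for each whole-word match of `name`
    in the .py files under `root`."""
    # \b keeps "normalize_url" from matching inside "normalize_url_tmp",
    # because "_" counts as a word character.
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_references(".", "normalize_url_tmp")` from the repository root and seeing only the definition line itself would support the "no callers" claim, under the assumption that the name is never built dynamically (for example via `getattr`).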

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

adaptive_crawler copy.py was an editor artifact that ended up
tracked in the repo. It is byte-for-byte identical to adaptive_crawler.py
and is not imported anywhere.

Two unreachable functions in utils.py:

- The first `normalize_url` (plain urljoin wrapper) was silently shadowed
  by the extended `normalize_url` defined ~20 lines later. Python last-write
  wins, so the first definition was never callable.

- `normalize_url_tmp` was a hand-rolled URL joiner (string split on "/")
  with no callers outside utils.py itself. `urllib.parse.urljoin` already
  covers this correctly.
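
As an illustration of the cases `urllib.parse.urljoin` handles without manual string splitting, here is a short sketch. The base URL and hrefs are made-up examples, not taken from the crawl4ai tests.

```python
from urllib.parse import urljoin

# Made-up base URL for illustration only.
base = "https://example.com/docs/guide/index.html"

examples = [
    "intro.html",               # relative to the current directory
    "../api/",                  # parent-directory reference
    "/about",                   # root-relative path
    "//cdn.example.com/x.js",   # scheme-relative URL
    "https://other.example/p",  # already absolute, returned unchanged
]

for href in examples:
    print(f"{href!r:28} -> {urljoin(base, href)}")
```

A joiner built on splitting the base on "/" has to handle each of these forms explicitly, which is the usual argument for relying on the standard library here.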
@RajanChavada

Author

Requesting a review on this PR (tagging @unclecode) as the lead maintainer :)

@ntohidi ntohidi changed the base branch from main to develop July 1, 2026 08:15

@ntohidi ntohidi left a comment

Collaborator

Reviewing the PR...

@ntohidi

ntohidi commented Jul 1, 2026

Collaborator

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Notes:

  • Confirmed the shadowed normalize_url (first definition) was never callable due to Python's last-write-wins behavior. The canonical version with keyword args is preserved.
  • Confirmed normalize_url_tmp has zero callers across the codebase.
  • adaptive_crawler copy.py is not importable (space in filename) and was never referenced. Note: the PR description says "byte-for-byte duplicate" but it is actually an older, diverged snapshot missing several fields added later. The deletion is still correct -- just the justification could be more precise.
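
The "space in filename" point can be checked with a small sketch like the one below. `is_importable_module_name` is a hypothetical helper, and the check only covers the ordinary `import` statement; loading a file by path through importlib is a separate matter that this sketch does not address.

```python
import keyword


def is_importable_module_name(stem):
    """True if `stem` could appear in a plain `import <stem>` statement."""
    return stem.isidentifier() and not keyword.iskeyword(stem)


print(is_importable_module_name("adaptive_crawler"))       # True
print(is_importable_module_name("adaptive_crawler copy"))  # False

# The import statement itself does not even compile.
try:
    compile("import adaptive_crawler copy", "<example>", "exec")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```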

@RajanChavada Thanks for your contribution :)

@ntohidi ntohidi merged commit 9b5a090 into unclecode:develop Jul 1, 2026