fix: remove dead code from utils and adaptive_crawler by RajanChavada · Pull Request #2042 · unclecode/crawl4ai

RajanChavada · 2026-06-29T15:38:02Z

Summary


Removes unreachable and abandoned code that accumulated over time. No behaviour change.

List of files changed and why

crawl4ai/adaptive_crawler copy.py: editor artifact committed by mistake; byte-for-byte duplicate of
adaptive_crawler.py, not imported anywhere. Deleted.
crawl4ai/utils.py: two dead normalize_url variants removed:
- The first normalize_url definition was silently shadowed by the extended definition ~20 lines below it.
  In Python the last definition of a name wins, so the first one was never callable.
- normalize_url_tmp had zero callers outside utils.py itself and reimplemented what urllib.parse.urljoin already
  does correctly.
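The shadowing described above can be reproduced and detected mechanically. The sketch below is illustrative only: `find_shadowed_defs` is a hypothetical helper, and the two `normalize_url` bodies in `SAMPLE` are simplified stand-ins, not crawl4ai's actual definitions or signatures. It walks the module-level statements with the standard `ast` module, reports any function name defined more than once, and then executes the sample to show that only the later definition is reachable.

```python
import ast
from collections import defaultdict


def find_shadowed_defs(source: str) -> dict[str, list[int]]:
    """Map each top-level function name defined more than once to its `def` line numbers."""
    lines_by_name = defaultdict(list)
    for node in ast.parse(source).body:  # module-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines_by_name[node.name].append(node.lineno)
    return {name: nums for name, nums in lines_by_name.items() if len(nums) > 1}


# Hypothetical stand-ins; NOT the real crawl4ai bodies or signatures.
SAMPLE = '''
from urllib.parse import urljoin

def normalize_url(href, base_url):
    return urljoin(base_url, href)

def normalize_url(href, base_url, *, strip_fragment=True):
    joined = urljoin(base_url, href)
    return joined.split("#", 1)[0] if strip_fragment else joined
'''

print(find_shadowed_defs(SAMPLE))  # {'normalize_url': [4, 7]}

# Executing the module binds the name twice; the second `def` replaces the first.
namespace = {}
exec(SAMPLE, namespace)
# Prints https://example.com/docs/page: the fragment is stripped, so the second definition ran.
print(namespace["normalize_url"]("page#top", "https://example.com/docs/"))
```

Only top-level `def`s are inspected, so intentional patterns such as alternative definitions inside an `if`/`else` are not flagged. Linters catch the same pattern statically (pyflakes/flake8 report it as F811, redefinition of unused name), which is usually the cheaper check to run in CI.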

How Has This Been Tested?

Existing test suite passes (pytest). No callers of the removed code exist; this was confirmed by grepping the full codebase before removal. extract_xml_data_legacy, despite its "legacy" name, was left in place because tests/regression/test_reg_utils.py still uses it.
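The caller check can also be approximated in pure Python, for example where `grep` is unavailable. This is a minimal sketch with a hypothetical helper name, `find_references`. It does a whole-word regex search over `*.py` files, so `normalize_url` does not match inside `normalize_url_tmp` (the underscore is a word character). Like grep, it cannot see dynamic lookups that build the name at runtime, and it skips non-Python files such as docs or notebooks.

```python
import re
from pathlib import Path


def find_references(root: str, name: str) -> list[tuple[str, int, str]]:
    """Return (path, line_no, line) for each line of a .py file under `root` mentioning `name` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")  # word boundaries, like `grep -w`
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), line_no, line.strip()))
    return hits
```

For example, `find_references("crawl4ai", "normalize_url_tmp")` would list every line in the package that mentions the function; a result containing only the definition itself is the signal the description above relies on.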

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added/updated unit tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

adaptive_crawler copy.py was an editor artifact that was accidentally committed and tracked in the repo. It is byte-for-byte identical to adaptive_crawler.py and is not imported anywhere.

Two unreachable functions in utils.py:
- The first `normalize_url` (a plain urljoin wrapper) was silently shadowed by the extended `normalize_url` defined ~20 lines later. In Python the last definition of a name wins, so the first definition was never callable.
- `normalize_url_tmp` was a hand-rolled URL joiner (string split on "/") with no callers outside utils.py itself. `urllib.parse.urljoin` already covers this correctly.
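To illustrate why a joiner built on splitting at "/" is fragile, the sketch below contrasts a deliberately naive joiner with `urllib.parse.urljoin`. `naive_join` is hypothetical and written for contrast only; it is not the removed `normalize_url_tmp`, whose body is not shown here.

```python
from urllib.parse import urljoin


def naive_join(base: str, href: str) -> str:
    """HYPOTHETICAL joiner: drop the last path segment of `base`, then append `href`."""
    return base.rsplit("/", 1)[0] + "/" + href


BASE = "https://example.com/docs/page.html"
HREFS = ["guide.html", "../img/logo.png", "/about", "https://other.org/x"]

for href in HREFS:
    print(f"{href:20} naive:   {naive_join(BASE, href)}")
    print(f"{'':20} urljoin: {urljoin(BASE, href)}")
```

Only the simple relative case agrees. `urljoin` resolves `..` segments, root-relative paths and absolute URLs using standard reference resolution, while the naive version produces `https://example.com/docs/../img/logo.png`, `https://example.com/docs//about` and `https://example.com/docs/https://other.org/x`.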

RajanChavada · 2026-06-29T15:40:07Z

Requesting a review on this PR from @unclecode as the lead maintainer :)

ntohidi

Reviewing the PR...

ntohidi · 2026-07-01T08:29:50Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Notes:

Confirmed the shadowed normalize_url (first definition) was never callable due to Python's last-write-wins behavior. The canonical version with keyword args is preserved.
Confirmed normalize_url_tmp has zero callers across the codebase.
adaptive_crawler copy.py is not importable (space in filename) and was never referenced. Note: the PR description says "byte-for-byte duplicate", but the file is actually an older, diverged snapshot missing several fields added later. The deletion is still correct; only the justification could be more precise.
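Whether a stray copy is an exact duplicate or a diverged snapshot can be checked before stating it in a PR. A minimal sketch, with hypothetical helper names `describe_copy` and `importable_by_name`: `filecmp.cmp(..., shallow=False)` compares file contents, and `difflib.unified_diff` counts the differing lines when the files are not identical. The second helper only checks whether a file's stem could appear in an `import` statement; a file with a space in its name can still be loaded by path through `importlib`, so this covers plain imports only.

```python
import difflib
import filecmp
import keyword
from pathlib import Path


def describe_copy(original: str, copy: str) -> str:
    """Say whether `copy` is a byte-for-byte duplicate of `original`; otherwise count differing lines."""
    # shallow=False compares contents instead of trusting matching os.stat() signatures.
    if filecmp.cmp(original, copy, shallow=False):
        return "identical"
    old = Path(copy).read_text(encoding="utf-8").splitlines(keepends=True)
    new = Path(original).read_text(encoding="utf-8").splitlines(keepends=True)
    diff = list(difflib.unified_diff(old, new, fromfile=copy, tofile=original))
    # The first two lines are the "---"/"+++" file headers; count only +/- body lines.
    changed = [ln for ln in diff[2:] if ln.startswith(("+", "-"))]
    return f"diverged: {len(changed)} differing line(s)"


def importable_by_name(path: str) -> bool:
    """True if `import <stem>` could name this file: the stem is an identifier and not a keyword."""
    stem = Path(path).stem
    return stem.isidentifier() and not keyword.iskeyword(stem)
```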

@RajanChavada Thanks for your contribution :)

RajanChavada added 2 commits June 29, 2026 11:34

chore: remove accidental copy of adaptive_crawler

6a181da


ntohidi changed the base branch from main to develop July 1, 2026 08:15

ntohidi reviewed Jul 1, 2026


ntohidi merged commit 9b5a090 into unclecode:develop Jul 1, 2026

ntohidi merged 2 commits into unclecode:develop from RajanChavada:bugfix/remove-dead-code

RajanChavada commented Jun 29, 2026

Uh oh!

RajanChavada commented Jun 29, 2026

Uh oh!

ntohidi left a comment •

edited

Loading

Uh oh!

ntohidi commented Jul 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

RajanChavada commented Jun 29, 2026

Summary

List of files changed and why

How Has This Been Tested?

Checklist:

Uh oh!

RajanChavada commented Jun 29, 2026

Uh oh!

ntohidi left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ntohidi commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ntohidi left a comment •

edited

Loading

ntohidi commented Jul 1, 2026 •

edited

Loading