Skip to content

Conversation

@Patai5
Copy link
Contributor

@Patai5 Patai5 commented Oct 3, 2025

Fixes the adaptive playwright crawler's issue of not using the new crawl depth implementation of enqueueLinks

Discovered in WCC here:


  • Moves the context's enqueueLinks wrapper into a new method of the basic crawler - _crawlingContextEnqueueLinksWrapper
    • this is then simply reused in adaptive playwright crawlers wrapper 👍

Up for discussion:

  • I also did the same thing for the addRequests wrapper - _crawlingContextAddRequestsGenerator
    Only after finishing up my tests, I realized that this one is not actually needed :D Let me know if you think we should keep it or move it back :) It seems like a nice standalone function, but maybe we should keep it as private and not protected 🤔
  • Also, what do you think about testing protected methods? Because I think it makes sense if someone can reuse them. Like in this case, _crawlingContextEnqueueLinksWrapper?

@Patai5 Patai5 requested review from barjin and janbuchar October 3, 2025 20:04
@janbuchar
Copy link
Contributor

Note - this will be fun to merge with #3119

@Patai5 Patai5 force-pushed the fix/adaptive-playwright-crawler-crawl-depth branch from 498ec43 to 0a94f70 Compare October 23, 2025 08:48
@Patai5 Patai5 changed the base branch from master to refactor/avoid-lets-from-timeout-and-retry October 23, 2025 08:48
@Patai5 Patai5 force-pushed the fix/adaptive-playwright-crawler-crawl-depth branch from 0a94f70 to 31edb70 Compare October 23, 2025 14:44
@Patai5
Copy link
Contributor Author

Patai5 commented Oct 23, 2025

I've fixed all of the remarks under this one:
31edb70


Force pushed for a rebase onto:

@Patai5 Patai5 closed this in #3206 Oct 24, 2025
Patai5 added a commit that referenced this pull request Oct 24, 2025
Fixes #3188 (comment)
in a separate PR for clean history and review

Ensures that the underlying retries are awaited properly, so that
1. the main promise does not resolve early, which could've caused race conditions on usage of this method
2. they do not lead to orphan non-awaited promises, that if they throw an error, it propagates all the way to the root level of runtime, causing it to crash
@janbuchar janbuchar reopened this Oct 24, 2025
@Patai5 Patai5 changed the base branch from refactor/avoid-lets-from-timeout-and-retry to master October 24, 2025 09:59
@Patai5 Patai5 force-pushed the fix/adaptive-playwright-crawler-crawl-depth branch from 31edb70 to 3185919 Compare October 24, 2025 10:05
@Patai5
Copy link
Contributor Author

Patai5 commented Oct 24, 2025

Accidentally closed this PR with fixes keyword and a link to a comment inside this PR inside #3206 🙏😭

Anyway, rebased and force pushed onto master with changes from #3206 now ✅

Copy link
Contributor

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, just a few comments

@Patai5
Copy link
Contributor Author

Patai5 commented Oct 27, 2025

Let me know if you consider everything as resolved 👍 Otherwise, I consider this done :)

@janbuchar janbuchar self-requested a review October 27, 2025 13:02
@B4nan B4nan changed the title fix(adaptive-playwright-crawler): use shared enqueue links wrapper fix: use shared enqueue links wrapper in AdaptivePlaywrightCrawler Oct 28, 2025
Copy link
Contributor

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Patai5 Patai5 merged commit 9569d19 into master Nov 5, 2025
12 checks passed
@Patai5 Patai5 deleted the fix/adaptive-playwright-crawler-crawl-depth branch November 5, 2025 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants