Even if seeded with different domains, crawler4j crawls one domain at a time #450

theomails · 2020-08-28T23:23:05Z

Hi,
Even when seeded with 5 or so domains and with numCrawlers set to 10, crawler4j crawls only one domain at a time. Given a politeness delay of about 1 to 5 seconds, that's only a few dozen pages per minute. If it crawled different domains in parallel instead, the pages-per-minute rate would go up roughly in proportion to the number of domains.

Is there a reason why a single CrawlController doesn't crawl multiple domains in parallel?
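
To put rough numbers on the above, here is a back-of-the-envelope sketch. It does not call crawler4j at all; the class and method names (`CrawlThroughputEstimate`, `pagesPerMinute`) are made up for illustration. It assumes each domain is hit at most once per politeness delay and ignores fetch time, so the results are upper bounds rather than measurements.

```java
// Back-of-the-envelope estimate only; this does not use crawler4j.
// Assumptions: at most one fetch per domain per politeness delay,
// and the time spent downloading a page is ignored.
class CrawlThroughputEstimate {

    /** Upper bound on pages per minute when parallelDomains domains are crawled at once. */
    static double pagesPerMinute(long politenessDelayMs, int parallelDomains) {
        if (politenessDelayMs <= 0 || parallelDomains <= 0) {
            throw new IllegalArgumentException("delay and domain count must be positive");
        }
        // One fetch per delay interval for each domain.
        double perDomain = 60_000.0 / politenessDelayMs;
        return perDomain * parallelDomains;
    }

    public static void main(String[] args) {
        long[] delaysMs = {1_000, 5_000};   // the 1 to 5 second range mentioned above
        int domains = 5;                    // number of seeded domains
        for (long delay : delaysMs) {
            System.out.printf(
                "delay=%d ms: one domain at a time = %.0f pages/min, %d domains in parallel = %.0f pages/min%n",
                delay, pagesPerMinute(delay, 1), domains, pagesPerMinute(delay, domains));
        }
    }
}
```

Under these assumptions, a single domain tops out at 12 to 60 pages per minute, while 5 domains crawled in parallel could reach 60 to 300 with the same per-domain politeness.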
