Duplicated URLs are crawled twice #302
Comments
The reason might lie in helper.js:

    static generateKey(options) {
      const json = JSON.stringify(pick(options, PICKED_OPTION_FIELDS), Helper.jsonStableReplacer);
      return Helper.hash(json).substring(0, MAX_KEY_LENGTH);
    }

Uniqueness is assessed from a hash generated on the result of [...]. I'm looking for opinions. See https://github.com/substack/json-stable-stringify
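To make the key-generation idea above concrete, here is a minimal, self-contained sketch of hashing a stably serialized subset of the request options. It is not the library's code: the field list `PICKED_FIELDS`, the key length, the `sortKeys` helper, and the choice of SHA-256 are all assumptions made for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical values; the real field list and key length live in the library's source.
const PICKED_FIELDS = ['url', 'method', 'headers'];
const MAX_KEY_LENGTH = 10;

// Recursively rebuild objects with sorted keys so property order
// does not change the serialized output.
function sortKeys(value) {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === 'object') {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value;
}

// Build a short key from the picked fields only (SHA-256 is an assumption).
function generateKey(options) {
  const picked = {};
  for (const field of PICKED_FIELDS) {
    if (field in options) picked[field] = options[field];
  }
  const json = JSON.stringify(sortKeys(picked));
  return crypto
    .createHash('sha256')
    .update(json)
    .digest('hex')
    .substring(0, MAX_KEY_LENGTH);
}
```

In a scheme like this, two requests for the same URL produce the same key only if every picked field matches. If any picked field differs, for example a header, the keys differ and the URL is treated as new.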
Same as #299
In headless mode, it keeps reporting 302.
I found two reasons:
Is anyone considering creating a PR?
Just posting here hoping this helps someone. It is true that it crawls duplicate URLs when concurrency > 1. So here is what I did.
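As an illustration of the general idea, and not necessarily the fix described above, one plausible way duplicates slip through with concurrency > 1 is a check-then-mark race: two workers both see a URL as "not yet crawled" before either records it. The sketch below claims each URL in a single synchronous step before any `await`. The names `SeenSet`, `claim`, `crawlAll`, and `fetchPage` are hypothetical and not part of the crawler's API.

```javascript
// Hypothetical in-memory guard; not part of the crawler's API.
class SeenSet {
  constructor() {
    this._seen = new Set();
  }

  // Returns true only for the first caller of a given key.
  // The check and the insert run in one synchronous step, so no other
  // task can interleave between them on the JavaScript event loop.
  claim(key) {
    if (this._seen.has(key)) return false;
    this._seen.add(key);
    return true;
  }
}

// Crawl a list of URLs with a fixed number of concurrent workers,
// skipping any URL that has already been claimed.
async function crawlAll(urls, fetchPage, concurrency) {
  const seen = new SeenSet();
  const queue = urls.slice();
  const crawled = [];

  async function worker() {
    while (queue.length > 0) {
      const url = queue.shift();
      if (!seen.claim(url)) continue; // already claimed by another worker
      await fetchPage(url);
      crawled.push(url);
    }
  }

  // Start `concurrency` workers and wait for all of them to drain the queue.
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return crawled;
}
```

The design choice here is to mark a URL as seen when it is dequeued rather than when its request finishes; marking on completion would leave a window in which a second worker can pick up the same URL.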
What is the current behavior?
Duplicated URLs are not skipped. The same URL is crawled twice.
If the current behavior is a bug, please provide the steps to reproduce
What is the expected behavior?
Crawled URLs should be skipped even if they come from the queue.
Please tell us about your environment: