@panthony `Crawl-Delay` is not part of the standard, so there is no way to tell whether the number is in seconds, minutes, hours or days.
Providing your own robots.txt is probably the most direct solution for your use case: #192
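To make the unit problem concrete, here is a minimal sketch of pulling the raw `Crawl-delay` value for one user agent out of robots.txt text. `parseCrawlDelay` is a hypothetical helper, not part of any library, and its user-agent matching (case-insensitive substring, a named group beating `*`) is a simplification. Note that all it can return is a bare number: nothing in the file says what unit that number is in.

```typescript
// Hypothetical helper: return the raw Crawl-delay that applies to `userAgent`,
// or undefined if none does. The value is unitless as written in robots.txt.
function parseCrawlDelay(robotsTxt: string, userAgent: string): number | undefined {
  const ua = userAgent.toLowerCase();
  let groupAgents: string[] = [];
  let inAgentLines = false;
  let specific: number | undefined; // from a group naming this agent
  let wildcard: number | undefined; // from the "User-agent: *" group

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    // Drop comments and surrounding whitespace.
    const line = rawLine.replace(/#.*$/, "").trim();
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of directives.
      if (!inAgentLines) groupAgents = [];
      groupAgents.push(value.toLowerCase());
      inAgentLines = true;
      continue;
    }
    inAgentLines = false;
    if (field !== "crawl-delay") continue;

    const delay = Number(value);
    if (!Number.isFinite(delay) || delay < 0) continue; // ignore malformed values
    for (const agent of groupAgents) {
      if (agent === "*") {
        if (wildcard === undefined) wildcard = delay;
      } else if (agent !== "" && ua.includes(agent)) {
        if (specific === undefined) specific = delay;
      }
    }
  }
  // A group that names this agent takes precedence over the "*" group.
  return specific ?? wildcard;
}
```

For example, with `User-agent: *` / `Crawl-delay: 10` followed by `User-agent: MyBot` / `Crawl-delay: 2`, the call `parseCrawlDelay(text, "MyBot/1.0")` yields `2` and any other agent gets `10`; deciding that these mean seconds is left to the caller.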
@yujiosaka You are right, this is not part of the standard.
But it looks like everyone agrees that the value is expected to be a number of seconds, and even if the crawler does not obey it out of the box, we should have some way to enforce it.
It would be sad to be banned from accessing a site because we did not obey their rules :)
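Enforcing it could look roughly like the sketch below, assuming the value is in seconds as described above. `CrawlDelayThrottle` is a hypothetical class, not something the crawler ships: it remembers, per host, the earliest time the next request may start, and tells the caller how long to wait. Time is passed in explicitly so the logic stays deterministic.

```typescript
// Hypothetical per-host throttle. ASSUMES the Crawl-delay is in seconds.
class CrawlDelayThrottle {
  private readonly delaySeconds: number;
  // Earliest start time (ms) for the next request to each host.
  private readonly nextAllowedAt = new Map<string, number>();

  constructor(delaySeconds: number) {
    this.delaySeconds = delaySeconds;
  }

  // Reserve a slot for a request to `host` made at `nowMs` and return how
  // many milliseconds the caller should wait before actually sending it.
  reserve(host: string, nowMs: number): number {
    const earliest = this.nextAllowedAt.get(host) ?? nowMs;
    const startAt = Math.max(earliest, nowMs);
    this.nextAllowedAt.set(host, startAt + this.delaySeconds * 1000);
    return startAt - nowMs;
  }
}
```

In a real crawl loop one would call `reserve(host, Date.now())` and sleep for the returned number of milliseconds before fetching. Reserving the slot up front (rather than after the request finishes) keeps concurrent workers from all seeing the host as free at the same moment.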
I do not quite see how providing a robots.txt would solve this.
Or did you mean that I could configure the crawler's delay according to the robots.txt I provide?
What is the current behavior?
The `Crawl-Delay` directive is ignored.

What is the expected behavior?
The `Crawl-Delay` directive should be honored; its value can be retrieved using `getCrawlDelay()` on the robots parser.

What is the motivation / use case for changing the behavior?
A bot is bound to respect all the directives of the robots.txt.
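One possible way to honor it, sketched under assumptions: the parser object exposes `getCrawlDelay(userAgent)` returning a number or `undefined` (the user-agent argument is an assumption here), the number is treated as seconds, and the crawler already has its own delay in milliseconds. `effectiveDelayMs` and `configuredDelayMs` are hypothetical names. The idea is simply to keep whichever delay is stricter.

```typescript
// Structural type for any parser exposing getCrawlDelay(); assumed to return
// seconds, or undefined when no Crawl-delay applies to this agent.
interface CrawlDelaySource {
  getCrawlDelay(userAgent?: string): number | undefined;
}

// Hypothetical helper: combine the crawler's own delay (ms) with the site's
// Crawl-delay (assumed seconds), keeping the stricter of the two.
function effectiveDelayMs(
  robots: CrawlDelaySource | undefined,
  userAgent: string,
  configuredDelayMs: number,
): number {
  const crawlDelay = robots?.getCrawlDelay(userAgent);
  if (crawlDelay === undefined || !Number.isFinite(crawlDelay) || crawlDelay < 0) {
    return configuredDelayMs; // no usable directive: keep the configured delay
  }
  return Math.max(configuredDelayMs, crawlDelay * 1000);
}
```

Taking the maximum means a user can still choose to be slower than the site asks, but never faster.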