Whitelist / blacklist websites, robots.txt presets #5
Perhaps it would be better to chat and discuss development in "Discussions", and use this section to solve existing (already implemented :)) problems, as well as to consider user requests.
Well, for this subject I have implemented a new feature that relates to robots.txt handling. In a few words, we can now append extra robots.txt rules on top of the rules a host provides. For the whitelist/blacklist needs we don't need any new feature implementation, because we can simply disable crawling and indexing of a specific domain's pages in the host settings. And finally, to close this subject, I have created a database configuration preset where everyone can contribute propositions: https://github.com/YGGverse/YGGo/tree/main/database/yggdrasil
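As a rough illustration of the appended-rules idea, here is a minimal sketch. The constant `CRAWL_ROBOTS_POSTFIX_RULES`, the helper function, and the example rule are assumptions for illustration, not YGGo's confirmed API:

```php
<?php
// Sketch only: extra (preset) robots.txt rules appended after a host's
// own rules, so preset Disallow entries always apply on top.
// Constant and function names here are illustrative assumptions.
define('CRAWL_ROBOTS_POSTFIX_RULES', "User-agent: *\nDisallow: /search/");

function buildRobotsTxt(?string $hostRobotsTxt): string
{
    // Host rules come first; the preset rules are appended after them
    return trim(($hostRobotsTxt ?? '') . PHP_EOL . CRAWL_ROBOTS_POSTFIX_RULES);
}
```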
Just for a note: those data sets depend on the crawler configuration, so I have moved these variables to the manifest API, where each application is able to grab the data matching its specific requirements. I work on the distributed ecosystem, so for right now this option could be enabled by the node owner in the node configuration.
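For example, a consuming node might read another node's manifest and check it against its own settings before importing anything. A minimal sketch, assuming a JSON endpoint and a `crawlPageLimit` field (the URL, endpoint path, and field names are hypothetical, not a confirmed YGGo API):

```php
<?php
// Illustrative default; the real value would come from node configuration
define('CRAWL_HOST_DEFAULT_PAGES_LIMIT', 1000);

// Hypothetical manifest endpoint on a remote node
$json = file_get_contents('http://example-node.ygg/api.php?action=manifest');

// Decode only when the request succeeded
$manifest = $json !== false ? json_decode($json, true) : null;

// Import only data sets produced under compatible crawl settings
if (is_array($manifest) && ($manifest['crawlPageLimit'] ?? 0) <= CRAWL_HOST_DEFAULT_PAGES_LIMIT) {
    // proceed with import...
}
```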
So, trackers with external seeders are shit inside the network.
Nice start…
I mean this subject is for the websites we need to crawl, and maybe some mirrors we need to block or limit by the `crawlPageLimit` / `CRAWL_HOST_DEFAULT_PAGES_LIMIT` settings.
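A minimal sketch of what such a per-host limit could look like, assuming a `host` table with a `crawlPageLimit` column; the schema, DSN, default value, and host name below are illustrative assumptions, not YGGo's actual layout:

```php
<?php
// Default limit applied when a host has no explicit override;
// the value 1000 is only an illustrative assumption
define('CRAWL_HOST_DEFAULT_PAGES_LIMIT', 1000);

// Hypothetical database connection
$pdo = new PDO('mysql:host=localhost;dbname=yggo', 'user', 'password');

// Limit a known mirror to a handful of pages instead of blocking it outright
$sth = $pdo->prepare('UPDATE `host` SET `crawlPageLimit` = :limit WHERE `name` = :name');
$sth->execute([':limit' => 10, ':name' => 'mirror.example.ygg']);
```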
Ideas here, just a few relevant relations:
#1 (comment)
And I would like to ask: do we need to enable the GitHub Discussions page, or should we keep Issues to resolve problems, not to talk?