Replies: 1 comment
Hi, glad to hear that, thanks for the nice words 🙂. Regarding your question:
I hope this helps, and good luck with your project.
I'm planning to build a broken link checker on top of crawlee-python, similar in scope to lychee and htmltest. For now I just need to detect broken internal links, so the crawl should contain all of the state needed. By "internal links" I mean links within the page or to other pages within the same domain, including anchor fragments.
From reading through the docs:
- `BeautifulSoupCrawler` seems like a great starting point since it already parses the HTML and makes it easy to extract links.
- `KeyValueStore` seems like the right tool to store things like paths to detect internal broken links, and element IDs to detect broken anchor fragments.

Does crawlee already have a mechanism to detect internal broken links, or even broken anchor fragments? I'm still ramping up on crawlee, so I don't want to reinvent the wheel here.
Are there any other components or prior art I should be aware of?
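To make the bookkeeping I have in mind concrete, here is a rough sketch using only the Python standard library. It deliberately does not use any crawlee API: `PageScanner` and `find_broken_links` are hypothetical names, the `pages` mapping (URL to HTML) stands in for whatever the crawl would collect, and a target that was never crawled is simply treated as broken. A real checker would presumably also look at HTTP status codes and at `<a name="...">` anchors.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class PageScanner(HTMLParser):
    """Collect element IDs and <a href> targets from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: set[str] = set()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id"):
            self.ids.add(attr_map["id"])
        if tag == "a" and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def find_broken_links(pages: dict[str, str]) -> list[tuple[str, str]]:
    """Return sorted (source_url, href) pairs for broken internal links.

    `pages` maps each crawled URL to its HTML (an assumption of this sketch).
    """
    # First pass: record the IDs and links found on every crawled page.
    scanned = {}
    for url, html in pages.items():
        scanner = PageScanner()
        scanner.feed(html)
        scanned[url] = scanner

    # Second pass: resolve each link and check its page and fragment.
    broken = []
    for url, scanner in scanned.items():
        host = urlparse(url).netloc
        for href in scanner.hrefs:
            target, fragment = urldefrag(urljoin(url, href))
            if urlparse(target).netloc != host:
                continue  # external link (or mailto: etc.), out of scope
            page = scanned.get(target)
            if page is None or (fragment and fragment not in page.ids):
                broken.append((url, href))
    return sorted(broken)
```

The two passes are the important part: anchor fragments can only be validated once the IDs of the target page are known, which is why I think the crawl needs to persist per-page state somewhere.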
PS: Thank you for releasing this as open source! From my experience so far, `crawlee-python` feels like a really powerful tool, and it was extremely easy to get started with it.