Question about suitability to Web scraping. #78

Open
deabreu opened this issue Aug 24, 2021 · 0 comments

deabreu commented Aug 24, 2021

Hello all. Please forgive me if this is the wrong place to post this question.

I'm looking for a Scala alternative to Scrapy for parsing HTML documents for web scraping. I've been trying to build one with Jsoup, but since it is a pure Java library, the constant conversions to Scala make development a little counterintuitive, and I'd like a more functional approach.
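
To illustrate the conversion friction described above, here is a minimal sketch of calling Jsoup from Scala. It assumes Jsoup is on the classpath and Scala 2.13 (for `scala.jdk.CollectionConverters`). The sample HTML, the object name `JsoupFromScala`, and the helper `links` are made up for the example.

```scala
import org.jsoup.Jsoup
import scala.jdk.CollectionConverters._

object JsoupFromScala {
  // Hypothetical sample document, used only for illustration.
  val html: String =
    """<html><body>
      |  <a href="/first">First</a>
      |  <a href="/second">Second</a>
      |</body></html>""".stripMargin

  // Extracts (link text, href) pairs from every anchor that has an href.
  def links(source: String): List[(String, String)] = {
    val doc = Jsoup.parse(source)
    // select returns org.jsoup.select.Elements, a java.util.List[Element],
    // so it has to be converted with .asScala before map is available.
    doc.select("a[href]").asScala.toList.map(a => (a.text(), a.attr("href")))
  }

  def main(args: Array[String]): Unit =
    links(html).foreach { case (text, href) => println(s"$text -> $href") }
}
```

Every query result goes through an explicit `.asScala` step before Scala collection operations apply, which is the kind of boilerplate a Scala-native API would avoid.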

I came across Pine as one such approach, but the project seems more focused on rendering than on building a data structure from an existing document, which would be my main focus. If that impression is wrong, please help me correct it.

With that in mind, could you answer these questions about the project or its documentation?

  1. Can Pine parse any existing HTML5-compliant document into a tree-like hierarchical structure? And can this structure be queried?
  2. Can Pine help me parse JavaScript code for dynamic sites? If so, could you point me to an example of how to get started, please?
  3. If not, could you point me to a possible way to work around this limitation, please?