Jsoup like Element, Document classes #26

Open
Vaibhav2002 opened this issue Oct 13, 2023 · 6 comments

Comments

@Vaibhav2002

Does Ksoup have classes like Document, Element, etc., as Jsoup does?

@Queatz

Queatz commented Apr 20, 2024

Also wondering. I was thinking of using this library to extract og:* metadata.

@vanniktech

All of this is supported by https://github.com/fleeksoft/ksoup - @MohamedRejeb are there any plans to work together on Ksoup support for Kotlin Multiplatform? To be honest, I don't care which library I am using, but the one from @fleeksoft seems superior as it really has everything from Jsoup, including the things that I've asked for:

#17
#18

In addition it has the benefit that the API names are the same, so you can just google whatever with jsoup and adjust the syntax to Kotlin. The only drawback is that your library seems to be far more active though.


I've switched to the other library for now since I also need Element, Document etc for full parsing of HTML.

I believe that other ksoup library also has support for:

#13
#5
#4

It would be a shame to do the work twice. Also, it's rather confusing that there are two libraries that have exactly the same name and, from the outside, seem to do the same thing.

@MohamedRejeb
Owner

Hi,
You are right, @vanniktech. The problem is that the other library is backed by a company. I will try to reach out to them and see if we can collaborate.

@westnordost

westnordost commented May 27, 2024

I found this ticket because I was confused about why there are two Ksoups out there with no explanation of the difference between the two.

Looking at @fleeksoft's build.gradle.kts, it pulls in a number of dependencies I wouldn't expect from a simple HTML parser, such as network access, date-time parsing, file access and support for Unicode code points. Given that Jsoup actually features parsing HTML directly from a web page, this is maybe not surprising for a faithful port. Its JAR size for JVM is additionally over 600kB.

@MohamedRejeb's Ksoup has no external dependencies and its JAR size for JVM is just over 60kB. Great! That's what I need for my project - a simple HTML DOM parser. But that is maybe not what people looking for a port of Jsoup to KMP are looking for. They might be looking for a port that offers the same features.

If my assessment of your library as a simple HTML DOM parser and nothing else is correct, @MohamedRejeb, how about renaming your library accordingly? First, that would resolve the confusion about which library is the faithful port of Jsoup (probably fleeksoft's - I didn't look closely at their lib yet). Second, it would manage expectations: if people don't assume this library is anything more than an HTML parser, i.e. that it has all the features Jsoup has, you won't get flooded with feature requests to add this or that because Jsoup has it.
Finally, since the name would be different, there would be no expectation that the API is similar. For example, the "handler" stuff is quite Java-typical. In Kotlin, one would probably rather emit a Sequence of entities.
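
To illustrate the contrast between a callback "handler" and a Sequence, here is a minimal sketch. Every name in it (HtmlEvent, OpenTag, CloseTag, Text, Handler, events, parse) is hypothetical and belongs to neither Ksoup library, and the tokenizer is a toy that only understands plain `<tag>`, `</tag>` and text, with no attributes, comments or entities.

```kotlin
// Hypothetical event types for illustration only; not the API of either Ksoup library.
sealed interface HtmlEvent
data class OpenTag(val name: String) : HtmlEvent
data class CloseTag(val name: String) : HtmlEvent
data class Text(val content: String) : HtmlEvent

// Hypothetical Java-style handler: the parser pushes events into callbacks.
interface Handler {
    fun onOpenTag(name: String) {}
    fun onCloseTag(name: String) {}
    fun onText(text: String) {}
}

// Kotlin-style alternative: the parser lazily yields events and the caller pulls them.
// Toy tokenizer, assuming well-formed input made of bare tags and text.
fun events(html: String): Sequence<HtmlEvent> = sequence {
    var i = 0
    while (i < html.length) {
        if (html[i] == '<') {
            val end = html.indexOf('>', i)
            if (end == -1) {
                // Unterminated tag: treat the rest as text and stop.
                yield(Text(html.substring(i)))
                break
            }
            val inner = html.substring(i + 1, end)
            if (inner.startsWith("/")) yield(CloseTag(inner.substring(1))) else yield(OpenTag(inner))
            i = end + 1
        } else {
            val next = html.indexOf('<', i).let { if (it == -1) html.length else it }
            yield(Text(html.substring(i, next)))
            i = next
        }
    }
}

// The handler style can still be offered as a thin wrapper over the sequence.
fun parse(html: String, handler: Handler) {
    for (event in events(html)) {
        when (event) {
            is OpenTag -> handler.onOpenTag(event.name)
            is CloseTag -> handler.onCloseTag(event.name)
            is Text -> handler.onText(event.content)
        }
    }
}

fun main() {
    val html = "<p>Hello <b>world</b></p>"
    // With a Sequence, the usual collection operators apply directly.
    val texts = events(html).filterIsInstance<Text>().map { it.content }.toList()
    println(texts)
}
```

With the Sequence form, callers can filter, map or stop early with the standard library operators, whereas the handler form requires implementing an interface and tracking state across callbacks.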

@westnordost

(Or in an ideal world, there'd be one library that just does the basic HTML parsing, yours, and then fleeksoft's Ksoup port would use it as a dependency and add everything else they need to get feature parity with Jsoup. But such cooperation usually doesn't work unless it happens within the same organization.)

@vanniktech

> Looking at @fleeksoft's build.gradle.kts, it pulls in a number of dependencies I wouldn't expect from a simple HTML parser, such as network access, date-time parsing, file access and support for Unicode code points. Given that Jsoup actually features parsing HTML directly from a web page, this is maybe not surprising for a faithful port. Its JAR size for JVM is additionally over 600kB.

I was also surprised by this, but it does make sense: fleeksoft/ksoup#30 - Java has all the APIs built in. Kotlin Multiplatform does not. I think it would make sense to provide extension modules for file support if one wants it. In my case, I use all those transitive libraries anyway, so it does not matter for me.

> @MohamedRejeb's Ksoup has no external dependencies and its JAR size for JVM is just over 60kB. Great! That's what I need for my project - a simple HTML DOM parser. But that is maybe not what people looking for a port of Jsoup to KMP are looking for. They might be looking for a port that offers the same features.

In the beginning I also only needed a simple HTML DOM parser but then I had to use the full features of Jsoup. Also if you're on Android, R8 will remove everything that's not needed.

> If my assessment of your library as a simple HTML DOM parser and nothing else is correct, @MohamedRejeb, how about renaming your library accordingly? First, that would resolve the confusion about which library is the faithful port of Jsoup (probably fleeksoft's - I didn't look closely at their lib yet). Second, it would manage expectations: if people don't assume this library is anything more than an HTML parser, i.e. that it has all the features Jsoup has, you won't get flooded with feature requests to add this or that because Jsoup has it.

I think this library was here first and that other ksoup library was only created later. But renaming sounds good to avoid confusion if no collaboration is wanted.

5 participants