-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRAFT: PoC implementing Wkt parser using Pidgin Parser Combinator lib #121
base: develop
Are you sure you want to change the base?
DRAFT: PoC implementing Wkt parser using Pidgin Parser Combinator lib #121
Conversation
I like the general idea of this pull request. I saw that pidgin library has codegen. So is this only an development dependency and at runtime pigdin isn‘t needed? |
@inforithmics Don't know nothing about codegen. I'm using an older version of Pidgin (Version 2.5.0) that still supports netstandard 2.0. This older Pidgin version assembly is needed runtime AFAIK. |
I've looked this up a little bit. Without doing something unreasonable like making it possible to bundle (up to) the entire runtime library as a source generator, parser combinators like Pidgin should theoretically always require a runtime dependency. Parser generators like ANTLR already generate source code for their target language, so they don't technically need a runtime library, though I can't imagine (or find) a practically useful one that doesn't.
That depends.
I think so. My biggest concern is that we must use an older version of the library in order to support With that said, I don't think that this concern is a dealbreaker. It really isn't reasonable in 2024 to implement parsers the hard way like this project did at first, and Pidgin 2.5.0 looks like one of the least intrusive ways to do that, so I think it's a good balance. I would have PERSONALLY chosen ANTLR if I were still contributing regularly (because of my personal familiarity, because the latest versions of the runtime on both versions still support |
Runtime dependency is okay and I had a look a the project website so it seems well maintained. |
Thanks for the clarification concerning the specification. I'm gone switch now to the right one ;-) Hopefully I can reuse some parts. I looked briefly at ANTLR and have some concerns about the Java part of the generator. And it looks that runtime you also are going to need some assemblies. The grammar is different than (E)BNF which is another thing to get my head around. Parer combinator Lib is sort of in between Regular expressions and a full blown Parser generator like ANTLR. Hopefully we don't need the full blown....but I could be wrong. |
What I especially like about pidgin is that it is AOT compatible which is important for mobile and Wasm solutions. It's a later version, maybe this can be supported by using multitargeting, but this is an improvement for later. |
- WktTextReader with underlying WktParser now working - Followed spec WKT 1 and looked at existing Unit Tests. - WKT now has its own AST for multiple reasons. - Still need to integrate the new WKT AST with rest of PROJ4Net.
Feedback is welcome! |
- Discovered Pidgin Parsers Map(...) function which makes WktParser a lot more readable! - Made some performance improvements for ParseAllWKTs taken 4 seconds down to under 1 second. - Cleaned some obsolete code.
WktToProjConverter is an implementation example of this interface. ParseAllWKTs unittest succeeds.
IWktObject now has a Traverse method that accepts a IWktTraverseHandler. WktToProjConverter is now a working example of such a TraverseHandler. Code to read, parse and convert now looks like: // read
using var sr01 = new StringReader(wkt.Wkt);
using var wktReader01 = new WktTextReader(sr01);
// parse
var result01 = wktReader01.ReadToEnd();
// convert
var converter01 = new WktToProjConverter();
converter01.Convert(result01.Value); TODO: Wkt output handing, semantic checks, and replacing old wkt parser...? Again feedback is welcome! |
Also reparsing in unit test ParseAllWKTs from SRID csv => Works!
The Output formatting is now also working: First Text from SRID CSV:
Can now be formatted to:
DefaultWktOutputFormatter with default constructor settings leaves the output compact on one line. // Parse in from CSV...
using var sr = new StringReader(wkt.Wkt);
using var reader = new WktTextReader(sr);
var result01 = reader.ReadToEnd();
var cs01 = result01.Value;
// Generate WKT string from WktObject...
var formatter = new DefaultWktOutputFormatter(
newline: Environment.NewLine,
leftDelimiter: '(', rightDelimiter: ')',
indent: "\t", extraWhitespace: " ");
string wkt01 = cs01.ToString(formatter); Again feed back is welcome.... |
- Fixed multiple errors - Fixed support for FittedCoordinateSystem - All unit tests except one now working. The one needs partial parsing which is not supported yet.
I'm now ready with this WktParser 1.0 replacement. All unit tests, except one, run fine using the new WktTextReader/WktParser under CreateFromWkt. When this work is coming through I will look into Wkt 2.0 support. Again feedback and/or reviews are welcome! |
I sleept a night over it and I'm not happy with a few things:
So I'm thinking of adding a dedicated ProjNet.IO.WKT project to this PR with the new WKT stuff. The WktParser will than write to an IWktBuilder interface with its own implementation in Proj4Net: WktToProjTextReader. Again feedback and Idea's are welcome! |
Concerning the Performance: I think there a lot of Performance improvements in newer Pidgin Versions. Because of AOT Compatibility I think MultiTargeting needs to be added so in essance something like this should be added in the csproj
So this means for .netstandard2.0 it is using pidgin 2.5 and for .Net 8 it is using pidgin 3.2.3. The Performance Tests should be done on net8.0 Framework project. |
Another Topic: Have tried parsing these https://github.com/maptiler/epsg.io/tree/master/extra_codes_proj4_4.8.0.2 |
I like the idea of having a new and old/legacy implementation of the wktparser so if something doesn‘t work the old implementation can be used If it is in the same project or in a different one I don‘t have a strong opinion on that. But normally it is good practice to have its own nuget package per project. |
Thanks for your feedback and help! When I'm done with the basic design of the new WktParser I will look into this. |
Is this something the current (old) parser also supports? I noticed the Wkt Extension having the same proj content. But this content is not supported in the new parser... |
Yes thats also something that I had in my head. Maybe leaving the implementation of CreateFromWKT alone and only create a new WktToProjTextReader using the new WktParser. |
I know that you're primarily looking for feedback on the more substantive parts of this PR; I don't have the capacity to review it very much right now, but I did start to see some e-mail notifications for comments in this thread that I wanted to address, so that's what pulled me back here.
If we can get away with it, I'd prefer not to. Is there any breaking change in higher versions of the Pidgin library that would stop this package from working as-is if a
Because there seems to be revived interest in using multitargeting in order to benefit from the newer version of Pidgin, I'd like to take a moment to respond directly to the following points:
In the .NET world, there are two viable ways to use an ANTLR parser generator: the official ANTLR v4 project which does require a JRE at build-time (not at runtime), and Sam Harwell's antlr4cs project which has a Java-free mode. I must concede that I would prefer using the official ANTLR v4 project these days, which does require Java. I'm assuming that your concerns are similar to mine: it sucks to require a JRE to be installed in order to build a .NET application. I've seen worse, but it still sucks, and it can limit this project's accessibility to new contributors.
Whereas all newer versions of Pidgin have already switched to targeting only
I see two things here:
I want to re-emphasize that if you are familiar with Pidgin, then it's probably not worth redoing everything just to use my favorite tool. Again, I wasn't going to respond to these points until multitargeting came up as a possible consequence of using Pidgin (which is the context where I have brought up ANTLR before), since that's when it starts getting relevant. |
Thanks @airbreather for your feedback! You are totaly right: Lets get the right tool for the job and not the other way arround! For me this PR was and is an exercise to proof that a parser combinator lib like Pidgin CAN be used to implement WKT. And for now it looks like it can. About ANTLR: I'm still not convinced its the right tool for this job. Using the JRE is one thing but its so totally different than Parser Combinators which, indeed, I like very much. My reason for choosing Parser combinator libs above something like ANTLR is because you are already working in the target programming language where the parsing happends. If Superpower lib, or any other lib, is an option let me know. Else I will look into ANTLR when I have some time. EDIT:
|
Looking at the description again, I didn't realize that this PR had been so substantially modified since my first comment on this thread. Is it fair to say that this PR now adds a second implementation of the same specification that the project has already (ostensibly) implemented for a very long time? If so, why? I would at least want to see it linked to one specific problem that it solves, so that I can feel more OK about the way that this is going. Even if such problem(s) on their own probably wouldn't justify such a major rewrite, I can actually live with it because I expect that using an actual parsing library will be easier for normal people to maintain than everything in this directory, and so it would be a huge benefit to be able to say "see what happens when we stop trying to implement complicated parsing on our own? these issues automatically go away". But if we can't even point to a single issue that it solves, then it should probably not go forward as currently designed. Adding a second way to achieve the exact same goal means more code that must be maintained (even if we deprecate one of the two), and it becomes much more difficult to justify adding a new dependency on an older version of a library that someone might incidentally want to use a newer version of: it turns the question of breaking changes from my above comment into a new concern that is not counterbalanced by anything beneficial. This isn't all bad news, I think: IIRC, the existing WKT parser did have some sort of issues, though I can't remember what they were right now, so maybe they were resolved... worst-case, even if we can't justify a second implementation of the same spec, I fully support implementing WKT-CRS using something more standard.
It's Apache-2.0 licensed, and it supports I try to say this every time, so to make sure I say it again... as someone without much time to spend on this project (see #99), I can't justify standing in the way of improvements or additions just because they're not done in my favorite way. I can only justify blocking something because it makes something worse, or has a high risk or making something worse. So if you can make a substantial improvement to something using Superpower, then go for it.
I understand that you prefer parser combinators. This is fine, and I do not think that the approach is fatally flawed or anything. They are simple to use for what they bring to the table, extremely straightforward, and they automatically benefit tremendously from IntelliSense. Compared to working with parser generators, combinator libraries like this will tend to minimize the magical black-box bits that you "just have to trust" do the right thing. And compared to ANTLR in particular, these sorts of things will be much easier to integrate: you just add a package dependency like any other, no need to punch ANTLR-shaped holes in your toolchain to squeeze it in. That being said, I believe that this is more a matter of individual preference rather than a "should I use a screwdriver or a hammer to install a screw?" style issue. Putting my personal preferences aside, I think it's easy to say that ANTLR is a viable option based on its merits (at the very least), and it has some advantages:
I'm definitely spending too much time on these comments. The short version is that I think ANTLR is absolutely viable and has some advantages and disadvantages relative to any parser combinator library, and I also think that parser combinator libraries are perfectly viable as well, so the main decision point should be to use what you're most comfortable with.
I expect that to be irrelevant for this project. |
Yes the PR description is maybe bad. I can convert it to a draft and leave it lying arround as a Pidgin PoC?
This PR became indeed to much work and should probably seen as a proof of concept. The whole Wkt(Base)Object tree is nice for separation of concerns but too much for this library which already has its components.
Yes you are right about pointing out a problem to solve: They are only developer wise and maybe not directly for users of this lib. But improving code quality and better support of WKT 2+ can become a indirect plus feature and not a problem solver.
To me two implementations are only temporarly: The best implementations wins and remains.
I am hesitating to jump onto Superpower right away. Some Pidgin PoC work todo before.
Again I will convert this PR to a draft and rename it to a PoC. Then will finish this PoC PR with code I still have lying arround. After that I will look into Superpower: Should not take to much time because It looks a lot like Pidgin code => New PR. After that I will try to dive into ANTLR in another PoC => Another PR I'm willing todo so because I like parsing and all about it. And I'm willing to help out. Thanks for this discussion and your valuable time! |
Saved the last code changes for this PoC. Unfortunately the outcome is not satisfying: So I'm going to dive into ANTLR first. Because I think Superpower won't be faster. |
The corresponding Unit tests and beneath comments show how to use this parser.
I'm basing the parser on this document: Chapter 9 - Wkt 1.2.1
(Thanks to @airbreather)
Plus extras to get the Unit tests running. (E.g. ToWgs84, FittedCS and more are not in this spec)
Features: