ANTLR4 WebAssembly target #4362
Replies: 19 comments 19 replies
-
Hi Mike, since WebAssembly is not per se a 'language', it'd be great if you could clarify what integration patterns you have in mind.
-
Well, the textual form can be considered a language in its own right, but we don't work with that directly, right? It's always the binary form (*.wasm), and currently I'm only following the path of using that in a JS/TS environment. The generated wrapper JS file makes that easy and provides a lot of tooling that helps when working with the wasm binary. I have not thought about other use cases yet; I just want to finish one for now :-) But if it were possible to use the assembly in other languages, that would be a tremendous simplification of the target landscape we have now in ANTLR4. A quick search showed me that there's python-wasmer to run WebAssembly in Python. Imagine we would only need to write the wrappers and had to maintain just one core runtime (C++, beside Java as the reference implementation)! That would be awesome.
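As a point of reference for what consuming the binary form in a JS/TS environment involves at the lowest level, here is a minimal, self-contained sketch (not ANTLR code; the module bytes encode a trivial `add` function) of instantiating a .wasm module with the built-in WebAssembly API. A generated Emscripten wrapper automates exactly this kind of loading, plus type conversion and memory access:

```javascript
// Minimal sketch, not ANTLR code: instantiate a raw .wasm module from JS
// with the built-in WebAssembly API. The bytes below are a complete, valid
// wasm module exporting add(a, b).
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: one func, type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

WebAssembly.instantiate(bytes).then(({ instance }) => {
  console.log(instance.exports.add(2, 3)); // prints 5
});
```

Everything beyond this (string conversion, memory views, class bindings) is what the generated wrapper layers on top.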
-
Yes, there are WebAsm wrappers for JS, Python, C++, Java, C#... Some time ago I started exploring a gradual migration of the JS runtime, using AssemblyScript to generate the WebAsm byte code, but that proved to be counter-performant, i.e. the cost of serializing/deserializing was monstrous. Then I looked at using AssemblyScript as a full-fledged target, and that was pretty disappointing too: AssemblyScript looks like TS and relies on TS tools, but you only find out at compile time that many required constructs are not supported... In my experience it's close to unusable for a big project. IIRC, the major hurdle I bumped into at the time was the lack of inheritance. WebASM is evolving slowly, and some RFCs aim to fill fundamental gaps:
I suspect this will take a couple of years, hence my thinking that it would be more reasonable to target a unified ANTLR runtime for a future ANTLR. Although I haven't explored C++ at all, I believe the above gaps are filled by the available C++ wrapper. So maybe starting with C++ would be a more rapidly achievable small win, and it would provide a good basis for other targets. The general idea would be to convert the generated lexer and parser to wasm using existing tools.
-
AssemblyScript was my first approach too, but it misses too many important core aspects, so I gave up on that. For wasm the situation is completely different, because it's not the language using the code that is converted, but the library being used.

I first tested with simple things, like adding an interval to an interval set 10 million times, and found that this is slower using wasm compared to native JS. Which shows it is very important not to cross boundaries on hot paths. But that's a perfect scenario for parsing input: we only pass in text and get back a parse tree + diagnostics. So I expect that to be much faster.

What's also critical for adoption is the tooling in the consuming language. The generated JS wrapper has a lot of handling already built in, like accessing the underlying wasm memory directly, conversion of primitive types (like strings), checks for wrong parameters, duplicate type names, and many more. I wouldn't want to write that by hand.

GC is a different matter, since C++ doesn't have that, but by using smart pointers that can be mitigated. This is something I still have to check once there's a first working version. Inheritance, on the other hand, is pretty well supported: with some glue code I can extend a C++ class in JS/TS (we need that for the generated lexer + parser). I'm not sure how this is handled in other user-land languages (like Python). Seeing whether a unified ANTLR4 runtime is feasible will require some investigation into the individual consuming languages and their wasm support.

But to make it clear(er): I don't want to compile the generated files to wasm! All I want is a wasm-based runtime, which is then consumed by the generated files. They are not time critical, and having to compile them to wasm would require much more extra work and additional build tools. My vision is to just publish the wasm + target language wrapper and continue using the current approach of generating the parser/lexer files.
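The GC/smart-pointer concern can be illustrated with a small sketch. Everything here is hypothetical (the names are invented and no real wasm is involved); it only shows the explicit-deletion contract that embind-style wrappers impose on JS consumers, since objects in wasm linear memory are invisible to the JS garbage collector:

```javascript
// Illustrative sketch only: names are made up, not the antlr4wasm API.
// A Set stands in for allocations in wasm linear memory; embind-style
// wrappers expose an explicit delete() that the caller must invoke,
// because the JS GC never sees the wasm-side allocation.
const liveNativeObjects = new Set();

class NativeHandle {
  constructor() {
    liveNativeObjects.add(this); // "allocated" on the wasm heap
  }

  delete() {
    liveNativeObjects.delete(this); // explicit free, like embind's delete()
  }
}

const ctx = new NativeHandle();
// ... use ctx (e.g. a parser context) ...
ctx.delete(); // forgetting this call would leak wasm memory

console.log(liveNativeObjects.size); // prints 0
```

Smart pointers on the C++ side ease bookkeeping inside the runtime, but the JS caller still owns the decision of when a wrapped object dies.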
-
Looks like we started from the same simple stuff (interval sets) and reached the same conclusion...
-
I got lexing working. See here: mike-lischke/antlr4wasm#3
-
The first version of the wasm runtime works now and I have collected a few performance numbers.
Looks pretty good for input that doesn't use left recursion much. However, memory management is currently mostly uncontrolled; I need to find a way to handle it.
-
@KvanTTT Would you be interested in changing your benchmarks to use a really heavy grammar (MySQL) and running the same measurements as I did for the results above? I guess that would give a pretty good performance overview for all available targets. Though I guess we'd have to leave out PHP then; it would probably take days to finish.
-
It does, please check it out.

> On 26 August 2023 at 15:35, Mike Lischke wrote:
> I used the NPM version, so might not have got the latest fixes. I can certainly test with the latest code if that has improvements!
-
What changed is that we're no longer letting webpack convert to ES5. I couldn't believe it myself when someone mentioned the numbers...

> On 27 August 2023 at 12:44, Mike Lischke wrote:
> I checked the git history of the source files for the JS runtime and found no significant changes since the last release 3 months ago, so how can the code in the repo be so much faster compared to the NPM module? That makes no sense.
-
I learned 20 years ago that runtime-based languages force us to think differently about optimization, because you need to factor in the speed of the runtime itself. In this case it seems that an ES6 class is much, much faster than a prototype-based, ES5-compatible class. That makes sense: a manually built 'vtable' might not be as easy for the V8 engine to optimize as an ES6 one, which is deemed immutable (granted, JS lets you change things manually, but I wouldn't be surprised if doing so broke the performance).
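To make the ES5 vs. ES6 point concrete, here is a small illustrative comparison. The class names are invented for this example, not taken from the ANTLR runtime; both styles behave identically, but modern engines typically optimize ES6 class method dispatch more aggressively than a hand-wired prototype chain:

```javascript
// Illustrative only; names are made up, not from the ANTLR runtime.

// ES5 prototype-based style, similar to transpiled webpack/Babel output:
function IntervalES5(start, stop) {
  this.start = start;
  this.stop = stop;
}
IntervalES5.prototype.length = function () {
  return this.stop - this.start + 1;
};

// ES6 class style, as the runtime sources can be shipped unconverted:
class IntervalES6 {
  constructor(start, stop) {
    this.start = start;
    this.stop = stop;
  }

  length() {
    return this.stop - this.start + 1;
  }
}

console.log(new IntervalES5(3, 7).length()); // prints 5
console.log(new IntervalES6(3, 7).length()); // prints 5
```

Semantically the two are equivalent (ES6 classes still use prototypes under the hood); the performance difference, where it exists, comes from what the engine can assume about each shape.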
-
Don't use the MySQL grammar in the contributed grammars. In my opinion we should remove it, as it is essentially unusable. It will stress the system though, I'll say that ;)
-
Are you saying that ES6 does not use prototypes? I wonder, because that's not true.
Sorry, but I don't buy that. Regardless of how much V8 or any other JS engine can optimize the JS code, it will always be slower than natively compiled code. I'm still baffled.
-
I created my own command-line test app just for C++ and now the results look much more sane:
while the current wasm code is super slow. I guess I need to move more of the generated TS code to C++, to avoid frequent border crossings.
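The border-crossing argument can be sketched in plain JS, with no real wasm involved. `foreignCall` below just simulates a fixed marshalling cost per crossing, so the absolute numbers mean nothing; the point is that the chatty variant pays that cost once per character while the batched variant pays it once:

```javascript
// Sketch of the batching principle: crossing the JS<->wasm boundary has a
// fixed per-call cost, so pass whole inputs across, not individual tokens.
// foreignCall only models that fixed cost (a JSON round-trip per call).
function foreignCall(fn, ...args) {
  JSON.parse(JSON.stringify(args)); // pretend marshalling work per crossing
  return fn(...args);
}

const input = "a".repeat(100000);

// Chatty: one crossing per character.
let t0 = process.hrtime.bigint();
let n1 = 0;
for (const ch of input) n1 += foreignCall(c => c.length, ch);
const chatty = process.hrtime.bigint() - t0;

// Batched: one crossing for the whole input.
t0 = process.hrtime.bigint();
const n2 = foreignCall(s => s.length, input);
const batched = process.hrtime.bigint() - t0;

console.log(n1 === n2, chatty > batched); // same result, batched is faster
```

This is the same reasoning as "pass in text, get back a parse tree": keep the hot loop entirely on one side of the boundary.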
-
After optimizing the generated parser to avoid some of the wasm border crossings, I got a pretty good speed increase (speed doubled). Still not as close to the C++ target as I hoped, and I'm not sure I can do much more, other than moving generated code to C++ (with the consequences mentioned already). There's simply too much back and forth between JS/TS and C++. The current target seems to be free of memory leaks (at least neither ASAN nor the SAFE_HEAP option reports anything). Here are the latest numbers:

While antlr4wasm is now on par with antlr4ts (or even faster for the heavily recursive input), I'm not sure I should follow that path for now. Instead it seems more promising to me to use the JS target, even though it still feels odd that JS is so close to C++, which is another reason why I would like to have benchmarks for all current ANTLR4 runtimes (@KvanTTT), and they should be part of the tests. This would also help to avoid regressions like the one we had for the JS runtime. Also, @parrt is understandably very reluctant to add yet another ANTLR4 runtime, so this sounds like the better option overall (though I may follow the wasm road in a different way in the future).

But as I mentioned before, there's quite some stuff missing in the JS target, and the type definitions in particular miss many things. I can create the full typings for JS, but I'd really prefer to use the same folder and file structure as the Java runtime (flattened, like in the C++ runtime), except for JS-specific stuff. Maybe we can then convert the JS files to TS in a second step? For the node package this makes no difference; webpack will transpile the sources. What do you think about this plan, @ericvergnaud? Any objections to changing the structure?
-
@mike-lischke Re the structure, the reason I changed it is that the Java one is not good enough for me. As an example, I don't understand why Exception classes are at the top level, alongside utility classes such as ProxyErrorListener, or, even worse, RuleContextWithAltNum... Imho, the top level should only contain classes necessary for implementing basic parsing, and the ATN folder also contains way too many classes. So how about aligning all runtimes on the JS runtime structure (not saying there isn't room for improvement in it)?

Re the d.ts files, I suspect we have somewhat different philosophies: I only want to expose what people need, such that I limit backwards-compatibility issues, whereas it seems you want to expose everything. Not sure we can find common ground here...

As for the JS -> TS conversion, given your benchmark results, I'd suggest thinking about it seven times at least... In theory, antlr4ts should be faster thanks to an optimized algorithm, but in practice it's actually slower. It's very possible that this slowness comes from the TS -> JS conversion.
-
In the long term (ANTLR5?), I believe that a WASM-only runtime would make sense. The tool would generate:
-
Looks like there's no further interest in this discussion, so I'm closing it...
-
Hi Mike,
Well, I was hoping to be able to produce a single WASM file (possibly based on WASI) and then use it both in the browser and load it from all languages offering a WASM runtime. Perhaps one can achieve the same by compiling JS/TS code?
Cheers,
Federico
> On 15 September 2023 at 16:16, Mike Lischke wrote:
> Hey Federico, it turned out that the WebAssembly variant of the runtime is slower than the JS runtime, so it does not really pay off to follow that road further. Why take all the burden of using wasm if you can get the same, but with just plain JS/TS code? The embind code is pretty large and there are many details to consider (and not everything is really clear right now, especially regarding memory management), so I decided to put that on hold and improve code that is proven to work well.
-
Hi all,
Today I announced the start of development of a new ANTLR4 target, antlr4wasm, a WebAssembly port of the C++ runtime, in the ANTLR4 announcement list (Google group):
Eric already replied and suggested moving this to its own discussion:
So, here we go: exchanging ideas about what can be done with this new project, coordinating help, and so on, to make this a reality as fast as we can.