Possible improvements for tool builders #29
Comments
Hi @neon12345, I am trying the demo but I get an error. Can you help me figure out how to use the tool? I think getting suggestions for ANTLR 5 is very useful, especially when they come from tool builders like you. Regarding your question, I am not sure I understand what you mean by "unique antlr state for as many positions in the grammar as possible". Could you give me an example? |
Your ISP needs to support IPv6. Other than that, when you press the Run button, the source code on the top left is sent to the server and the returned JavaScript AST is transformed with the script at the bottom into the top right result editor. There is also some help accessible with the bottom tab. |
For recursive rules, the parentState is set for the context. What we need from the context, and currently have to patch in, is the state before the precpred.
|
@neon12345 this is how I see the demo: https://www.loom.com/share/1523249bd9f2458785db29ef2c4ea421 |
@ftomassetti |
Can't this be computed using "parentState" and the ATN containing that NFA state number? Given a parse tree, get "parentState" and see whether it is one of your recursive productions. Then, examine the ATN: follow the edges in the ATN back until you find a transition that involves "_p". Presumably, this would be a PrecedencePredicateTransition edge. |
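A minimal sketch of the backward walk described above, written in Python over a toy edge list rather than the ANTLR runtime's ATN classes. The `Edge` type, the edge kinds, and the state numbers are all hypothetical; a real implementation would read the transitions from the parser's ATN.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical stand-in for an ATN transition."""
    source: int
    target: int
    kind: str  # "epsilon", "atom", "rule", or "precedence"


def find_precedence_state(edges, parent_state):
    """Walk edges backwards from parent_state (breadth-first).

    Returns the source state of the first precedence edge reached,
    or None if no such edge lies behind parent_state.
    """
    # Index edges by target so we can step backwards.
    incoming = {}
    for edge in edges:
        incoming.setdefault(edge.target, []).append(edge)

    seen = {parent_state}
    queue = deque([parent_state])
    while queue:
        state = queue.popleft()
        for edge in incoming.get(state, []):
            if edge.kind == "precedence":
                return edge.source
            if edge.source not in seen:
                seen.add(edge.source)
                queue.append(edge.source)
    return None


# Toy graph (invented): 1 -eps-> 2 -precedence-> 3 -atom-> 4 -rule-> 5
TOY_EDGES = [
    Edge(1, 2, "epsilon"),
    Edge(2, 3, "precedence"),
    Edge(3, 4, "atom"),
    Edge(4, 5, "rule"),
]

print(find_precedence_state(TOY_EDGES, 4))
```

Here state 4 plays the role of "parentState" (the state that invokes the recursive rule), and the function returns the state the precedence edge leaves from.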
We work with antlr through a visitor that reports the position in the grammar using the antlr states. From a parsing perspective, it is best to get that information from the current context without many calculations. So, from our perspective, it would be more valuable to give the context another state. But I don't know the other use cases or why the state had to be stored this way. |
Oh... surprisingly ipv6 does not work for me, so I am afraid I cannot watch the demo |
Considering the result of a parse is a parse tree, it seems fine to "hang" the "calling state of the ATN to another ATN" in the parent parse tree node. (It really should have been an edge ID because, possibly, one could have multiple edges with the same non-terminal symbol from a state.) To avoid computing the precedence predicate state during a parse tree traversal, you could precompute the precedence predicate states from the ATNs before parsing and place them in an O(1) map. So, for the Java grammar, state "1372" would be the "parentState" for an ExpressionContext, but the computed precedence predicate state would be "1370". |
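The precomputation suggested above can be sketched as a one-time pass that builds a dictionary from each rule-invoking state to the precedence predicate state before it. This is a toy model: the edge tuples and the intermediate state 1371 are invented for illustration, only the pair 1372 and 1370 comes from the comment, and none of this is the ANTLR runtime API.

```python
def build_precedence_map(edges):
    """Map each rule-invoking state to the precedence state before it.

    edges is a list of (source, target, kind) tuples, where kind is
    "epsilon", "atom", "rule", or "precedence" (hypothetical labels).
    """
    outgoing = {}
    for source, target, kind in edges:
        outgoing.setdefault(source, []).append((target, kind))

    result = {}
    for source, target, kind in edges:
        if kind != "precedence":
            continue
        # Walk forward from the predicate until a rule invocation is found.
        seen = {target}
        stack = [target]
        while stack:
            state = stack.pop()
            for nxt, nxt_kind in outgoing.get(state, []):
                if nxt_kind == "rule":
                    # state is the invoking state of the recursive call.
                    result.setdefault(state, source)
                elif nxt_kind != "precedence" and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return result


# Invented edges around the two states named in the comment.
SKETCH_EDGES = [
    (1370, 1371, "precedence"),
    (1371, 1372, "atom"),
    (1372, 1373, "rule"),
]

PRECEDENCE_STATE = build_precedence_map(SKETCH_EDGES)
print(PRECEDENCE_STATE.get(1372))
```

During the tree walk, the visitor would then only need a dictionary lookup keyed by the context's "parentState".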
@kaby76 |
The example works now with ipv4. With the states available in the antlr parser as proposed, it is easy for us to:
automatically for any language. Looking at the network traffic (with Chrome DevTools) when using the example, one can see the generated AST for the selected language, with visit and print methods (the 3rd blob:null....). |
I'm trying to understand what "antlr state" means (original comment, paragraph 2, 1st sentence; and in your latest comment). Looking at the demo, when I click "Run", the upper-right frame outputs:
The demo apparently performs a parse of some Swift code in the upper-left frame, generating a parse tree, then walks over the parse tree by executing some JavaScript code in the lower frame.
Assuming the grammar for Swift is one from grammars-v4, the engine is probably in Java, because the grammars provided there are all implemented for Java, and only Java. And, since the script is written in JavaScript, you've likely worked out a way to export/import Antlr4 parse trees from a Java server to the script running in the browser. And, you defined an API that contains methods on parse tree nodes for
Right now, Antlr generates a class for each parser node type and methods to access the children of the node named after the parser rule used on the right-hand side of a rule. Alternatively, the Antlr runtime provides an XPath interpreter that adds a query language over the parse tree. Though it is not implemented for the JavaScript/TypeScript target, it is with Mike's Antlr-ng/Antlr4ng system. Your API in your demo is equivalent to the XPath API. Currently, in Antlr, to get the name of the parse tree node, you need to have access to the
The Trash Toolkit implements an entirely new parse tree because Antlr parse trees are incomplete. The trees that Trash produces include the name of the parse tree node, as well as inter-token strings embedded in the parse tree as "attributes" so that one can construct XPath expressions over attributes. The alternative is visitors and listeners, which is basically programming in assembly language. The Antlr XPath API might already handle queries like
One reason why I wrote Trash is that the Antlr XPath in the runtime does not implement a realistic XPath engine. It's not even version 1.0 compliant. Trash uses a real XPath version 2 engine, ported from Xalan, but I've been trying to move to an XPath 3 engine with XQuery using Saxon. Based on this experience, I would recommend that the sub-standard XPath engine be replaced with a variety of high-quality query language engines. Ideally, Antlr5 would output a full-fidelity parse tree serialization. Antlr should not be implementing graph query languages.
These are very complicated. As I mentioned elsewhere, I do look over projects on GitHub from time to time. I have noticed someone trying to write an XQuery engine over Antlr parse trees: https://github.com/AleksanderKruk/antlr-xquery. It's the right idea, but people really should not be implementing graph query languages. |
I am the author of a tool for converting the antlr parse tree into a denser AST, which can output a standalone JavaScript version to query/transform/print/visualize using CSS selectors. (The end goal is to use machine learning with rule induction for AST transformation/printing.) Demo
Since there is the possibility of changing things with a new version, it would be nice to have a unique antlr state for as many positions in the grammar as possible. This does not currently apply to recursive rules.