-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ambiguous comment token definition in ODIN grammar #31
Comments
It seems to me that a simpler approach is to move the H_CMT_LINE to one of the ADL Antlr files, because that special long line is only special in between whole ADL texts. So the ODIN Antlr file would go back to just matching with the CMT_LINE rule. |
I don't think that will work - the lexer rules are the same for all the files and not specific for one of them, since you use Odin within adl. So you can move them but it will not be handled like two different levers. In fact, having all the lexer rules in one antlr file makes sense because of this. Unless you have a way in the lexer to know when Odin parts start and end, then you can define different lexer modes. But perhaps there is some way in the parser to define when the special comment lines are comments and when they are used to separate overlays. |
Yep, you are right - that won't work in Antlr. We need a solution based on modes. Or else we do what Seref said, and define a different kind of comment line. Maybe an easy one would be:
|
I guess upper layers of software that go beyond bmm processing (which is what I'm doing) would need to use modes anyway but it is how we know we're supposed to use the mode_x that is the problem, when the token level ambiguity exits. So I'd stick to my different type of comment suggestion. Your suggestion for syntax is just fine Tom. |
for lexical modes you need to be able to switch modes in the lexer. I don't think you can, because there's no unambiguous token for when Odin starts and ends - not in the lexer. It's not like we have a CDATA tag like in XML or a /* to start a comment. It's why I could not use lexical modes when I tried to unambiguously fix both regular expressions and path parsing, and had to resort to handling them as a single token in Archie. With some custom target language code it's probably possible - but I think that has some serious drawbacks since you would need a version per target language. Handling this in the parser would be hard because these comments can appear literally everywhere. I don't see an easy solution in the current syntax. |
I could not find an example of the template overlay use case for the special comment token so I'm a bit in the dark re its use in that context but it my ignorant guess is switching to a CDATA-ish or /* ... */ comment format would help, based on my understanding of your comments @pieterbos This is a step further than what @wolandscat is suggesting but it sounds like a more future proof approach. |
Example of a single file template. The original attraction in using long lines was that they visually separate the overlays nicely and that they are already ADL comments anyway. |
Oh, some other ways to solve this:
I don't think these are very elegant ways though, but both should work. |
@pieterbos just to clarify: Are you suggesting that the parser uses only template_overlays keyword to identify t. overlays and not use the long comment line in your first suggestion? |
The base patterns file here: https://github.com/openEHR/adl-antlr/blob/master/src/main/antlr/base_patterns.g4 which is used in odin grammar defines comment tokens as follows:
H_CMT_LINE : '--------' '-'? '\n' ; // special type of comment for splitting template overlays
CMT_LINE : '--' .? '\n' -> skip ; // (increment line count)
This introduces a problem with antlr 4.7, but interestingly, 4.5 manages to parse input, which I think should not be possible: probably Antlr's smart parsing algorithms are relaxing the rules a bit too much.
Since Antlr considers the declaration of tokens in the grammar files during parsing, the H_CMT_LINE token overrides the CMT_LINE token in the following examples, taken from openEHR RM bmm (CIMI bmms have same types of comments too):
The long comments above are meant to be comments and they don't imply the special template overlay comment. However, based on the grammar file above, the lexer is bound to match these long comment lines to H_CMT_LINE which I'd say is the correct behaviour. The result, using antlr 4.7 (the latest version) is that the parse tree never goes beyond the long comment lines.
Changing the order of CMT_LINE and H_CMT_LINE fixes the issue but I'm sure it'd break the behaviour within archetype files where template overlays are expressed.
I'd suggest changing the template overlay comment token to something else. Maybe something like:
The text was updated successfully, but these errors were encountered: