-
+1 for this. It seems to me that on the lexer side a lot of the customization burden could be cleanly handled with an additional |
-
Everything looks fine to me. For your grammar above as L.g4, I get:
I understand your interest in customizing token names, but I don't think I will be going down this path.
-
Well, one could argue those all should be different tokens.
-
Summary
Based on the grammar files, ANTLR generates lexer and parser classes which contain a vocabulary providing literal names for each token. The generated vocabulary has several issues which should be addressed in future releases:
Incomplete literal names
Let's take a look at the following lexer definitions:
The generated lexer class only provides '<=' for the LTE token. For the other tokens in this example the literal name is null. For token LT there's no apparent reason why it resorts to null (is this a bug?). As for token NE, there should be a reasonable default literal name like '<>', '!' or '!='. Essentially, if the token does not represent a single string, its literal name defaults to null, and even in some other cases (e.g. LT from the example above) it behaves unexpectedly.
Customizing the vocabulary
As the default vocabulary is incomplete, the need for customization becomes an issue.
Back in the day (ANTLR 3) it was possible to subclass the generated lexer/parser and override the literal names in a static part of the class. With ANTLR 4 everything is either private or final or both. The only way I've found to overcome this problem is to subclass the generated class and override getVocabulary() with a custom Vocabulary implementation. If this were an exceptional requirement, it would be a valid solution. But as the generated vocabulary is useless for almost every grammar, I've had the need to do this in every project.
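For anyone wanting to try the workaround described above, here is a minimal sketch of such a custom vocabulary. To keep it compilable without the ANTLR runtime, it declares a stand-in `Vocabulary` interface with the same four methods as `org.antlr.v4.runtime.Vocabulary`; in a real project you would drop the stand-in and import the runtime interface instead. `OverridingVocabulary`, `ArrayVocabulary` and the token type numbers are hypothetical names and values chosen for illustration, not anything ANTLR generates.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.antlr.v4.runtime.Vocabulary so the sketch compiles on its
// own; the four methods mirror the ones on the runtime interface.
interface Vocabulary {
    int getMaxTokenType();
    String getLiteralName(int tokenType);
    String getSymbolicName(int tokenType);
    String getDisplayName(int tokenType);
}

// Hypothetical stand-in for the vocabulary a generated lexer would expose.
final class ArrayVocabulary implements Vocabulary {
    private final String[] literalNames;
    private final String[] symbolicNames;

    ArrayVocabulary(String[] literalNames, String[] symbolicNames) {
        this.literalNames = literalNames.clone();
        this.symbolicNames = symbolicNames.clone();
    }

    @Override public int getMaxTokenType() {
        return Math.max(literalNames.length, symbolicNames.length) - 1;
    }

    @Override public String getLiteralName(int tokenType) {
        return tokenType >= 0 && tokenType < literalNames.length ? literalNames[tokenType] : null;
    }

    @Override public String getSymbolicName(int tokenType) {
        return tokenType >= 0 && tokenType < symbolicNames.length ? symbolicNames[tokenType] : null;
    }

    @Override public String getDisplayName(int tokenType) {
        String literal = getLiteralName(tokenType);
        if (literal != null) return literal;
        String symbolic = getSymbolicName(tokenType);
        return symbolic != null ? symbolic : Integer.toString(tokenType);
    }
}

// Hypothetical decorator: answers from a table of overrides first and falls
// back to the wrapped (generated) vocabulary for everything else.
final class OverridingVocabulary implements Vocabulary {
    private final Vocabulary delegate;
    private final Map<Integer, String> literalOverrides;

    OverridingVocabulary(Vocabulary delegate, Map<Integer, String> literalOverrides) {
        this.delegate = delegate;
        this.literalOverrides = new HashMap<>(literalOverrides);
    }

    @Override public int getMaxTokenType() {
        return delegate.getMaxTokenType();
    }

    @Override public String getLiteralName(int tokenType) {
        String override = literalOverrides.get(tokenType);
        return override != null ? override : delegate.getLiteralName(tokenType);
    }

    @Override public String getSymbolicName(int tokenType) {
        return delegate.getSymbolicName(tokenType);
    }

    @Override public String getDisplayName(int tokenType) {
        // Prefer the (possibly overridden) literal name, then the symbolic
        // name, then the raw token type number.
        String literal = getLiteralName(tokenType);
        if (literal != null) return literal;
        String symbolic = getSymbolicName(tokenType);
        return symbolic != null ? symbolic : Integer.toString(tokenType);
    }
}
```

With the real runtime, the subclass of the generated lexer would override getVocabulary() and return something like `new OverridingVocabulary(super.getVocabulary(), overrides)`. Since the wrapper does not depend on the recognizer, the same instance could presumably be returned from both the lexer and the parser subclass.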
Possible improvements:
Vocabulary generated twice for lexer/parser combinations
For lexer/parser combinations the generated vocabularies in both classes are identical. This is inefficient and, with respect to the customization part described above, there's now the need for subclassing both generated classes.
Basically I don't think it should be required to subclass the generated classes at all. The requirement to do so shows that the generated classes lack customization abilities.
Possible improvements:
XYZListener and XYZBaseListener) which can be customized once and used for both lexer and parser