Description
I am developing a Tree-sitter grammar for VB.NET and have run into a persistent parsing issue with member declarations that have multiple modifiers. The parser fails to be "greedy" and consumes only the first modifier, misinterpreting the second modifier as a variable name.
This appears to be a classic shift/reduce conflict, but the standard solutions I've tried (using prec, prec.dynamic, and the conflicts array) have not resolved the issue, often because they interfere with other precedence rules in the grammar.
The Problem
Given a simple VB.NET class, the parser should correctly handle fields with both single and multiple modifiers.
Minimal VB.NET Example:
Public Class MyTestClass
    ' This line with a single modifier parses correctly.
    Private _someField As String
    ' This line with multiple modifiers fails.
    Private ReadOnly _anotherField As Integer
End Class
When parsing the line Private ReadOnly _anotherField As Integer, the parser incorrectly stops after Private and tries to parse ReadOnly as the field's name.
Incorrect AST Output:
The resulting Abstract Syntax Tree for the failing line looks like this, clearly showing the error:
(field_declaration
  (modifiers
    (member_modifier) -- "Private"
  )
  (variable_declarator
    (identifier) -- "ReadOnly"
  )
  (ERROR) -- "_anotherField As Integer"
)
The modifiers rule is not greedy, and an ERROR node is produced.
Relevant Grammar Snippet (grammar.js)
Here are the key rules from my grammar.js that are involved in this issue.
module.exports = grammar({
  name: 'vbnet',
  // ... other rules and extras
  rules: {
    // ...
    member_modifier: $ => choice(
      ci('Public'), ci('Private'), ci('Protected'), ci('Friend'),
      ci('Protected Friend'), ci('Private Protected'), ci('ReadOnly'),
      ci('WriteOnly'), ci('Shared'), ci('Shadows'), ci('MustInherit'),
      ci('NotInheritable'), ci('Overrides'), ci('MustOverride'),
      ci('NotOverridable'), ci('Overridable'), ci('Overloads'),
      ci('WithEvents'), ci('Widening'), ci('Narrowing'),
      ci('Partial'), ci('Async'), ci('Iterator')
    ),
    modifiers: $ => repeat1($.member_modifier),
    _type_member_declaration: $ => choice(
      // ... other members like empty_statement, inherits_statement
      prec(2, $.constructor_declaration),
      prec(1, $.method_declaration),
      prec(1, $.property_declaration),
      // ... other members with precedence
      $.field_declaration // Lower precedence
    ),
    field_declaration: $ => seq(
      optional(field('attributes', $.attribute_list)),
      field('modifiers', $.modifiers),
      commaSep1($.variable_declarator),
      $._terminator
    ),
    variable_declarator: $ => seq(
      field('name', $.identifier),
      optional($.array_rank_specifier),
      optional($.as_clause),
      optional(seq('=', field('initializer', $._expression)))
    ),
    // ... other rules
  }
});
// Builds a case-insensitive regex for a keyword, e.g. ci('As') yields /[aA][sS]/.
function ci(keyword) {
  return new RegExp(keyword.split('').map(letter => `[${letter.toLowerCase()}${letter.toUpperCase()}]`).join(''));
}
// ... other helpers
The Question
How can I modify this Tree-sitter grammar to correctly and "greedily" parse multiple consecutive modifiers in a field_declaration, while still correctly resolving the ambiguities between different types of member declarations (e.g., a field_declaration vs. a method_declaration)?
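One detail that may be relevant here: because ci() returns a RegExp, every modifier is a regex token rather than a string literal, and an identifier pattern can match exactly the same text. The sketch below is only an illustration in plain Node.js, not Tree-sitter's lexer. The identifierPattern and the matchAtStart helper are hypothetical (the real identifier rule is not shown in the snippet above), and whether this overlap is the actual cause of the ERROR node is unconfirmed.

```javascript
// Same helper as in the grammar snippet above.
function ci(keyword) {
  return new RegExp(
    keyword
      .split('')
      .map(letter => `[${letter.toLowerCase()}${letter.toUpperCase()}]`)
      .join('')
  );
}

// HYPOTHETICAL identifier pattern; the grammar's real `identifier` rule is not shown.
const identifierPattern = /[A-Za-z_][A-Za-z0-9_]*/;

// Hypothetical helper: returns the text `pattern` matches at the start of `input`, or null.
function matchAtStart(pattern, input) {
  const anchored = new RegExp('^(?:' + pattern.source + ')');
  const m = anchored.exec(input);
  return m ? m[0] : null;
}

// The input that remains after `Private ` has been consumed.
const rest = 'ReadOnly _anotherField As Integer';

const asModifier = matchAtStart(ci('ReadOnly'), rest);
const asIdentifier = matchAtStart(identifierPattern, rest);

console.log(ci('ReadOnly').source); // the regex source that ci() generates
console.log(asModifier, asIdentifier); // both patterns match the same leading text
```

If both patterns match the same leading text, the choice between member_modifier and identifier at that position falls to Tree-sitter's lexical conflict rules (context-aware lexing, lexical precedence, match length, string-over-regex specificity, and rule order) rather than to prec or the conflicts array, which operate at the parse level.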
I can provide more details of the grammar if needed.