Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve on support for ... and ..i in odd places #157

Merged
merged 12 commits into from
Nov 22, 2024

Conversation

DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Nov 20, 2024

Closes #148

This is motivated by needing support for x$..., which is the most common case of this weirdness. But I've managed to tighten up quite a few of these extremely obscure cases while making things consistent. All of these are valid R code:

  • x$... and x@...
  • x$..1 and [email protected]
  • foo::... and ...::foo (and ::: versions)
  • foo::..i and ..i:::foo (and ::: versions)
  • for (... in 1:2) print(get("..."))
  • for (..1 in 1:2) print(get("..1"))
  • function(..1) get("..1")
  • function(..1 = 2) get("..1")
  • fn(..1 = 2)
  • fn(... = 2)

I went with the approach we talked about of keeping $.identifier, $.dots, and $.dot_dot_i as separate nodes (because they are useful to downstream consumers), and making a union type of $._identifier_or_dots_or_dot_dot_i that gets used internally.

The commits are technically backwards. I wrote the impl but didn't commit it until I committed all the updated tests first, just to make sure it was correct. Probably easiest to read each commit individually.

Comment on lines -240 to +247
field("name", $.identifier),
$._parameter_name,
"=",
optional(field("default", $._expression))
),

_parameter_without_default: $ => field("name", choice($.identifier, $.dots)),
_parameter_without_default: $ => $._parameter_name,

_parameter_name: $ => field("name", $._identifier_or_dots_or_dot_dot_i),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can now collapse the parameter "name" field for with/without default cases into a singular _parameter_name, since the definitions have converged

Comment on lines +573 to +575
// Identifier-ish, but useful enough to be their own nodes
dots: $ => "...",
dot_dot_i: $ => token(withPrec(PREC.DOT_DOT_I, /[.][.]\d+/)),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved these to be very close to identifier, since they are indentifier-ish and always get lumped in with identifier now

Comment on lines +577 to +594
// NOTE: Technically R allows `...` and `..1` anywhere we want an `$.identifier`,
// but practically it can be useful for downstream consumers to have separate
// nodes for these particular constructs. Our compromise is to keep these as separate
// nodes, but then use this in most places we want an identifier.
_identifier_or_dots_or_dot_dot_i: $ => choice(
$.identifier,
$.dots,
$.dot_dot_i
),

// NOTE: Having this as an actual node (rather than inlining the `choice()`) somehow
// ends up allowing better error recovery in a few cases
_string_or_identifier_or_dots_or_dot_dot_i: $ => choice(
$.string,
$.identifier,
$.dots,
$.dot_dot_i
),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played with making these names shorter but ultimately I think it made things more obscure.

It's easier to read the call sites with these long explicit names that tell you exactly what can go there.

Comment on lines +596 to +615
// NOTE: This is exactly `_string_or_identifier_or_dots_or_dot_dot_i` but with
// a precedence of 1. It seems like we have to set the `prec(1, )` on the `choice()`
// directly, we can't reuse `_string_or_identifier_or_dots_or_dot_dot_i` here
// otherwise `tree-sitter generate` throws an unresolved conflict error.
//
// This is only for use in `_argument_named`.
//
// Since `_argument_unnamed` can be an arbitrary `_expression` (with precedence 0)
// which includes `string`, `identifier`, `dots`, and `dot_dot_i`, there is an
// ambiguity between:
// - Starting the `value` of an `_argument_unnamed`
// - Starting the `name` of an `_argument_named`
//
// We set a higher precedence here to try and match `_argument_named` first.
_argument_name_string_or_identifier_or_dots_or_dot_dot_i: $ => prec(1, choice(
$.string,
$.identifier,
$.dots,
$.dot_dot_i
)),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly tricky case where the "name" field of _argument_named is technically _string_or_identifier_or_dots_or_dot_dot_i, but we need to bump the precedence there to disambiguate it from _argument_unnamed.

This replaces the prec.right(1, ) that has been removed from _argument_named

Copy link
Collaborator

@kevinushey kevinushey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@DavisVaughan DavisVaughan merged commit 7c316f8 into r-lib:main Nov 22, 2024
11 checks passed
@DavisVaughan DavisVaughan deleted the feature/more-dots branch November 22, 2024 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow ... and maybe ..1 after $ (and @?)
2 participants