Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong parse result for hexadecimal floating points #159

Open
EagleoutIce opened this issue Nov 21, 2024 · 3 comments
Open

Wrong parse result for hexadecimal floating points #159

EagleoutIce opened this issue Nov 21, 2024 · 3 comments

Comments

@EagleoutIce
Copy link

We are currently in the process of migrating our parsing backend from the built-in R parser to this tree-sitter implementation and noted a deviation when running our tests:

Hexadecimal floating point notation like 0x0p0 (mentioned in 10.3.1 Constants in the language definition) is parsed as (program (float) (identifier)) and not simply as (program (float)) which is what I would have assumed.

This is probably related to the following part in the grammar:

tree-sitter-r/grammar.js

Lines 487 to 494 in 4a89de3

// Numeric literals.
integer: $ => seq($._float_literal, "L"),
complex: $ => seq($._float_literal, "i"),
float: $ => $._float_literal,
_hex_literal: $ => /0[xX][0-9a-fA-F]+/,
_number_literal: $ => /(?:(?:\d+(?:\.\d*)?)|(?:\.\d+))(?:[eE][+-]?\d*)?/,
_float_literal: $ => choice($._hex_literal, $._number_literal),

I have yet to get comfortable with the tree-sitter grammar syntax but maybe the following fix suffices?

-_hex_literal: $ => /0[xX][0-9a-fA-F]+/
+_hex_literal: $ => /0[xX][0-9a-fA-F]+(p[+-]?\d+)/
@DavisVaughan
Copy link
Member

?NumericConstants seems to have the best documentation on this

Hexadecimal constants start with 0x or 0X followed by a non-empty sequence from 0-9 a-f A-F . which is interpreted as a hexadecimal number, optionally followed by a binary exponent. A binary exponent consists of a P or p followed by an optional plus or minus sign followed by a non-empty sequence of (decimal) digits, and indicates multiplication by a power of two. Thus 0x123p456 is 291×2^456.

So i think yours is close but missing:

  • capital P
  • id probably do [0-9] rather than \d for consistency with the rest of the line
  • we need to make sure that all of ([pP][+-]?[0-9]+) is marked as "optional" in the regex. can't remember if we get that or not from this exact form
_hex_literal: $ => /0[xX][0-9a-fA-F]+([pP][+-]?[0-9]+)/

@DavisVaughan
Copy link
Member

Also neat to hear that you are using the parser over in things like flowr-analysis/flowr#1160! Let us know if you find more deviations, we are trying to make it as high fidelity as we can, as this is used in things like Ark which help power Positron's R support https://github.com/posit-dev/ark

@EagleoutIce
Copy link
Author

you are correct i neither considered the capital P nor the fact that it is optional (even though i considered both of this when writing our normalization for R's base::parse) - I am sorry!

We found the deviations simply by checking the normalized tree-sitter results against our normalization for the R parser so we make sure to report anything else we notice. Thank you for the great project!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants