
Bottersnike commented Jun 29, 2025


Defines the logic for how to parse `foo[[[a]]]`, opting to parse it as `foo["a"]` rather than the current `foo"[a"]` (a call `foo "[a"` followed by a stray `]`, which is a syntax error).

This logic can be implemented in the official Luau lexer via the following modification to https://github.com/luau-lang/luau/blob/master/Ast/src/Lexer.cpp:

```diff
index c2743640..d311758c 100644
--- a/Ast/src/Lexer.cpp
+++ b/Ast/src/Lexer.cpp
@@ -744,6 +744,11 @@ Lexeme Lexer::readNext()

     case '[':
     {
+        if (peekch(1) == '[' && (peekch(2) == '[' || peekch(2) == '=')) {
+            consume();
+            return Lexeme(Location(start, 1), '[');
+        }
+
         int sep = skipLongSeparator();
```
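To make the difference between the two behaviours concrete, here is a toy C++ model of the `'['` case. It is a sketch, not Luau's actual `Lexer`: `longOpenerLevel` and `tokenize` are hypothetical names, and only names, long strings and one-character punctuation are handled. With `proposed` set, it applies the rule from the patch above. One deliberate difference: the patch looks only at the character after the second `[` when it sees `=`, whereas this sketch splits off the `[` only when a complete opener (`[`, zero or more `=`, `[`) follows. That keeps a string such as `[[=]]` lexing as a long string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// If src[pos] begins a long-bracket opener ('[', zero or more '=', '['),
// return its level (the number of '='); otherwise return -1.
int longOpenerLevel(const std::string& src, std::size_t pos)
{
    if (pos >= src.size() || src[pos] != '[')
        return -1;
    std::size_t i = pos + 1;
    int level = 0;
    while (i < src.size() && src[i] == '=')
    {
        ++level;
        ++i;
    }
    return (i < src.size() && src[i] == '[') ? level : -1;
}

// Toy tokenizer (hypothetical): names, long strings, one-character punctuation.
// With proposed == true, a '[' immediately followed by a complete long-bracket
// opener is emitted as a plain '[' token instead of starting a string.
std::vector<std::string> tokenize(const std::string& src, bool proposed)
{
    std::vector<std::string> out;
    std::size_t pos = 0;
    while (pos < src.size())
    {
        unsigned char c = static_cast<unsigned char>(src[pos]);
        if (std::isspace(c))
        {
            ++pos;
        }
        else if (std::isalpha(c) || c == '_')
        {
            std::size_t start = pos;
            while (pos < src.size() && (std::isalnum(static_cast<unsigned char>(src[pos])) || src[pos] == '_'))
                ++pos;
            out.push_back(src.substr(start, pos - start));
        }
        else if (c == '[')
        {
            bool indexBracket = proposed && longOpenerLevel(src, pos + 1) >= 0;
            int level = indexBracket ? -1 : longOpenerLevel(src, pos);
            if (level < 0)
            {
                out.push_back("[");
                ++pos;
                continue;
            }
            // As in Lua, the string ends at the FIRST matching closer.
            std::size_t body = pos + 2 + static_cast<std::size_t>(level);
            std::string closer = "]" + std::string(static_cast<std::size_t>(level), '=') + "]";
            std::size_t end = src.find(closer, body);
            if (end == std::string::npos)
            {
                out.push_back("<unfinished string>");
                break;
            }
            out.push_back("STR(" + src.substr(body, end - body) + ")");
            pos = end + closer.size();
        }
        else
        {
            out.push_back(std::string(1, static_cast<char>(c)));
            ++pos;
        }
    }
    return out;
}
```

Under the current rule, `tokenize("foo[[[a]]]", false)` yields `foo`, `STR([a)`, `]`, and the stray `]` is the syntax error. Under the proposed rule it yields `foo`, `[`, `STR(a)`, `]`, the same tokens as the whitespace-separated `foo[ [[a]]]` produces today.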

Although the document doesn't mention it, since it focuses on Luau, it is interesting to note that most Lua minifiers stumble on statements of this form: they remove the semantic whitespace and produce code that then parses differently (I tested mathiasbynens/luamin, stravant/lua-minify and stravant/LuaMinify). Lua itself still uses the same semantic-whitespace behaviour that Luau currently uses.
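A minifier can avoid this by checking each token boundary before dropping whitespace. The following is a minimal sketch that assumes the minifier already has a token list and targets the current (semantic-whitespace) lexing. `isWordChar`, `needsSeparator` and `joinTokens` are hypothetical names, and the rules are not exhaustive. For example, a number followed by `..` also needs a space.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// True for characters that can appear inside a name or number.
bool isWordChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Returns true if writing `next` directly after `prev` would change how the
// joined text lexes, so a separating space has to be kept.
bool needsSeparator(const std::string& prev, const std::string& next)
{
    if (prev.empty() || next.empty())
        return false;
    char a = prev.back();
    char b = next.front();
    if (isWordChar(a) && isWordChar(b))
        return true;  // "local x" must not become "localx"
    if (a == '-' && b == '-')
        return true;  // "x - -y" must not become the comment "x--y"
    if (a == '[' && b == '[')
        return true;  // "foo[ [[a]]]" would otherwise start a long string at "[[["
    return false;
}

// Concatenate tokens, keeping a space only where one is required.
std::string joinTokens(const std::vector<std::string>& tokens)
{
    std::string out;
    for (const std::string& tok : tokens)
    {
        if (needsSeparator(out, tok))
            out += ' ';
        out += tok;
    }
    return out;
}
```

The `[` / `[` rule covers both `[[...]]` and `[=[...]=]` string tokens, since both begin with `[`.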

Bottersnike (Author)

As a side note, this is genuinely ambiguous in Lua[u], even if the choice between long brackets and indexing were made during the parse stage by a more advanced parser. The expression `foo[[[[a]]]` could be tokenized as `'foo' '[' '[[' '[a' ']]' ']'` -> `foo["[a"]`, or as `'foo' '[[' '[[a' ']]' ']'` -> `foo("[[a")` followed by a stray `]`. The latter is the existing behaviour, a side effect of greedy lexing.
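Both readings follow from Lua's rule that a long string ends at the first matching closer. The following minimal sketch checks this. `readLevel0LongString` is a hypothetical helper that handles only level-0 strings (`[[` ... `]]`). It is applied to the bracket run that follows `foo` in `foo[[[[a]]]`, starting either at the first `[` (existing behaviour) or at the second one (first `[` treated as an index).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: read a level-0 long string whose opening "[[" starts at
// src[pos] (the caller guarantees this). Stores the index just past the closing
// "]]" in `next`. Like Lua, it stops at the FIRST "]]", so nested brackets in
// the body are not balanced.
std::string readLevel0LongString(const std::string& src, std::size_t pos, std::size_t& next)
{
    std::size_t body = pos + 2;              // skip the opening "[["
    std::size_t end = src.find("]]", body);  // first closer wins
    if (end == std::string::npos)
    {
        next = src.size();
        return std::string();                // unfinished string; a real lexer reports an error
    }
    next = end + 2;
    return src.substr(body, end - body);
}
```

Starting at the first `[` gives the body `[[a` and leaves a single `]`. Starting at the second gives `[a`, and the leftover `]` is exactly what closes the index.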

lua-l has some discussion of this issue at http://lua-users.org/lists/lua-l/2015-10/msg00171.html, though there doesn't seem to be any real conclusion drawn in that thread beyond some joking around.

There's also a third alternative I forgot, which I'm adding to the RFC now.

jackdotink (Contributor)

I don't think this passes the -200 points test.

deviaze commented Jun 29, 2025

I support this RFC because, as of right now:

```lua
local meowkey = {
    [[[meow]]] = 2, -- syntax error
}

local meowkey2 = {
    [ [[meow]]] = 2, -- completely fine
}
```

Luau shouldn't be relying on load-bearing whitespace here, so fixing this would make Luau a more consistent language and would thereby (in my view) meet the -whatever point requirement.

Nit: I thought `[[text \n \n]]` strings were called raw strings (even though they're often used as multiline strings).

Bottersnike (Author) commented Jun 30, 2025

> I don't think this passes the -200 points test.

Don't get me wrong, it's hardly an interesting or exciting RFC, but this is currently an awkward, unspecified ambiguity that really should at least be specified. I'd fully support simply specifying that the current behaviour is the intended way to parse a file; the point is to clean up the fact that this edge case isn't formally considered anywhere.

Edit: thanks mobile

Bottersnike reopened this Jun 30, 2025
jackdotink (Contributor) commented Jun 30, 2025

I think this RFC is very similar to the trailing commas in function calls RFC. That RFC was rejected because, while it made users think they were happier, it didn't actually make the language better or make users happier. This RFC is the same.

alexmccord (Contributor)

There's a bit of an ambiguity here, isn't there? It makes sense to want `{ [[[a]]] = 5 }` to be a table with the key `"a"` mapped to 5, but what about when you use it outside of table keys, as in `local a = [[[a]]]`? That argues for lexing `[[[a]]]` as `"[a]"`, which is incompatible with the proposed table parsing.


The first-party Luau parser follows the semantic-whitespace behaviour.

The predominant Rust parser for Luau, [full-moon](https://github.com/Kampfkarren/full-moon), follows the semantic-whitespace behaviour.
Contributor

This isn't really semantic: it only affects the parsing step, and the only outcome is whether or not there's a parse error, which makes it entirely syntactic. I'd use "significant whitespace" instead (although that term traditionally refers to whitespace that decides which block a statement belongs to, I think it still applies).

Bottersnike (Author)

That's a really good point. I think the "logical" parse of `a = [[[a]]]` would be `a = "[a]"`, whereas `{[[[a]]] = ...}` would parse as `{["a"] = ...}`.

Is there a good way to solve this? Quite likely not. The lexer could perform some fairly nasty checks to decide which behaviour to use (i.e. if the previous lexeme was a name, `]`, `)` or `}`, let `[` take precedence over strings), though this still leaves `foo [[[a]]]` completely ambiguous, since it could be either a call or an index.
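A minimal sketch of that heuristic follows. It assumes the previous-lexeme check is combined with the RFC's lookahead, so only a `[` directly followed by a complete long-bracket opener is affected. `PrevKind`, `startsLongOpener` and `lexAsIndexBracket` are hypothetical names. Whitespace never becomes a token, so the decision is the same for `foo[[[a]]]` and `foo [[[a]]]`. A call written with a space is therefore still read as an index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical categories for the lexeme that precedes a '['.
enum class PrevKind
{
    Name,
    CloseBracket, // ]
    CloseParen,   // )
    CloseBrace,   // }
    Other,        // anything else, e.g. '=', '{', ',' or start of input
};

// Does src[pos] begin a long-bracket opener: '[', zero or more '=', '['?
bool startsLongOpener(const std::string& src, std::size_t pos)
{
    if (pos >= src.size() || src[pos] != '[')
        return false;
    std::size_t i = pos + 1;
    while (i < src.size() && src[i] == '=')
        ++i;
    return i < src.size() && src[i] == '[';
}

// Heuristic: after a name, ']', ')' or '}', a '[' that is directly followed by
// a long-bracket opener is lexed as an index bracket; elsewhere the usual
// long-string lexing applies.
bool lexAsIndexBracket(PrevKind prev, const std::string& src, std::size_t pos)
{
    bool afterPrefix = prev == PrevKind::Name || prev == PrevKind::CloseBracket ||
                       prev == PrevKind::CloseParen || prev == PrevKind::CloseBrace;
    return afterPrefix && pos < src.size() && src[pos] == '[' && startsLongOpener(src, pos + 1);
}
```

As stated, the rule does not reach table keys. After `{`, `,` or `;` the previous lexeme is not in the list, so `{ [[[meow]]] = 2 }` would still lex as a string.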

The simplest solution might be to accept this as a poor design decision inherited from Lua that can't be fixed, and to codify the current behaviour in writing rather than trying to amend it at all?


4 participants