@andyfriesen
Collaborator


First, and arguably most importantly: The inference we already have is performing quite well! I could only find one ticket that points out a bug in the system: https://github.com/luau-lang/luau/issues/1483. This bug is a little bit esoteric and can probably be fixed in a reasonable timeframe, so it is not itself strong evidence that we should change the language.

Secondly, most code formatters in modern use will automatically change all string literals to use double quotes because that is considered good style. If this RFC is implemented, those tools will need to be updated so they do not break type inference.
Contributor

The tooling side of this makes the whole thing a dealbreaker for me. This is the current default behavior of StyLua:
[image: StyLua reformatting string literals]

Notably:

  • It enforces double quoted strings
  • It converts strings that only escape double quotes into single-quoted strings

This feature being added to Luau would kill both of those formatter options. While that's not its own justification, I think the loss of them would be enough of a downside that I can't support this.

I want my formatter to be able to enforce string-style, and to be able to make things easier to read without changing the semantic meaning of my program, including the type system. I don't want to have to review pull requests to double check that they're using the right quote style for the right situation. That's work that is pointless and completely avoidable.
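For readers unfamiliar with the formatter behavior described above, here is a minimal sketch of the quote-selection rule as this comment states it. It is not StyLua's actual implementation; `choose_quote` is a hypothetical helper and `contents` is assumed to be the unescaped text of a string literal.

```lua
-- Hypothetical helper, not StyLua's real code.
-- Returns the quote character a formatter following the two rules above
-- would pick for a literal whose unescaped text is `contents`.
local function choose_quote(contents)
    -- Plain (non-pattern) searches for each quote character.
    local has_double = contents:find('"', 1, true) ~= nil
    local has_single = contents:find("'", 1, true) ~= nil

    -- Only double quotes would need escaping: switch to single quotes.
    if has_double and not has_single then
        return "'"
    end

    -- Otherwise enforce double quotes.
    return '"'
end
```

Under the RFC as proposed, a rule like this would no longer be purely cosmetic, because the quote character it picks would also decide whether the literal is inferred as `string` or as a singleton.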

@jackdotink
Contributor

Note that I don't use a formatter for Luau, so I've used this without problems others have had.

This has been wonderful to use. It hasn't caused any problems for me when writing or reading code. The worst that happens is that I type a singleton with " out of habit, then change it to the singleton syntax.

I for one do not think the loss in formatting options is enough to eliminate this feature, nor do I think it will be possible to infer singleton vs string in every position, so some kind of singleton syntax is needed. This syntax is good, and there are other good options too.

@Ukendio

Ukendio commented Oct 2, 2025

I am not opposed to this personally, as I think it would make sense for single quotes to have different type semantics than double quotes. The syntax is pretty great, although the bar is low considering that the alternative is casting to literals.

That said, the alternative of fixing the bugs and allowing bounded generics to have a strict subtype constraint makes more sense to me. It would likely involve less work for my own codebases.

@deviaze

deviaze commented Oct 2, 2025

I like the distinction between stringletons and regular strings afforded by ' vs ". It makes it more convenient (and readable) to type enum-like tables without having to explicitly cast every value, and adds purpose to having different quote styles (other than facilitating escaping the opposite).

I think that quote style should affect (but not necessarily completely override) type inference -- you should still be able to do "somestring" :: "somestring" and 'string with "quotes"' :: string to override their behaviors respectively if you so choose.

Here's an example from seal's datetime library illustrating why I'd want this feature:

local datetime = {
    common_formats = {
        ISO_8601 = "%Y-%m-%d %H:%M" :: "%Y-%m-%d %H:%M",
        RFC_2822 = "%a, %d %b %Y %H:%M:%S %z" :: "%a, %d %b %Y %H:%M:%S %z",
        RFC_3339 = "%Y-%m-%dT%H:%M:%S%:z" :: "%Y-%m-%dT%H:%M:%S%:z",
        SHORT_DATE = "%Y-%m-%d" :: "%Y-%m-%d",
        SHORT_TIME = "%H:%M" :: "%H:%M",
        FULL_DATE_TIME = "%A, %B %d, %Y %H:%M:%S" :: "%A, %B %d, %Y %H:%M:%S",
        LOGGING_24_HR = "%a %b %e %H:%M:%S %Z %Y" :: "%a %b %e %H:%M:%S %Z %Y",
        LOGGING_12_HR = "%a %b %e %I:%M:%S %p %Z %Y" :: "%a %b %e %I:%M:%S %p %Z %Y",
        ["MM/DD/YYYY"] = "%m/%d/%Y" :: "%m/%d/%Y",
        ["MM/DD/YYYY HH:MM (AM/PM)"] = "%m/%d/%Y %I:%M %p" :: "%m/%d/%Y %I:%M %p",
        ["MM/DD/YY"] = "%m/%d/%y" :: "%m/%d/%y",
        ["HH:MM (AM/PM)"] = "%I:%M %p" :: "%I:%M %p",
        AMERICAN_FULL_DATE_TIME = "%A, %B %d, %Y %I:%M:%S %p" :: "%A, %B %d, %Y %I:%M:%S %p",
    }
}

I don't think this is an uncommon paradigm; it's currently pretty messy and would surely be made better by dedicated stringleton inference syntax.

Like Jack, I don't rely on a formatter for Luau, but I also think that Luau code formatters should automatically handle Luau semantics such as ' vs ", if such a distinction becomes codified. A formatting tool that respects this convention would undoubtedly reduce the burden on maintainers of fixing their codebases manually, and style conventions should adapt to support it.

@deviaze

deviaze commented Oct 3, 2025

As an alternative (as discussed on Discord), we could provide a symbol string literal syntax à la Ruby, allowing users to explicitly choose literal inference without casting or breaking existing tooling:

type Animal = :Cat | :Dog | :Seal | :"Snow Leopard"

local animal: Animal = :Cat

local common_formats = {
    ISO_8601 = :"%Y-%m-%d %H:%M"
}

local entry: (:File | :Dir)? = nil
entry = "File" :: "File" -- continues to work

-- this unfortunately becomes legal 
local meow::meow=:meow

Symbol string literals would act like regular single/double-quoted strings at runtime.

Like interpolated string literals, symbol string literals cannot be used in non-parenthesized functioncalls (to avoid ambiguity).

Single- and double-quoted strings cast to themselves will continue to act like stringletons for backwards compatibility (we can add a lint).

We can flesh this out into a full RFC if there's interest?

@alexmccord
Contributor

alexmccord commented Oct 3, 2025

> local meow::meow=:meow -- this unfortunately becomes legal

It doesn't have to be. Try doing type Id<T>= T.

@MagmaBurnsV
Contributor

If you're going to start inferring some string literals as singletons, why not all of them? Almost every style guide like Roblox's prefers double quotes and it just seems oddly alienating not to support it.

My guess is there's some concern with backwards compatibility with the likes of table.create(50, "hello") inferring as {"hello"} instead of {string}. But I'd argue that's preferable because it's more "correct" even if a little unergonomic.

Regardless, having two different inference rules for what are normally interchangeable style preferences seems a little surprising. Whatever the inference rule is, it should stay consistent for all styles in my opinion.


Given that this is the most common use case, it makes quite a lot of sense to offer separate syntax to allow developers to be precise about what they want.

In this RFC, we propose the syntax `"foo"` for a string of type `string`, and `'bar'` for a string with the singleton type `"bar"`.
Contributor

@alexmccord alexmccord Oct 5, 2025

Earlier, the RFC says "This strategy is broadly successful, but fails in a few ways" and also has another supporting argument, but the RFC as proposed is removing this nice ergonomic feature without a reason?

There's a half-way solution here.

Suggested change
- In this RFC, we propose the syntax `"foo"` for a string of type `string`, and `'bar'` for a string with the singleton type `"bar"`.
+ In this RFC, we propose the syntax `'foo'` for a string with the singleton type `'foo'`, and `"bar"` to retain its current behavior standing for either the singleton type `'bar'` or `string`, depending on the bounds.

Comment on lines +43 to +50
return table.freeze {
    semicolon = new_token(';' :: any) :: cst.TokenKind<';'>,
    equals = new_token('=' :: any) :: cst.TokenKind<'='>,
    colon = new_token(':' :: any) :: cst.TokenKind<':'>,
    comma = new_token(',' :: any) :: cst.TokenKind<','>,
    dot = new_token('.' :: any) :: cst.TokenKind<'.'>,
    endd = new_token('end' :: any) :: cst.TokenKind<'end'>,
}
Contributor

For what it's worth, this could also be solved with explicit instantiation: new_token<<";">>(";"). But I must say, I don't understand why new_token(';' :: any) :: cst.TokenKind<';'> is necessary here when new_token(';' :: ';') would work just as well?


```lua
local function new_token<Kind>(kind: Kind, text: string?): cst.TokenKind<Kind>
    return { kind = kind, text = text or kind :: any, span = span, trivia = trivias }
end
```
Contributor

@alexmccord alexmccord Oct 5, 2025

This specific cast is arguably because of the lack of generic bounds. If TokenKind defines text: string, then Kind must be bounded by string for text or kind to be valid, so this specific example is unsound because you can write new_token(5). If we could utter Kind: string, then string | Kind always reduces to string, and so text = text or kind will type check.

Contributor

@alexmccord alexmccord Oct 5, 2025

If you write type TokenKind<Kind> = { read kind: Kind, read text: string, .. }, then new_token(";") simply infers TokenKind<";"> because of polarity. This should already work today, and if it doesn't then I would consider that a different bug orthogonal to this RFC.

@alexmccord
Contributor

alexmccord commented Oct 5, 2025

> If you're going to start inferring some string literals as singletons, why not all of them? Almost every style guide like Roblox's prefers double quotes and it just seems oddly alienating not to support it.

Because that's pathologically horrendous. If you have a type {"a"}, you cannot pass it where {string} is expected. See the Wikipedia page on variance.

The problem here is this: if you write local t = {"a"}, then what type does t have: {string} or {"a"}? Trick question, it's neither. The type of t is {"a" <: 'a <: string} which is equivalent to {"a"} <: 'a <: {"a"} | {"a" | "b"} | {"a" | "b" | "c"} | {"a" | "b" | "c" | "d"} | ... | {"a" | "ab"} | {"a" | "ac"} | ... (ignoring read/write indexers). If t is used in such a way that accepts {"a" | "b"}, then we have a constraint 'a <: "a" | "b" and so t has the type {"a" | "b"}. But due to invariance, {"a"} is not a subtype of {string}, and because of that, the suggestion of "every literal infers the singleton" is not the same thing that the current type solver does with string literals when it interacts with tables.

Even worse, the approach will not work across functions.

function make()
  return {"a"}
end

local t = make()

With your approach here, you can't even say local t: {string} = make() because it's make itself that has the type () -> {"a"}, and {"a"} is not a subtype of {string}. This entire class of problems exists due to invariance, because you can read (covariant) and write (contravariant) to the table. That "and" is the key word: it means that passing {"a"} where {string} is expected is a type error because both types must match exactly, and evidently they don't.
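To make the invariance argument concrete, here is a minimal runtime sketch of what would go wrong if {"a"} were accepted where {string} is expected. Types appear only in comments so the snippet runs as plain Lua or Luau; `append_greeting` is a hypothetical function introduced for illustration.

```lua
-- Hypothetical function whose parameter is meant to have type {string}.
-- Writing any string into the table is legal for that type.
local function append_greeting(names)
    table.insert(names, "hello")
end

-- Suppose make were inferred as () -> {"a"}.
local function make()
    return { "a" }
end

local t = make()

-- If this call type checked, the write inside append_greeting would put
-- "hello" into a table whose elements are all supposed to be the singleton "a".
append_greeting(t)
```

After the call, t contains a value that does not inhabit "a", which is why a read-write table type cannot be treated covariantly.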

That's why the RFC is proposing one knob: have single-quoted strings infer the singleton type (to replace the stupid "foo" :: "foo" code smell) and have double-quoted strings be of type string (although my stance is for double quotes to keep their current behavior as-is).

> My guess is there's some concern with backwards compatibility with the likes of table.create(50, "hello") inferring as {"hello"} instead of {string}. But I'd argue that's preferable because it's more "correct" even if a little unergonomic.

This is funny to me because in the old type system, I put in a lot of work to fix various unification bugs to force singletons to generalize into their top primitive type, like this:

function f(t, x)
  if x ~= "x" then
    return
  end

  table.insert(t, x)
end

In the old solver, unification of two types was more or less an equality constraint (the story is more complicated than that), so if the subtype or the supertype were a free metavariable, it would bind one to the other and be done. That meant the above snippet would infer the type f : ({"x"}, unknown) -> (), which is obviously utter nonsense and overly restrictive. The correct type is f : ({string}, unknown) -> ().

That means adding a rule in unification where if the subtype is a singleton (distributive over union) and the supertype is a free metavariable, then we generate a new "replaced subtype" where the singletons are replaced by their top primitive types (true/false -> boolean, and any string singletons -> string) and then bind the free metavariable to the replaced subtype. This solved the problem, and those bug reports disappeared overnight. It's a shame that the bug reports got replaced by people having a hard time understanding how literal generalization worked, which is unfortunate because it was a necessary evil in that world, and trying to do anything else would break the consistency in the typing algebra. (even I hated the literal generalization rule because of the DX issues it brought to the table)
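The literal generalization rule described above can be sketched over a toy type representation. This is only an illustration under stated assumptions: the table shapes (`tag`, `value`, `name`, `options`, `bound`) and the helpers `generalize` and `bind_free` are hypothetical and are not the old solver's actual data structures. The sketch also does not simplify the resulting unions.

```lua
-- Toy type representation (hypothetical):
--   { tag = "singleton", value = "x" }      a string or boolean singleton
--   { tag = "primitive", name = "string" }  a top primitive type
--   { tag = "union", options = { ... } }    a union of types
--   { tag = "free" }                        an unbound free metavariable

-- Replace every singleton with its top primitive type, distributing over unions.
local function generalize(ty)
    if ty.tag == "singleton" then
        -- type("x") is "string" and type(true) is "boolean".
        return { tag = "primitive", name = type(ty.value) }
    elseif ty.tag == "union" then
        local options = {}
        for i, option in ipairs(ty.options) do
            options[i] = generalize(option)
        end
        return { tag = "union", options = options }
    end
    return ty
end

-- Bind a free metavariable to the "replaced subtype" instead of the subtype itself.
local function bind_free(subtype, free)
    assert(free.tag == "free" and free.bound == nil, "expected an unbound free metavariable")
    free.bound = generalize(subtype)
    return free.bound
end

-- The element type of t in f(t, x) starts out free; the refined x is the singleton "x".
local element = { tag = "free" }
local bound = bind_free({ tag = "singleton", value = "x" }, element)
```

Here the element type ends up bound to the primitive string rather than to the singleton "x", which corresponds to f : ({string}, unknown) -> ().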

If we did make every string literal infer the string singleton type in the new solver, we'd be going back to that world for the second time. The current strategy we have in the new solver is subtly different, which is why it doesn't seem to have any glaring DX issues and doesn't clash with invariance. I was very happy when I had the brainwave to suggest that singleton inference should be "a" <: 'a <: string for all strings (ditto booleans) in the new solver.

@MagmaBurnsV
Contributor

If that's the case, I'd rather have literals inferred consistently as a boring string instead of the language making an opinionated choice on which style gets special treatment.

These variance issues won't go away if you push them over to someone's personal preference of which quotation marks to use. Someone's bound to do return {'Bob says "Hi!"'} and have no clue why defining the function return type as {string} doesn't work.

There's got to be a better way to handle the core issue here, which is avoiding the f("a" :: "a") syntax that you say looks stupid. For example, higher order generic constraints (if/when they come) could force literal arguments to infer as singletons, e.g. the signature f<t: singleton>(t) -> t could infer t in f("a") as "a".

@Bottersnike

Personally, I think this is something that really needs its own syntax, rather than retconning existing syntax.

For those of us in the know, this RFC would be a very useful and powerful change, but how many of the existing end users of the language are going to be aware that ' suddenly has special semantic meaning? Unless there's a very clear path for how this would be communicated to users with relative urgency, I fear it would cause immense confusion in the existing user base for the language.

9 participants