diff --git a/docs/design/classes.md b/docs/design/classes.md index da0a61e1f49a0..cfeda2a0a45cc 100644 --- a/docs/design/classes.md +++ b/docs/design/classes.md @@ -600,9 +600,15 @@ Assert(different_order.x == 3); Assert(different_order.y == 2); ``` -Initialization and assignment occur field-by-field. The order of fields is -determined from the target on the left side of the `=`. This rule matches what -we expect for classes with encapsulation more generally. +Initialization and assignment occur field-by-field. The overall order is that +the initializer is fully evaluated in the order it's written, and then each +field of the new object is initialized from the corresponding element of the +result, in the new object's field order. However, in some cases evaluation of +the initializer can directly initialize fields on the left-hand side, without +any intervening conversions. When that happens, the order of initialization of +those fields is determined by the evaluation order of the initializer, and +happens before initializing the fields of the new object that are not +initialized directly. See [here](values.md#type-conversions) for details. **Open question:** What operations and in what order happen for assignment and initialization? @@ -610,9 +616,6 @@ initialization? - Is assignment just destruction followed by initialization? Is that destruction completed for the whole object before initializing, or is it interleaved field-by-field? -- When initializing to a literal value, is a temporary containing the literal - value constructed first or are the fields initialized directly? The latter - approach supports types that can't be moved or copied, such as mutex. - Perhaps some operations are _not_ ordered with respect to each other? 
### Operations performed field-wise diff --git a/docs/design/expressions/implicit_conversions.md b/docs/design/expressions/implicit_conversions.md index ec8e1fdeed0f6..a6c3805d18361 100644 --- a/docs/design/expressions/implicit_conversions.md +++ b/docs/design/expressions/implicit_conversions.md @@ -20,6 +20,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Same type](#same-type) - [Pointer conversions](#pointer-conversions) - [Facet types](#facet-types) + - [Struct, tuple, and array types](#struct-tuple-and-array-types) - [Consistency with `as`](#consistency-with-as) - [Extensibility](#extensibility) - [Alternatives considered](#alternatives-considered) @@ -189,6 +190,10 @@ implicitly converted to the facet type `TT2` if `T` [satisfies the requirements](../generics/details.md#subtyping-between-facet-types) of `TT2`. +### Struct, tuple, and array types + +See [here](/docs/design/values.md#type-conversions). + ## Consistency with `as` An implicit conversion of an expression `E` of type `T` to type `U`, when diff --git a/docs/design/expressions/member_access.md b/docs/design/expressions/member_access.md index f496376b71716..a213524d4a754 100644 --- a/docs/design/expressions/member_access.md +++ b/docs/design/expressions/member_access.md @@ -13,7 +13,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Overview](#overview) - [Member resolution](#member-resolution) - [Package and namespace members](#package-and-namespace-members) - - [Types and facets](#types-and-facets) + - [Types, forms, and facets](#types-forms-and-facets) - [Tuple indexing](#tuple-indexing) - [Values](#values) - [Facet binding](#facet-binding) @@ -121,13 +121,18 @@ A member access expression is processed using the following steps: The process of _member resolution_ determines which member `M` a member access expression is referring to. 
-For a simple member access, if the first operand is a type, facet, package, or -namespace, a search for the member name is performed in the first operand. -Otherwise, a search for the member name is performed in the type of the first -operand. In either case, the search must succeed. In the latter case, if the -result is an instance member, then [instance binding](#instance-binding) is +For a simple member access, if the first operand is a type, form, facet, +package, or namespace, a search for the member name is performed in the first +operand. Otherwise, a search for the member name is performed in the type of the +first operand. In either case, the search must succeed. In the latter case, if +the result is an instance member, then [instance binding](#instance-binding) is performed on the first operand. +A search for a name within a form searches for the name in its +[type component](/docs/design/values.md#expression-forms). Note that this means +that the form of an expression never affects simple member access into that +expression, except through its type component. + For a compound member access, the second operand is evaluated as a compile-time constant to determine the member being accessed. The evaluation is required to succeed and to result in a member of a type, interface, or non-type facet, or a @@ -183,11 +188,12 @@ class Bar { } ``` -### Types and facets +### Types, forms, and facets -If the first operand is a type or facet, it must be a compile-time constant. -This disallows member access into a type except during compile-time, see leads -issue [#1293](https://github.com/carbon-language/carbon-lang/issues/1293). +If the first operand is a type, form, or facet, it must be a compile-time +constant. This disallows member access into a type except during compile-time, +see leads issue +[#1293](https://github.com/carbon-language/carbon-lang/issues/1293). 
Like the previous case, types (including [facet types](/docs/design/generics/terminology.md#facet-type)) have member @@ -222,6 +228,9 @@ class Avatar { Simple member access `(Avatar as Cowboy).Draw` finds the `Cowboy.Draw` implementation for `Avatar`, ignoring `Renderable.Draw`. +Similarly, a form has members, specifically the members of the form's type +component. + ### Tuple indexing Tuple types have member names that are *integer-literal*s, not *word*s. @@ -267,9 +276,9 @@ let n: i32 = p->(e); ### Values -If the first operand is not a type, package, namespace, or facet, it does not -have member names, and a search is performed into the type of the first operand -instead. +If the first operand is not a type, form, package, namespace, or facet, it does +not have member names, and a search is performed into the type of the first +operand instead. ```carbon interface Printable { @@ -717,16 +726,22 @@ fn SumIntegers(v: Vector(Integer)) -> Integer { ## Instance binding Next, _instance binding_ may be performed. This associates an expression with a -particular object instance. For example, this is the value bound to `self` when -calling a method. +particular object or value instance. For example, this is the value bound to +`self` when calling a method. For the simple member access syntax `x.y`, if `x` is an entity that has member names, such as a namespace or a type, then `y` is looked up within `x`, and instance binding is not performed. Otherwise, `y` is looked up within the type of `x` and instance binding is performed if an instance member is found. -If instance binding is performed: +If instance binding is to be performed, the result of instance binding depends +on what instance member `M` was found: +- For a field member of a struct type or tuple type, `x` is converted to a + struct or tuple form by + [form decomposition](/docs/design/values.md#category-conversions), and the + `.f` element of the outcome of that conversion becomes the outcome of `x.f`. 
+ All other elements are [discarded](/docs/design/values.md#form-conversions). - For a field member in class `C`, `x` is required to be of type `C` or of a type derived from `C`. The result is the corresponding subobject within `x`. If `x` is an diff --git a/docs/design/functions.md b/docs/design/functions.md index 1e826b9eca5b6..5160dc0aaa70b 100644 --- a/docs/design/functions.md +++ b/docs/design/functions.md @@ -90,6 +90,9 @@ possible syntaxes: `fn Sleep(seconds: i64) -> ();`. - `()` is similar to a `void` return type in C++. +> **TODO:** Update this section to cover return forms, as discussed +> [here](values.md#function-calls-and-returns). + ### `return` statements The [`return` statement](control_flow/return.md) is essential to function @@ -104,6 +107,8 @@ When the return clause is provided, including when it is `-> ()`, the `return` statement must have an expression that is convertible to the return type, and a `return` statement must be used to end control flow of the function. +> **TODO:** Update this section to cover the form + ## Function declarations Functions may be declared separate from the definition by providing only a diff --git a/docs/design/pattern_matching.md b/docs/design/pattern_matching.md index 840b48148ab1e..5581830d450ee 100644 --- a/docs/design/pattern_matching.md +++ b/docs/design/pattern_matching.md @@ -18,7 +18,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Name binding patterns](#name-binding-patterns) - [Unused bindings](#unused-bindings) - [Alternatives considered](#alternatives-considered-1) - - [Compile-time bindings](#compile-time-bindings) - [`auto` and type deduction](#auto-and-type-deduction) - [Alternatives considered](#alternatives-considered-2) - [`var`](#var) @@ -130,14 +129,16 @@ fn F() { A name binding pattern is a pattern. -- _binding-pattern_ ::= _identifier_ `:` _expression_ +- _binding-pattern_ ::= `ref`? (_identifier_ | `self`) `:` _expression_ +- _binding-pattern_ ::= `template`? 
_identifier_ `:!` _expression_ - _proper-pattern_ ::= _binding-pattern_ A name binding pattern declares a _binding_ with a name specified by the _identifier_, which can be used as an expression. If the binding pattern is -enclosed by a `var` pattern, it is a _reference binding pattern_, and the -binding is a durable reference expression. Otherwise, it is a _value binding -pattern_, and the binding is a value expression. +prefixed with `ref` or enclosed by a `var` pattern, it is a _reference binding +pattern_, and otherwise it is a _value binding pattern_. A binding pattern +enclosed by a `var` pattern cannot have a `ref` prefix, because it would be +redundant. A _variable binding pattern_ is a special kind of reference binding pattern, which is the immediate subpattern of its enclosing `var` pattern. @@ -146,15 +147,36 @@ which is the immediate subpattern of its enclosing `var` pattern. > expected to be the only difference between variable binding patterns and other > reference binding patterns. -The type of the binding is specified by the _expression_. If the pattern is a -value binding pattern, the scrutinee is implicitly converted to a value -expression of that type if necessary, and the binding is _bound_ to the -converted value. If the pattern is a reference binding pattern, the enclosing -`var` pattern will ensure that the scrutinee is already a durable reference -expression with the specified type, and the binding is bound directly to it. - -A use of a value binding is a value expression of the declared type, and a use -of a reference binding is a durable reference expression of the declared type. +If the pattern syntax uses `:` it is a _runtime binding pattern_. If it uses +`:!`, it is a _compile-time binding pattern_, and it cannot appear inside a +`var` pattern. A compile-time binding pattern is either a _symbolic binding +pattern_ or a _template binding pattern_, depending on whether it is prefixed +with `template`. 
+ +The binding declared by a binding pattern has a +[primitive form](values.md#expression-forms) with the following components: + +- The type is _expression_. +- The category is "value" if the pattern is a value binding pattern, "owning + durable reference" if it's a variable binding pattern, or "non-owning + durable reference" if it's a non-variable reference binding pattern. +- The phase is "runtime", "symbolic", or "template" depending on whether the + pattern is a runtime, symbolic, or template binding pattern. + +During pattern matching, the scrutinee is implicitly converted as needed to have +the same form, and then the binding is _bound_ to the result of these +conversions. This makes a runtime or template binding an alias for the converted +scrutinee expression, with the same form and value. Symbolic bindings are more +complex: the binding will have the same type, category, and phase as the +converted scrutinee expression, but its constant value is an opaque symbol +introduced by the binding, which the type system knows to be equal to the +converted scrutinee expression. + +Note that there is no way to implicitly convert to a durable reference +expression from any other category, so the scrutinee of a reference binding +pattern must already be a durable reference. `var` pattern matching ensures that +this is the case for the bindings nested inside it, but for `ref` binding +patterns the user-provided scrutinee must meet this requirement itself. ```carbon fn F() -> i32 { @@ -169,46 +191,27 @@ fn F() -> i32 { } ``` -When a new object needs to be created for the binding, the lifetime of the bound -value matches the scope of the binding. - -```carbon -class NoisyDestructor { - fn Make() -> Self { return {}; } - impl i32 as ImplicitAs(NoisyDestructor) { - fn Convert[me: i32]() -> Self { return Make(); } - } - destructor { - Print("Destroyed!"); - } -} - -fn G() { - // Does not print "Destroyed!". 
- let n: NoisyDestructor = NoisyDestructor.Make(); - Print("Body of G"); - // Prints "Destroyed!" here. -} - -fn H(n: i32) { - // Does not print "Destroyed!". - let (v: NoisyDestructor, w: i32) = (n, n); - Print("Body of H"); - // Prints "Destroyed!" here. -} -``` +`self` can be used instead of an identifier only if the pattern is an implicit +parameter of a member function (optionally enclosed in a `var` pattern). This +marks the function as a method; during pattern matching, the parameter pattern +containing `self` is matched with the object that the method was invoked on. +Other than that, a `self` pattern behaves just like an ordinary binding pattern, +introducing a binding named `self` into scope, just as if `self` were an +identifier rather than a keyword. #### Unused bindings A syntax like a binding but with `_` in place of an identifier, or `unused` -before the name, can be used to ignore part of a value. Names that are qualified -with the `unused` keyword are visible for name lookup but uses are invalid, -including when they cause ambiguous name lookup errors. If attempted to be used, -a compiler error will be shown to the user, instructing them to either remove -the `unused` qualifier or remove the use. +before the name, [discards](values.md#form-conversions) the scrutinee. Names +that are qualified with the `unused` keyword are visible for name lookup but +uses are invalid, including when they cause ambiguous name lookup errors. If +attempted to be used, a compiler error will be shown to the user, instructing +them to either remove the `unused` qualifier or remove the use. - _binding-pattern_ ::= `_` `:` _expression_ +- _binding-pattern_ ::= `template`? `_` `:!` _expression_ - _binding-pattern_ ::= `unused` _identifier_ `:` _expression_ +- _binding-pattern_ ::= `unused` `template`? 
_identifier_ `:!` _expression_ ```carbon fn F(n: i32) { @@ -245,27 +248,6 @@ fn J(unused n: i32); - [Anonymous, named identifiers](/proposals/p2022.md#anonymous-named-identifiers) - [Attributes](/proposals/p2022.md#attributes) -#### Compile-time bindings - -A `:!` can be used in place of `:` for a binding that is usable at compile time. - -- _compile-time-pattern_ ::= `template`? _identifier_ `:!` _expression_ -- _compile-time-pattern_ ::= `template`? `_` `:!` _expression_ -- _compile-time-pattern_ ::= `unused` `template`? _identifier_ `:!` - _expression_ -- _proper-pattern_ ::= _compile-time-pattern_ - -```carbon -// ✅ `F` takes a symbolic facet parameter `T` and a parameter `x` of type `T`. -fn F(T:! type, x: T) { - var v: T = x; -} -``` - -The `template` keyword indicates the binding pattern is introducing a template -binding, so name lookups into the binding will not be fully resolved until its -value is known. - #### `auto` and type deduction The `auto` keyword is a placeholder for a unique deduced type. @@ -318,10 +300,15 @@ scrutinee. - _proper-pattern_ ::= `var` _proper-pattern_ -A `var` pattern matches when its nested pattern matches. The type of the storage -is the resolved type of the nested _pattern_. Any binding patterns within the -nested pattern are reference binding patterns, and their bindings refer to -portions of the corresponding storage rather than to the scrutinee. +The scrutinee is expected to have the same type as the resolved type of the +nested _proper-pattern_, and it is expected to be a runtime-phase owning +ephemeral reference expression. The scrutinee expression is converted as needed +to satisfy those expectations, and the `var` pattern takes ownership of the +referenced object, promotes it to an owning _durable_ reference expression, and +matches the nested _proper-pattern_ with it. 
+ +The lifetime of the allocated object extends to the end of scope of the `var` +pattern (that is the scope that any bindings declared within it would have). ```carbon fn F(p: i32*); @@ -358,8 +345,12 @@ A _tuple-pattern_ containing no commas is treated as grouping parens: the contained _proper-pattern_ is matched directly against the scrutinee. Otherwise, the behavior is as follows. -A tuple pattern is matched left-to-right. The scrutinee is required to be of -tuple type. +The scrutinee is required to be of tuple type, with the same arity as the number +of nested _proper-patterns_. It is converted to a tuple form by +[form decomposition](values.md#form-conversions), and then each nested +_proper-pattern_ in left-to-right order is matched against the corresponding +element of the converted scrutinee's [outcome](values.md#expression-forms). The +tuple pattern matches if all of these sub-matches succeed. Note that a tuple pattern must contain at least one _proper-pattern_. Otherwise, it is a tuple-valued expression. However, a tuple pattern and a corresponding @@ -392,10 +383,18 @@ match ({.a = 1, .b = 2}) { } ``` -The scrutinee is required to be of struct type, and to have the same set of -field names as the pattern. The pattern is matched left-to-right, meaning that -matching is performed in the field order specified in the pattern, not in the -field order of the scrutinee. This is consistent with the behavior of matching +The scrutinee is required to be of struct type, and every field name in the +pattern must be a field name in the scrutinee. It is converted to a struct form +by [form decomposition](values.md#form-conversions) and then, for each +subpattern of the struct pattern in left-to-right order, the subpattern is +matched with the same-named element of the converted scrutinee's +[outcome](values.md#expression-forms). 
If the scrutinee outcome has any field +names not present in the pattern, those sub-outcomes are +[discarded](values.md#form-conversions) in lexical order if the pattern has a +trailing `_` (as in `{.a = 1, _}`), or diagnosed as an error if it does not. The +struct pattern matches if all of these sub-matches succeed. + +Note that the left-to-right order is consistent with the behavior of matching against a struct-valued expression, where the expression pattern becomes the left operand of the `==` and so determines the order in which `==` comparisons for fields are performed. @@ -409,6 +408,9 @@ match ({.a = 1, .b = 2}) { } ``` +Likewise, `ref a: T` is synonymous with `.a = ref a: T`, and `var a: T` is +synonymous with `.a = var a: T`. + If some fields should be ignored when matching, a trailing `, _` can be added to specify this: @@ -717,8 +719,10 @@ In order to match a value, whatever is specified in the pattern must match. Using `auto` for a type will always match, making `_: auto` the wildcard pattern. -Any initializing expressions in the scrutinee of a `match` statement are -[materialized](values.md#temporary-materialization) before pattern matching +If the scrutinee expression's [form](values.md#expression-forms) contains any +primitive forms with category "initializing", they are converted to non-owning +ephemeral reference expressions by +[materialization](values.md#temporary-materialization) before pattern matching begins, so that the result can be reused by multiple `case`s. 
However, the objects created by `var` patterns are not reused by multiple `case`s: diff --git a/docs/design/tuples.md b/docs/design/tuples.md index 746896e982df1..ee3f31cb474c7 100644 --- a/docs/design/tuples.md +++ b/docs/design/tuples.md @@ -12,6 +12,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Overview](#overview) - [Element access](#element-access) +- [Conversion](#conversion) - [Empty tuples](#empty-tuples) - [Trailing commas and single-element tuples](#trailing-commas-and-single-element-tuples) - [Tuple of types and tuple types](#tuple-of-types-and-tuple-types) @@ -64,6 +65,14 @@ fn Choose(template N:! i32) -> i32 { } ``` +## Conversion + +A tuple type `Source` can be converted to a tuple type `Dest` if they have the +same number of elements, and each element type of `Source` is convertible to the +corresponding element type of `Dest`, and the conversion is implicit if all of +the element type conversions are implicit. See +[here](values.md#type-conversions) for full details. + ### Empty tuples `()` is the empty tuple. 
This is used in other parts of the design, such as diff --git a/docs/design/values.md b/docs/design/values.md index 4552f6b26e624..3d33e357d6930 100644 --- a/docs/design/values.md +++ b/docs/design/values.md @@ -20,6 +20,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Local variables](#local-variables) - [Consuming function parameters](#consuming-function-parameters) - [Reference expressions](#reference-expressions) + - [Owning reference expressions](#owning-reference-expressions) - [Durable reference expressions](#durable-reference-expressions) - [Ephemeral reference expressions](#ephemeral-reference-expressions) - [Value expressions](#value-expressions) @@ -28,9 +29,14 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Interop with C++ `const &` and `const` methods](#interop-with-c-const--and-const-methods) - [Escape hatches for value addresses in Carbon](#escape-hatches-for-value-addresses-in-carbon) - [Initializing expressions](#initializing-expressions) + - [Initializing outcomes](#initializing-outcomes) - [Function calls and returns](#function-calls-and-returns) - [Deferred initialization from values and references](#deferred-initialization-from-values-and-references) - [Declared `returned` variable](#declared-returned-variable) +- [Expression forms](#expression-forms) + - [Form conversions](#form-conversions) + - [Type conversions](#type-conversions) + - [Category conversions](#category-conversions) - [Pointers](#pointers) - [Reference types](#reference-types) - [Pointer syntax](#pointer-syntax) @@ -64,7 +70,7 @@ itself. ### Expression categories -There are three expression categories in Carbon: +There are three primary expression categories in Carbon: - [_Value expressions_](#value-expressions) produce abstract, read-only _values_ that cannot be modified or have their address taken. @@ -77,8 +83,8 @@ There are three expression categories in Carbon: returns, which can construct the returned value directly in the caller's storage. 
-Expressions in one category can be converted to any other category when needed. -The primitive conversion steps used are: +Expressions in one category can be implicitly converted to any other primary +category when needed. The primitive conversion steps used are: - [_Value binding_](#value-binding) forms a value expression from the current value of the object referenced by a reference expression. @@ -97,13 +103,30 @@ These conversion steps combine to provide the transitive conversion table: | to **reference** | direct init + materialize | == | materialize | | to **initializing** | direct init | copy init | == | -Reference expressions formed through temporary materialization are called -[_ephemeral reference expressions_](#ephemeral-reference-expressions) and have -restrictions on how they are used. In contrast, reference expressions that refer -to declared storage are called -[_durable reference expressions_](#durable-reference-expressions). Beyond the -restrictions on what is valid, there is no distinction in their behavior or -semantics. +Reference expressions are divided into 2x2 sub-categories: they can be either +[_ephemeral_](#ephemeral-reference-expressions) or +[_durable_](#durable-reference-expressions), and either _owning_ or +_non-owning_. + +Ephemeral reference expressions are formed through temporary materialization, +and have restrictions on how they are used. In contrast, durable reference +expressions refer to storage that outlives the expression, and typically has a +declared name. Owning reference expressions can only refer to complete objects, +whereas non-owning reference expressions can refer to both complete objects and +sub-objects (such as class fields and base class sub-objects). As a consequence, +only owning reference expressions can be destructively moved. + +Value binding and copy initialization can be applied to any reference +expression, but materialization only produces owning ephemeral reference +expressions. 
An owning reference expression can be implicitly converted to +non-owning; this has no run-time effect because it merely discards static +ownership information. Non-owning reference expressions can only be converted to +owning reference expressions by round-tripping through copy-initialization and +materialization. Non-durable-reference expressions cannot be implicitly +converted to durable reference expressions at all. + +> **TODO:** Determine how these reference sub-categories relate to memory-safety +> properties like uniqueness. #### Value binding @@ -156,9 +179,12 @@ fn Sum(x: i32, y: i32) -> i32 { Value bindings require the matched expression to be a _value expression_, converting it into one as necessary. -A _variable pattern_ is introduced with the `var` keyword. It declares storage -for a new object, and initializes it from the matched expression, which must be -an initializing expression. +A _variable pattern_ is introduced with the `var` keyword. The matched +expression must be an ephemeral owning reference expression (which typically +requires the matched expression to be materialized); the `var` pattern "adopts" +the temporary storage it refers to, which extends its lifetime to the end of the +enclosing scope. The subpattern is then matched against a _durable_ owning +reference expression to the object in that storage. A _reference binding pattern_ is a binding pattern that is nested under a `var` pattern. It introduces a name called a _reference binding_ that is a @@ -173,8 +199,10 @@ fn Example() { let x: i64 = 1; // `2` also starts as a value expression, but the variable pattern requires it - // to be converted to an initializing expression by using the value `2` to - // initialize the provided variable storage that `y` will refer to. + // to be converted to an owning ephemeral reference expression by using the + // value `2` to initialize temporary storage, which the variable pattern + // adopts. 
The reference binding pattern is then bound to a durable reference + // to the newly-initialized object. var y: i64 = 2; // Allowed to take the address and mutate `y` as it is a durable reference @@ -213,8 +241,10 @@ inner `var` pattern here: fn DestructuringExample() { // Both `1` and `2` start as value expressions. The `x` binding directly // matches `1`. For `2`, the variable pattern requires it to be converted to - // an initializing expression by using the value `2` to initialize the - // provided variable storage that `y` will refer to. + // an owning ephemeral reference expression by using the value `2` to + // initialize temporary storage, which the variable pattern adopts. + // The reference binding `y` is then bound to a durable reference to the + // newly-initialized object. let (x: i64, var y: i64) = (1, 2); // Just like above, we can take the address and mutate `y`: @@ -249,9 +279,9 @@ This allows us to model an important special case of function inputs -- those that are _consumed_ by the function, either through local processing or being moved into some persistent storage. Marking these in the pattern and thus signature of the function changes the expression category required for arguments -in the caller. These arguments are required to be _initializing expressions_, -potentially being converted into such an expression if necessary, that directly -initialize storage dedicated-to and owned-by the function parameter. +in the caller. These arguments are required to be _owning ephemeral reference +expressions_, potentially being converted into such an expression if necessary, +whose storage will be dedicated-to and owned-by the function parameter. This pattern serves the same purpose as C++'s pass-by-value when used with types that have non-trivial resources attached to pass ownership into the function and @@ -263,14 +293,43 @@ makes this a use case that requires a special marking on the declaration. 
_Reference expressions_ refer to _objects_ with _storage_ where a value may be read or written and the object's address can be taken. -Calling a [method](/docs/design/classes.md#methods) on a reference expression -where the method's `self` parameter has an `addr` specifier can always -implicitly take the address of the referred-to object. This address is passed as -a [pointer](#pointers) to the `self` parameter for such methods. +Reference expressions can be either _durable_ or _ephemeral_. These refine the +_lifetime_ of the underlying storage and provide safety restrictions reflecting +that lifetime. Reference expressions can also be either _owning_ or +_non-owning_, depending on whether the referenced object is known to be complete +(rather than a sub-object of another object). -There are two sub-categories of reference expressions: _durable_ and -_ephemeral_. These refine the _lifetime_ of the underlying storage and provide -safety restrictions reflecting that lifetime. +### Owning reference expressions + +An _owning reference expression_ is one that is statically known to refer to a +complete object. Other references are _non-owning_. Durable and ephemeral +reference expressions can both be either owning or non-owning. An owning +reference can be implicitly converted to a non-owning reference (with the same +durability), because this merely discards the knowledge that the object is +complete. + +Any context that accepts a reference expression can accept an owning reference +expression, and unless otherwise specified it can accept non-owning references +as well. Unless otherwise specified, an expression or operation that produces a +reference produces a non-owning reference. + +Currently, the only context that requires an owning reference is the scrutinee +of a `var` pattern, which must be an owning ephemeral reference. 
+ +There is only one kind of explicit expression that produces an owning reference: +the name of an object introduced with a +[variable binding pattern](pattern_matching.md#name-binding-patterns) (in other +words, a name that was declared with `var : `) is an owning durable +reference. + +Two kinds of implicit expression can also produce owning references: + +- The result of materialization is an owning ephemeral reference. +- When a [tuple pattern](pattern_matching.md#tuple-patterns) or + [struct pattern](pattern_matching.md#struct-patterns) is matched with an + owning ephemeral reference scrutinee, that scrutinee is destructured into + owning ephemeral references to its elements, which are then matched with the + corresponding subpatterns. ### Durable reference expressions @@ -278,7 +337,7 @@ _Durable reference expressions_ are those where the object's storage outlives the full expression and the address could be meaningfully propagated out of it as well. -There are two contexts that require a durable reference expression in Carbon: +There are four contexts that require a durable reference expression in Carbon: - [Assignment statements](/docs/design/assignment.md) require the left-hand-side of the `=` to be a durable reference. This stronger @@ -286,6 +345,11 @@ There are two contexts that require a durable reference expression in Carbon: the `Carbon.Assign.Op` interface method. - [Address-of expressions](#pointer-syntax) require their operand to be a durable reference and compute the address of the referenced object. +- [non-`self` `ref` binding patterns](pattern_matching.md#name-binding-patterns) + require their scrutinee to be a durable reference. +- If a function's [return form](#function-calls-and-returns) contains `ref` + tags, `return` statements require the corresponding parts of the operand to + be durable reference expressions. 
There are several kinds of expressions that produce durable references in Carbon: @@ -299,6 +363,8 @@ Carbon: - [Indexing](/docs/design/expressions/indexing.md) into a type similar to C++'s `std::span` that implements `IndirectIndexWith`, or indexing into any type with a durable reference expression such as `local_array[i]`. +- Calls to functions whose [return forms](#function-calls-and-returns) contain + `ref`. Durable reference expressions can only be produced _directly_ by one of these expressions. They are never produced by converting one of the other expression @@ -312,19 +378,26 @@ expressions_. They still refer to an object with storage, but it may be storage that will not outlive the full expression. Because the storage is only temporary, we impose restrictions on where these reference expressions can be used: their address can only be taken implicitly as part of a method call whose -`self` parameter is marked with the `addr` specifier. - -**Future work:** The current design allows directly requiring an ephemeral -reference for `addr`-methods because this replicates the flexibility in C++ -- -very few C++ methods are L-value-ref-qualified which would have a similar effect -to `addr`-methods requiring a durable reference expression. This is leveraged -frequently in C++ for builder APIs and other patterns. However, Carbon provides -more tools in this space than C++ already, and so it may be worth evaluating -whether we can switch `addr`-methods to the same restrictions as assignment and -`&`. Temporaries would never have their address escaped (in a safe way) in that -world and there would be fewer different kinds of entities. But this is reserved -for future work as we should be very careful about the expressivity hit being -tolerable both for native-Carbon API design and for migrated C++ code. +`self` parameter is marked with the `ref` specifier. 
+ +> **Future work:** The current design does not support mutating ephemeral +> references (or initializing expressions): assigning to an ephemeral reference +> is disallowed directly, and invoking mutating methods is disallowed because +> the `ref self` parameter can only bind to a durable reference. In C++ it's +> unusual but not rare to intentionally mutate a temporary, such as in a +> builder-style method chain (for example `MakeFoo().SetBar().AddBaz()`), so +> Carbon will need to provide some interop and migration target for that kind of +> code. + +There is one context that requires an owning ephemeral reference expression in +Carbon: the scrutinee of a +[`var` pattern](#binding-patterns-and-local-variables-with-let-and-var). There +is no context that requires a non-owning ephemeral reference expression. + +There is only one kind of explicit expression that produces an ephemeral +reference: a member access expression `x.member` or `x.(member)`, where `x` is +an initializing or ephemeral reference expression. Ephemeral reference +expressions can also arise implicitly, as the result of materialization. ## Value expressions @@ -473,7 +546,7 @@ to the operation in question. For example: ```carbon class S { fn ValueMemberFunction[self: Self](); - fn AddrMemberFunction[addr self: const Self*](); + fn RefMemberFunction[ref self: const Self](); } fn F(s_value: S) { @@ -481,7 +554,7 @@ fn F(s_value: S) { s_value.ValueMemberFunction(); // This requires an unsafe marker in the syntax. - s_value.unsafe AddrMemberFunction(); + s_value.unsafe RefMemberFunction(); } ``` @@ -510,29 +583,54 @@ the provided storage. **Future work:** The design should be expanded to fully cover how copying is managed and linked to from here. -The first place where an initializing expression is _required_ is to satisfy -[_variable patterns_](#binding-patterns-and-local-variables-with-let-and-var). 
-These require the expression they match to be an initializing expression for the -storage they create. The simplest example is the expression after the `=` in a -local `var` declaration. +There are no syntactic contexts in Carbon that always require an initializing +expression, and no expression syntax that always produces an initializing +expression. By default, function call expressions are initializing expressions, +and correspondingly the operand of `return` is required to be an initializing +expression, but this default can be overridden by the +[function signature](#function-calls-and-returns). + +Initializing expressions can also be created implicitly, when attempting to +convert an expression into an ephemeral owning reference expression +(particularly to match a `var` pattern): the expression is first converted to an +initializing expression if necessary, and then temporary storage is materialized +to act as its output, and as the referent of the resulting ephemeral reference +expression. + +### Initializing outcomes -The next place where a Carbon expression requires an initializing expression is -the expression operand to `return` statements. We expand more completely on how -return statements interact with expressions, values, objects, and storage -[below](#function-calls-and-returns). +An _initializing outcome_ is the notional result of evaluating an initializing +expression, and represents an obligation to provide storage for an object of the +expression's type. This obligation can be fulfilled by allocating suitable +storage and materializing the initializing outcome into it, or it can be +delegated by returning it to some enclosing context, where it acts as an +initializing expression. 
-The last path that requires forming an initializing expression in Carbon is when -attempting to convert a non-reference expression into an ephemeral reference -expression: the expression is first converted to an initializing expression if -necessary, and then temporary storage is materialized to act as its output, and -as the referent of the resulting ephemeral reference expression. +This delegation only happens in a few local contexts whose semantics are defined +by the core language, such as forming a tuple or struct literal from its +elements, or [converting between composite forms](#form-conversions), where the +generated code can compute the storage location beforehand, and use it as a +hidden output parameter when evaluating the initializing expression. The +initializing outcome abstracts away that hidden output parameter and lets us use +the conventional vocabulary of expression evaluation, where information flows +into an operation from its operands and not the other way around. ### Function calls and returns -Function calls in Carbon are modeled directly as initializing expressions -- -they require storage as an input and when evaluated cause that storage to be -initialized with an object. This means that when a function call is used to -initialize some variable pattern as here: +The outcome of a function call can have an almost arbitrary form. The return +clause of a function signature consists of `->` followed by a _return form_, an +expression-like syntax that specifies not only the type but also the form of the +function call's outcome. `return` expressions in the function body are expected +to have that form, and are converted to it if necessary. When a function is +declared without a return clause, it behaves from the caller's point of view as +if the return clause were `-> ()`, but `return` statements in the function body +don't take operands (and can be omitted at the end of the function). 
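
A minimal sketch of the no-return-clause case follows. The function names are hypothetical and chosen only for illustration.

```carbon
// Hypothetical function with no return clause.
fn Reset(p: i32*) {
  *p = 0;
  // A `return` statement here takes no operand, and could be omitted
  // because it is at the end of the function.
  return;
}

fn Caller() {
  var n: i32 = 1;
  // From the caller's point of view, this call behaves as if `Reset` had
  // been declared with `-> ()`.
  Reset(&n);
}
```
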
+ +In the common case, the return form is a type expression, in which case calls +are modeled directly as initializing expressions -- they require storage as an +input and when evaluated cause that storage to be initialized with an object. +This means that when a function call is used to initialize some variable pattern +as here: ```carbon fn CreateMyObject() -> MyType { @@ -545,18 +643,64 @@ var x: MyType = CreateMyObject(); The `` in the `return` statement actually initializes the storage provided for `x`. There is no "copy" or other step. -> **Future work:** Extend this to also apply when a variable pattern is -> initialized from a tuple/struct literal, or a tuple/struct pattern with -> variable subpatterns is initialized from a single function call. +In the body of such a function, all `return` statement expressions are required +to be initializing expressions and in fact initialize the storage provided to +the function's call expression. This in turn causes the property to hold +_transitively_ across an arbitrary number of function calls and returns. The +storage is forwarded at each stage and initialized exactly once. + +More generally, the syntax and semantics of a return form are as follows: + +- _return-clause_ ::= `->` _return-form_ +- _return-form_ ::= _nesting-return-form_ | _auto-return-form_ +- _nesting-return-form_ ::= _expression-return-form_ | _proper-return-form_ + +Return forms can usually be nested, but syntaxes involving `auto` can only occur +at top level. We further divide nesting return forms into expressions and +"proper" return forms, but this is just a technical means of avoiding formal +ambiguity in the grammar; it has no greater significance. + +- _category-tag_ ::= `val` | `ref` | `var` -All `return` statement expressions are required to be initializing expressions -and in fact initialize the storage provided to the function's call expression. 
-This in turn causes the property to hold _transitively_ across an arbitrary -number of function calls and returns. The storage is forwarded at each stage and -initialized exactly once. +These tags are used to specify "value", "non-owning durable reference", or +"initializing" expression category (respectively). Note that there is no way to +express an owning or ephemeral reference category in a return form. -Note that functions without a specified return type work exactly the same as -functions with a `()` return type for the purpose of expression categories. +- _auto-return-form_ ::= _category-tag_? `auto` + +This denotes a primitive form with runtime phase and deduced type. The category +is determined by _category-tag_ if present, or "initializing" otherwise. + +- _proper-return-form_ ::= _category-tag_ _expression_ + +This denotes a primitive form with runtime phase, category _category-tag_, and +type "_expression_ `as type`". + +- _expression-return-form_ ::= _expression_ + +An expression with no _category-tag_ is equivalent to "`var` _expression_". + +- _proper-return-form_ ::= `(` [_expression-return-form_ `,`]\* _proper-return-form_ + [`,` _nesting-return-form_]\* `,`? `)` + +A tuple literal of return forms denotes a tuple form whose sub-forms are +specified by the comma-separated elements. To avoid formal ambiguity, this +grammar rule requires at least one of the sub-forms to be proper. + +- _expression-field-form_ ::= _designator_ `:` _expression-return-form_ +- _proper-field-form_ ::= _designator_ `:` _proper-return-form_ +- _field-form_ ::= _field-decl_ +- _field-form_ ::= _proper-field-form_ +- _proper-return-form_ ::= `{` [_expression-field-form_ `,`]\* _proper-field-form_ + [`,` _field-form_]\* `}` + +A struct literal of return forms denotes a struct form whose field names and +their forms are specified by the comma-separated field forms. To avoid formal +ambiguity, this grammar rule requires at least one of the field forms to be +proper. 
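
The grammar above can be read off a few declarations. This is a sketch only: the types `A` and `B` and all function names are hypothetical, and the comments restate the rules above rather than confirmed toolchain behavior.

```carbon
// Hypothetical types, assumed for illustration only.
class A {}
class B {}

// Expression return form: no category tag, so equivalent to `-> var A`.
fn MakeA() -> A;

// Proper return form with a `val` tag: the call is a value expression.
fn PeekA(p: A*) -> val A;

// Proper return form with a `ref` tag: the call is a non-owning durable
// reference expression.
fn GetA(p: A*) -> ref A;

// Auto return form with a category tag: the type is deduced from the body.
fn Deref(p: A*) -> ref auto { return *p; }

// No sub-form is proper, so this is an ordinary type expression: a primitive
// initializing form of tuple type, not a tuple form.
fn MakePair() -> (A, B);

// Tuple form: at least one sub-form (`ref A`) is proper.
fn F(p: A*) -> (ref A, B);

// Struct form: at least one field form (`.a: ref A`) is proper.
fn G(p: A*) -> {.a: ref A, .b: B};
```

The difference between `MakePair` and `F` matters to callers: only a call to `F` has a tuple form whose elements keep their own categories.
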
+ +> **Open question:** Should there be a way to specify symbolic or template phase +> in return forms? #### Deferred initialization from values and references @@ -629,6 +773,233 @@ The model of initialization of returns also facilitates the use of [`returned var` declarations](control_flow/return.md#returned-var). These directly observe the storage provided for initialization of a function's return. +## Expression forms + +We typically treat the category and type of an expression as independent +properties. However, in some cases we need to deal with them as an integrated +whole. The _form_ of an expression captures all of the information about it that +is visible to the type system, while abstracting away all other information +about it. Thus, forms are a generalization of types: what we conventionally call +"types" are really the types of objects and values, whereas forms are the types +of expressions and patterns. + +A _primitive form_ currently consists of a type, an expression category, an +expression phase, and optionally a constant value (which is present if and only +if the expression phase is not "runtime"). When dealing with primitive forms, +which is the common case, we can treat each of those properties as independent. +For convenience, in this section we will use the notation `[T, C, P, V]` to +represent a primitive form with type `T`, category `C`, phase `P` and value `V`, +but this is not Carbon syntax. + +Other forms are called _composite forms_, and there are two kinds: + +A _tuple form_ can be thought of as a tuple of forms, just as a tuple type can +be thought of as a tuple of types. The form of a tuple literal is a tuple form, +whose elements are the forms of the literal elements. + +> **TODO:** Extend this to support variadic forms. + +A _struct form_ can be thought of as a struct whose fields are forms, just as a +struct type can be thought of as a struct whose fields are types.
The form of a +struct literal is a struct form with the same field names, whose values are the +forms of the corresponding fields of the struct literal. + +The _type component_ of a form is defined as follows: + +- The type component of a primitive form `[T, C, P, V]` is `T`. +- The type component of a tuple form is a tuple of the type components of its + elements. +- The type component of a struct form is a struct whose field names are the + field names of the struct form and whose field types are the type components + of the corresponding elements. + +The _category component_ and _phase component_ of a form are defined likewise. +The category component of a struct form is called a _struct category_, and the +category component of a tuple form is called a tuple category. + +The type of an expression is the type component of the expression's form. + +An _outcome_ is the result of evaluating an expression. It can be defined +recursively in terms of the expression's form: + +- The outcome of an initializing expression is an initializing outcome. +- The outcome of a value expression is a value. +- The outcome of a reference expression is a reference of the same kind. +- The outcome of an expression with tuple form is a tuple of outcomes. +- The outcome of an expression with struct form is a struct of outcomes. + +An expression and its outcome always have the same form. + +### Form conversions + +A conversion between forms can be broken down into up to three steps: type +conversion, category conversion, and phase conversion. These convert the form to +a particular target type, category, and phase component (respectively). These +steps aren't fully orthogonal: type conversions can change the category and +phase components as a byproduct, and category conversions can change the phase +component. 
However, category conversions can't change the type component, and +phase conversions can't change either of the other two, so converting the type, +then category, then phase, ensures that we converge on the desired result. + +Any of these steps may be omitted, depending on whether the context imposes +requirements on the corresponding component. Most commonly, an operand position +requires its operand to have a primitive form with a particular category, +usually with a particular type, and sometimes with a particular phase. + +In some cases an expression's outcome is _discarded_, such as when the +expression is used as a statement, or is matched with an +[unused binding pattern](pattern_matching.md#unused-bindings). Discarding an +outcome is a form conversion that does nothing except materialize any +initializing sub-outcomes, in order to satisfy the requirement that every +initializing outcome is materialized. + +Phase conversions are straightforward, because they cannot change the form +structure; they can only apply primitive phase conversions to primitive +sub-forms. Type and category conversions are more complex, and are covered in +the next two sections. + +#### Type conversions + +See [here](expressions/implicit_conversions.md) for overall information about +type conversions. Conversions involving struct, tuple, and array types are +described here because of their unique interactions with expression forms. + +> **TODO:** A forthcoming proposal is expected to update the type conversion +> interfaces to permit user-defined conversions to depend on the form of the +> input, and customize the form of the output. Once that is done, these "built +> in" conversions should be presented as implementations of those interfaces, +> possibly with some "magic" for things like introspecting on struct field +> names. + +Each of the conversions described in this section is explicit if and only if it +invokes another explicit type conversion. Otherwise, it is implicit. 
+ +An outcome `source` that has a struct type can be converted to a struct type +`Dest` if they have the same set of field names: + +- If the type of `source` is `Dest`, return `source`. +- If `source` is a struct outcome, for each field name `F` in `Dest`, in + `Dest`'s field order, type-convert `source.F` to `Dest.F`. Return a struct + outcome where each field `F` is set to the outcome of the corresponding + conversion. +- If `source` is a primitive outcome, convert it to a struct outcome by + [form decomposition](#category-conversions), type-convert the outcome to + `Dest`, category-convert the outcome to an initializing expression, and + return the result. + +Note that the sub-conversions invoked here are not necessarily defined; if so, +the conversion itself is not defined. + +The conversion to an initializing outcome in the last case is not formally +necessary; its purpose is to ensure that the result of type conversion is not +"less primitive" than the source form. Allowing conversions to add form +structure that wasn't originally present would have surprising consequences. For +example, if we have `fn F() -> (i32, i32)`, then `var a: array(i32, 2) = F();` +is not valid because `F()` does not have a tuple form. That being the case, it +would be surprising if `var a: array(i32, 2) = F() as (i16, i16);` were valid, +so `F() as (i16, i16)` must not have a tuple form. + +**Open question:** We've chosen "initializing" as the default category for +primitive sub-forms in a conversion, but in some cases "value" could be more +efficient. Do we want a way of explicitly requesting conversion to a given form, +rather than just a given type, in order to override this default when it's +inefficient? + +There is a conversion to a class type `Dest` from an outcome `source` that has a +struct type, if there is a conversion from `source` to a struct type that has +the same field names as `Dest`, with the same types, in the same order. 
The +conversion type-converts `source` to that struct type, category-converts that to +an initializing expression of the struct type, and then reinterprets it as an +initializing expression of `Dest` (which is layout-compatible with the struct +type by construction). + +Note that some fields of an object may be initialized directly by the evaluation +of the source expression, while others may be initialized by the conversions +described here. The conversions initialize fields in their declaration order, +but the evaluation of the source expression always happens before any of the +conversions, and happens in the source expression's lexical order, so the fields +of an object are not necessarily initialized in declaration order. + +Conversions between tuple types are defined in the same way, treating tuples as +structs that have fields named `.0`, `.1`, etc, in numerical order. + +There is a conversion to `array(T, N)` from any expression with a tuple form of +exactly `N` elements, whose type components are convertible to `T`. The +conversion is an initializing expression, which type-converts each source +element to `T`, and initializes the corresponding array element from the result +of that conversion. + +#### Category conversions + +_Form composition_ converts a composite form with consistent category to a +primitive form as follows (where `min` as applied to phases uses the ordering +"runtime" < "symbolic" < "template"): + +- A tuple form `([T1, C, P1, V1], [T2, C, P2, V2], ... [TN, C, PN, VN])` can + be converted to a primitive form + `[(T1, T2, ..., TN), C, min(P1, P2, ..., PN), (V1, V2, ... VN)]`. +- A struct form + `{.a = [Ta, C, Pa, Va], .b = [Tb, C, Pb, Vb], ... .z = [Tz, C, Pz, Vz]}` can + be converted to a primitive form + `[{.a = Ta, .b = Tb, ... .z = Tz}, C, min(Pa, Pb, ... Pz), {.a = Va, .b = Vb, ... .z = Vz}]`. + +When `C` is "value", composition forms a value representation of the aggregate +from value representations of the elements. 
When `C` is "initializing", it +transforms initializing expressions for each element into a single initializing +expression that initializes the whole aggregate. `C` cannot be a reference +category, because an aggregate of references to independent objects can't be +replaced by a reference to a single aggregate object in a single step. + +_Category conversion_ converts a form to have a given category component without +changing its type, so long as the target category component is not "less +primitive" than the source form. The conversion works by combining form +composition with primitive category conversions, and is defined recursively: + +- If the target category component is a tuple, the source form must be a tuple + form of the same arity. Category-convert each source sub-form to the + corresponding target sub-category. +- If the target category component is a struct, the source form must be a + struct form with the same set of field names in the same order. + Category-convert each source sub-form to the corresponding target + sub-category. +- If the target category is a primitive category `C`: + - If the source form is primitive, convert to `C` by applying primitive + category conversions. + - If the source form is composite and `C` is a reference category, + category-convert the source form to "initializing", and then convert the + result to `C` by applying primitive category conversions. + - If the source form is composite and `C` is not a reference category, + category-convert each source sub-form to `C`, and then convert the + aggregate result of these conversions to `C` by form composition. + +_Form decomposition_ is the inverse of form composition. It converts a primitive +form to a composite form as follows: + +- A primitive form `[(T1, T2, ..., TN), C, P, V]` can be converted to a tuple + form `([T1, CC, P, V.1], [T2, CC, P, V.2], ... [TN, CC, P, V.(N)])`. +- A primitive form `[{.a = Ta, .b = Tb, ... 
.z = Tz}, C, P, V]` can be + converted to a struct form + `{.a = [Ta, CC, P, V.a], .b = [Tb, CC, P, V.b], ... .z = [Tz, CC, P, V.z]}`. + +The category `CC` of the resulting sub-forms is the same as `C`, with two +exceptions: + +- If `C` is "owning durable reference", `CC` will be "non-owning durable + reference", because the sub-forms don't refer to complete objects. This + doesn't apply to owning ephemeral references, because in that case form + decomposition implicitly ends the lifetime of the original aggregate, + promoting its elements to complete objects with independent lifetimes. +- If `C` is "initializing", `CC` will be "owning ephemeral reference", because + the initializing outcome must be materialized before it can be decomposed. + +By convention, form decomposition is a no-op when applied to a struct or tuple +form. + +Form decomposition occurs only where explicitly specified, because it adds +structural information that may not have been originally present, so it is +applied only in narrow contexts where that added information will be used +safely. + ## Pointers Pointers in Carbon are the primary mechanism for _indirect access_ to storage @@ -794,7 +1165,7 @@ _thread-safe_ interface subset of an otherwise _thread-compatible_ type. Note that `const T` is a type qualification and is generally orthogonal to expression categories or what form of pattern is used, including for object -parameters. Notionally, it can occur both with `addr` and value object +parameters. Notionally, it can occur both with `ref` and value object parameters. However, on value patterns, it is redundant as there is no meaningful distinction between a value expression of type `T` and type `const T`. 
For example, given a type and methods: @@ -803,20 +1174,20 @@ meaningful distinction between a value expression of type `T` and type class X { fn Method[self: Self](); fn ConstMethod[self: const Self](); - fn AddrMethod[addr self: Self*](); - fn AddrConstMethod[addr self: const Self*](); + fn RefMethod[ref self: Self](); + fn RefConstMethod[ref self: const Self](); } ``` The methods can be called on different kinds of expressions according to the following table:

-| Expression category: | `let x: X` <br> (value) | `let x: const X` <br> (const value) | `var x: X` <br> (reference) | `var x: const X` <br> (const reference) |
-| --------------------: | ------------------------ | ------------------------------------ | ---------------------------- | ---------------------------------------- |
-| `x.Method();` | ✅ | ✅ | ✅ | ✅ |
-| `x.ConstMethod();` | ✅ | ✅ | ✅ | ✅ |
-| `x.AddrMethod();` | ❌ | ❌ | ✅ | ❌ |
-| `x.AddrConstMethod()` | ❌ | ❌ | ✅ | ✅ |
+| Expression category: | `let x: X` <br> (value) | `let x: const X` <br> (const value) | `var x: X` <br> (reference) | `var x: const X` <br> (const reference) |
+| -------------------: | ------------------------ | ------------------------------------ | ---------------------------- | ---------------------------------------- |
+| `x.Method();` | ✅ | ✅ | ✅ | ✅ |
+| `x.ConstMethod();` | ✅ | ✅ | ✅ | ✅ |
+| `x.RefMethod();` | ❌ | ❌ | ✅ | ❌ |
+| `x.RefConstMethod()` | ❌ | ❌ | ✅ | ✅ |

The `const T` type has the same representation as `T` with the same field names, but all of its field types are also `const`-qualified. Other than fields, all @@ -859,12 +1230,11 @@ and realistic Carbon code patterns that cannot be expressed with the tools in this proposal in order to motivate a minimal extension. Some candidates based on functionality already proposed here or for [classes](/docs/design/classes.md): -- Allow overloading between `addr me` and `me` in methods. This is among the - most appealing as it _doesn't_ have the combinatorial explosion. But it is - also very limited as it only applies to the implicit object parameter. +- Allow overloading between `ref self` and `self` in methods. This is among + the most appealing as it _doesn't_ have the combinatorial explosion. But it + is also very limited as it only applies to the implicit object parameter. - Allow overloading between `var` and non-`var` parameters. -- Expand the `addr` technique from object parameters to all parameters, and - allow overloading based on it. +- Allow overloading between `ref` and non-`ref` parameters in general. Perhaps more options will emerge as well. Again, the goal isn't to completely preclude pursuing this direction, but instead to try to ensure it is only @@ -927,7 +1297,7 @@ will require that the type containing that specifier satisfies the constraint ```carbon interface ReferenceImplicitAs { let T:!
type; - fn Convert[addr self: const Self*]() -> T; + fn Convert[ref self: const Self]() -> T; } ``` @@ -972,11 +1342,11 @@ class String { private var capacity: i64; impl as ReferenceImplicitAs where .T = StringView { - fn Op[addr self: const Self*]() -> StringView { + fn Op[ref self: const Self]() -> StringView { // Because this is called on the String object prior to it becoming // a value, we can access an SSO buffer or other interior pointers // of `self`. - return StringView.Make(self->data_ptr, self->size); + return StringView.Make(self.data_ptr, self.size); } } @@ -998,8 +1368,8 @@ class String { // Note that even though the `Self` type is `const` qualified here, this // cannot be called on a `String` value! That would require us to convert to a // `StringView` that does not track the extra data member. - fn Capacity[addr self: const Self*]() -> i64 { - return self->capacity; + fn Capacity[ref self: const Self]() -> i64 { + return self.capacity; } } ``` @@ -1044,6 +1414,9 @@ itself. - [Exclusively using references](/proposals/p2006.md#exclusively-using-references) - [Alternative pointer syntaxes](/proposals/p2006.md#alternative-pointer-syntaxes) - [Alternative syntaxes for locals](/proposals/p2006.md#alternative-syntaxes-for-locals) +- [Mixed expression categories](/proposals/p5545.md#mixed-expression-categories) +- [Use composite forms in more or fewer places](/proposals/p5545.md#use-composite-forms-in-more-or-fewer-places) +- [Materialize temporaries to preserve struct initialization order](/proposals/p5545.md#materialize-temporaries-to-preserve-struct-initialization-order) ## References @@ -1052,9 +1425,11 @@ itself. 
- [Proposal #618: `var` ordering][p0618] - [Proposal #851: auto keyword for vars][p0851] - [Proposal #2006: Values, variables, and pointers][p2006] +- [Proposal #5545: Expression form basics][p5545] [p0257]: /proposals/p0257.md [p0339]: /proposals/p0339.md [p0618]: /proposals/p0618.md [p0851]: /proposals/p0851.md [p2006]: /proposals/p2006.md +[p5545]: /proposals/p5545.md diff --git a/proposals/p5545.md b/proposals/p5545.md new file mode 100644 index 0000000000000..c4c9581244fb7 --- /dev/null +++ b/proposals/p5545.md @@ -0,0 +1,445 @@ +# Expression form basics + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/5545) + + + +## Table of contents + +- [Abstract](#abstract) +- [Problem](#problem) +- [Background](#background) +- [Proposal](#proposal) +- [Details](#details) +- [Rationale](#rationale) +- [Alternatives considered](#alternatives-considered) + - [Mixed expression categories](#mixed-expression-categories) + - [Use composite forms in more or fewer places](#use-composite-forms-in-more-or-fewer-places) + - [Use the forms that operations naturally produce](#use-the-forms-that-operations-naturally-produce) + - [Smarter choice of primitive category](#smarter-choice-of-primitive-category) + - [Only literals have composite form](#only-literals-have-composite-form) + - [Materialize temporaries to preserve struct initialization order](#materialize-temporaries-to-preserve-struct-initialization-order) + - [Support binding `ref self` to ephemeral references](#support-binding-ref-self-to-ephemeral-references) + + + +## Abstract + +This proposal introduces the concept of a _form_, which is a generalization of +"type" that encompasses all of the information about an expression that's +visible to the type system, including type and expression category. Forms can be +composed into _tuple forms_ and _struct forms_, which lets us track the +categories of individual tuple and struct literal elements. 
+ +## Problem + +It's unclear what expression category tuple and struct literals should have. For +example, this code can only compile if the tuple literal is an initializing +expression: + +```carbon +var t: (NonMovable, NonMovable) = (MakeNonMovable(), MakeNonMovable()); +``` + +But this code can only compile if the tuple literal is a value expression: + +```carbon +let x: NonCopyable = MakeNonCopyable(); +let t: (NonCopyable, NonCopyable) = (x, MakeNonCopyable()); +``` + +And there's plausible code that can't compile if the tuple literal has _any_ +single expression category: + +```carbon +let x: NonCopyable = MakeNonCopyable(); +let (a: NonCopyable, var b: NonMovable) = (x, MakeNonMovable()); +``` + +At present it's always possible to rewrite examples like that to avoid the +problem by disaggregating the tuple patterns into separate statements. However, +when the copy and move operations in question are expensive rather than outright +disabled, those examples will result in silent inefficiency rather than a noisy +build failure, which is less harmful but easier to overlook. + +## Background + +The Carbon toolchain already implements a solution to this problem: it treats +tuple and struct literals as having a "mixed" expression category, and when +individual elements of the literal are accessed (such as during pattern +matching), the element's original category is propagated. + +Proposal [#5434](https://github.com/carbon-language/carbon-lang/pull/5434) +introduces plausible use cases that cannot compile if we assign any single +expression category to a tuple or struct literal, and there is no way to avoid +the problem by rewriting.
For example: + +```carbon +fn F() -> (ref NonCopyable, NonMovable); +let (a: NonCopyable, var b: NonMovable) = F(); +``` + +## Proposal + +This proposal solves that problem by introducing the concept of a _form_, which +is a generalization of "type" that encompasses all of the information about an +expression that's visible to the type system, including type and expression +category. In the common case, an expression has a _primitive form_ which +consists of a type, an expression category, and a few other properties. However, +a tuple literal has a _tuple form_, which is a tuple of the forms of its +elements. This allows us to directly represent the fact that different elements +have different categories, and propagate that difference into operations that +access those elements. + +In order to help describe the semantics of forms, this proposal also introduces +the concept of an _outcome_, which is the result of evaluating an expression +with an arbitrary form. Outcomes are a generalization of values and references +in the same way that forms are a generalization of types. In this proposal they +are primarily a descriptive convenience, but they are also intended to function +as the thing that a form-generic binding binds to, when that is proposed. + +The outcomes of initializing expressions, called _initializing outcomes_, have +somewhat subtle semantics. Outcomes present an idealized model of expression +evaluation where information flows from each expression to the context where it +is used, but initializing expressions require information to flow in both +directions: the context supplies a storage location, and then the expression +supplies the contents of that storage location. We finesse this "impedance +mismatch" by saying that the initializing outcome represents an obligation on +the context to supply a storage location (somewhat like a callback or +`std::promise`), which it must fulfill by either materializing or transferring +the outcome. 
Furthermore, even though this formally happens after the expression +is evaluated, it is constrained in such a way that it can actually be computed +beforehand and passed to the expression's hidden output parameter. + +Finally, this proposal splits the reference expression categories into _owning_ +and _non-owning_ references, where an owning reference is known to refer to a +complete object. This lets us decouple materialization (which now produces an +owning ephemeral reference) from `var` binding (which now expects an owning +ephemeral reference), which lets us resolve a TODO to allow an initializing +expression to be destructured into multiple `var` bindings. + +In the process of doing this, it became clear that the special case that allowed +`ref self` patterns to match ephemeral references was not internally consistent, +so that special case has been removed. We will need some way of supporting the +use cases that were intended to be covered by that rule, but that is being left +as future work. + +## Details + +See the edits in the +[pull request](https://github.com/carbon-language/carbon-lang/pull/5545) +associated with this proposal, particularly in `values.md`. + +## Rationale + +This proposal supports +[performance-critical software](/docs/project/goals.md#performance-critical-software) +and making +[code easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) +by ensuring that tuple and struct literals don't introduce unnecessary category +conversions (which may cause build failures and performance overhead). + +## Alternatives considered + +### Mixed expression categories + +This proposal models an expression like `(x, MakeNonMovable())` from the earlier +example as having a tuple form consisting of primitive forms with `NonCopyable` +and `NonMovable` types (respectively) and "value" and "initializing" categories +(respectively). 
We could instead support composite expression categories, so +that it has type `(NonCopyable, NonMovable)` and expression category +`(value, initializing)`. + +This would avoid the need to introduce the concept of "form", and preserve the +existing separation between types and categories. That separation would be +somewhat superficial (for example, an expression couldn't have a tuple +expression category if it doesn't have a tuple type), but no more so than the +separation between types and values. + +However, we expect to need an explicit syntax to express these properties of +expressions, for example to define functions that return tuples whose elements +have different categories. A syntax consisting of separate type and category +tuples will be much less ergonomic, and much easier to misuse, than a syntax +that combines both in a single tuple (for example, +[#5434](https://github.com/carbon-language/carbon-lang/pull/5434) represents the +form of `(x, MakeNonMovable())` as `(val NonCopyable, NonMovable)`). + +Furthermore, we anticipate needing to support code that is generic with respect +to forms, not just with respect to types. We plan to achieve that with +parameters of a special "form" type together with ways of deducing and using +them. It might be possible to instead support category parameters that are +deduced and used in conjunction with type parameters, but that would be +syntactically onerous, and oblige users to keep each category correctly paired +with the corresponding type, in order to bring them together at the point of +use. Those challenges will be further compounded when/if it becomes possible to +manipulate types and categories by way of metaprogramming. + +Given that we need to present forms as an integrated whole at the syntactic and +metaprogramming levels, there is very little to be gained by decoupling them at +the level of language semantics. 
+ +### Use composite forms in more or fewer places + +Consider the following possible ways of initializing an array, where there is an +implicit conversion from integer literals to `X`: + +```carbon +let a: array(X, 3) = (1, 2, 3); + +let x_tuple: (X, X, X) = (1, 2, 3); +let b: array(X, 3) = x_tuple; +``` + +`a` is the canonical way of initializing an array from a list of element values, +and this proposal supports it. However, under this proposal, `b` does not +compile, because a tuple-type expression can only initialize an array if it has +tuple form, and `x_tuple` has primitive form. + +Some people find this restriction counterintuitive, and there is no technical +reason for it; in fact, removing it would somewhat simplify the conversion +rules. However, other people's intuition is that a tuple is different enough +from an array that there should not be implicit conversions between them in +general. In this mental model, a tuple-form expression doesn't represent a tuple +per se; rather, it abstractly represents a sequence of outcomes, which can be +used to initialize either a tuple or an array (or a user-defined type). +Similarly, a struct-form expression doesn't represent a struct per se, but +rather a more abstract sequence of _named_ outcomes. + +This raises the question of whether other kinds of expressions besides literals +can represent this sort of abstract sequence (that is, can have composite form). +Our proposed approach is that as much as possible, the core language should +delegate these questions to whatever library defines the operation in question. +When that library is the Carbon prelude, it should implement a policy that only +tuple and struct literals can create composite forms "out of thin air", but +operations that operate element-wise on tuples and structs should prefer to +preserve whatever form structure is present in the input, and not add any new +form structure.
+ +For example: + +```carbon +let c: array(X, 3) = (1, 2, 3) as (X, X, X); + +impl Y as Core.As(X); +fn MakeYs() -> (Y, Y, Y); +let d: array(X, 3) = MakeYs() as (X, X, X); +``` + +The declaration of `c` is valid, because the `as` conversion operates +element-wise on its tuple-form input, so its output should likewise have a tuple +form. On the other hand, the declaration of `d` is not valid, because the input +to `as` has a primitive form, so its output should also have a primitive form +(and we somewhat-arbitrarily choose its category to be "initializing"). + +Note that the conversion from `Y` to `X` is an initializing expression, and the +conversion from `(Y, Y, Y)` to `(X, X, X)` just applies that conversion to each +`Y` value, so that conversion must first produce an outcome with form +`(var X, var X, var X)` before converting it to any other form. Consequently, +requiring that conversion to have a primitive form imposes overhead. For +example: + +```carbon +let (var p: X, var q: X, var r: X) = MakeYs() as (X, X, X); +``` + +The conversion to `(X, X, X)` internally produces an outcome with form +`(var X, var X, var X)`, then converts it to form `var (X, X, X)` in order to +ensure that it's primitive, and then converts back to form +`(var X, var X, var X)` because that's what the pattern expects. In this case +that unnecessary round-trip is probably cheap, but that may not always be the +case. Upcoming proposals are expected to enable the user to define the implicit +conversion from `Y` to `X` as a value expression. In that case, forcing +`MakeYs() as (X, X, X)` to have primitive form will add a large overhead to code +like this: + +```carbon +let (p: X, q: X, r: X) = MakeYs() as (X, X, X); +``` + +Here the conversion internally produces an outcome with form +`(val X, val X, val X)`, converts it to form `var (X, X, X)` by direct +initialization and materialization (which can be very costly, depending on `X`), +only to convert it back to `(val X, val X, val X)`. 
+ +The following subsections discuss some alternative approaches to this set of +problems. + +#### Use the forms that operations naturally produce + +We could instead have a policy that type conversions defined in the prelude have +whatever form their implementation can most naturally and efficiently produce. +For example, a conversion from one tuple type to another would have a tuple form +whose sub-forms are determined by the forms of the element conversions, but a +conversion from a tuple type to itself would preserve the form of its input. +This would resolve the efficiency concerns with the current proposal, because +form conversions would only be performed once we know what form is expected at +the point of use. + +On its own, this would have some very surprising edge cases: + +```carbon +impl Y as Core.ImplicitAs(X); +fn MakeYs() -> (Y, Y, Y); + +fn F() -> (var X, var X, var X) { + return MakeYs(); +} +fn G() -> (var Y, var Y, var Y) { + return MakeYs(); +} +``` + +Here `F` is valid because the implicit conversion in the `return` statement has +a tuple form, but `G` is invalid because the implicit conversion (from +`(Y, Y, Y)` to itself) has a primitive form. In other words, `G` is invalid +because the type of the `return` operand isn't _different enough_ from the +return type of the function. To avoid that problem, we would need to adjust the +language rules to make `G` valid, by saying that form decomposition can happen +implicitly as part of category conversion (which is defined by the language, not +the library). + +This in turn implies that form structure cannot reliably carry higher-level +semantic information such as whether the expression is derived from a tuple +literal, so it wouldn't make sense to give special treatment to composite +forms in places like `array` initialization. This would be surprising to +some people, but would also resolve the surprise that others feel with the +current approach.
+ +#### Smarter choice of primitive category + +Currently, when converting to a primitive form in order to avoid adding +structure during a type conversion, we always convert to an initializing +expression, but this choice is somewhat arbitrary, and as discussed above, +converting to a value expression would be more efficient in some cases +(converting to a reference expression is never the most efficient choice, +because form composition can only form a reference expression by way of an +initializing expression). + +In principle we could address this performance problem by converting to a value +expression instead of an initializing expression in some cases, such as when all +of the primitive sub-forms are value expressions. However, when the primitive +sub-forms are a mixture of different categories, it's not at all clear how to +determine which primitive category would be optimal. Even if we could solve that +problem, at best this would only be a partial mitigation. For example: + +```carbon +fn F() -> (X, Y); +let (a: A, var b: B) = F() as (A, B); +``` + +If the conversion from `X` to `A` is a value expression, and the conversion from +`Y` to `B` is an initializing expression, there is no primitive expression +category we can assign to `F() as (A, B)` that can match the zero-overhead +performance we would get if its form were `(val A, var B)`. + +In short, this approach would complicate the language rules and make performance +less predictable, in exchange for a limited and unclear performance benefit. + +#### Only literals have composite form + +In principle, we could instead fully embrace the intuition that the form of an +expression reflects how the expression is written, so that only struct and tuple +literals have composite forms. However, by disallowing function calls from +having composite forms, this would defeat one of the main purposes of +introducing forms to begin with. 
+ +### Materialize temporaries to preserve struct initialization order + +Consider the following code: + +```carbon +fn MakeX() -> X; +fn MakeY() -> Y; +var s: {.x: X, .y: Y} = {.y = MakeY(), .x = MakeX()}; +``` + +There are several plausible ways to execute this code: + +1. Evaluate `MakeY()` and then `MakeX()`, with each call taking the + corresponding field of `s` as its hidden output parameter. This means that + the struct fields are not initialized in declaration order, and consequently + they aren't initialized in reverse order of destruction. This can make it + easier to accidentally write destruction-order bugs where a field is + destroyed before another field that depends on it. +2. Evaluate the initializer in the order of the fields of `s` (`MakeX()` and + then `MakeY()`), with each call taking the corresponding field of `s` as its + hidden output parameter. This means the initializer is not evaluated in + lexical order, and more generally it means the evaluation order of an + expression depends on how the expression is used. +3. Evaluate the initializer in lexical order, but have the `MakeY()` and + `MakeX()` calls output to temporaries, and then move them into the fields of + `s` in those fields' declaration order. This option has higher space overhead + than the other two because of the temporaries, and higher time overhead + because of the need to move out of them, and the code doesn't compile at all + if `X` or `Y` is not movable. +4. As a variant of (3), we can omit the temporary for `MakeX()` and have it + directly initialize `s.x`. More generally, this approach evaluates the + initializer in lexical order, but only introduces a temporary for `f` if + there's a field that's before `f` in the declaration order, but after `f` in + the initializer. This always results in fewer moves and temporaries than (3), + but still more than (1) and (2), and it's harder to predict which moves will + occur (and hence whether the code will compile).
+ +We have chosen option (1) because the risk of destruction-order bugs seems small +and manageable (especially in the planned memory-safe subset of Carbon), so that +risk is outweighed by the risk of confusion with option (2) and the risk of +silent inefficiency and/or surprising build failures in options (3) and (4). See +also Carbon's [goals](/docs/project/goals.md), where "performance-critical +software" and "code that is easy to read, understand, and write" are higher +priorities than "practical safety and testing mechanisms". + +### Support binding `ref self` to ephemeral references + +`ref` patterns can only match durable reference expressions, but prior to this +proposal, `ref self` patterns could match ephemeral references as a special-case +exception. This was intended to support certain C++ idioms that rely on +materializing a temporary and then mutating it in place, such as fluent +builders. For example: + +```carbon +class FooBuilder { + // These methods mutate `self` and then return a reference to it. + fn SetBar[ref self: Self]() -> ref Self; + fn SetBaz[ref self: Self]() -> ref Self; + + fn Build[ref self: Self]() -> Foo; +} +fn MakeFoo() -> FooBuilder; + +let foo: Foo = MakeFoo().SetBar().SetBaz().Build(); +``` + +Prior to this proposal, this code would be valid: `MakeFoo()` is an initializing +expression, but when it is matched with `ref self: Self` as part of the `SetBar` +call, it is implicitly converted to an ephemeral reference, and then the +special-case rule allows `ref self: Self` to bind to the materialized temporary. + +However, those rules also imply that code like this would be valid: + +```carbon +let builder: FooBuilder = MakeFoo(); +builder.SetBar(); +builder.SetBaz(); +let foo: Foo = builder.Build(); +``` + +Here the programmer has accidentally used `let` instead of `var`, so `builder` +is an immutable value. 
But a value expression can be implicitly converted to an +initializing expression by direct initialization, and as we already saw, it's +valid to call `SetBar()` and `SetBaz()` on an initializing expression. So this +code repeatedly materializes, mutates, and then discards a copy of `builder`, +and then ultimately initializes `foo` with the state returned by `MakeFoo()`, +which is surely not what the programmer intended. This sort of "lost mutation" +bug is exactly what the distinction between durable and ephemeral references was +intended to prevent, but the `ref self` special case combines with the +transitivity of category conversions to defeat that protection. + +A proper resolution of this issue seems beyond the scope of this proposal, so +this proposal removes that special case without replacement, leaving the problem +of supporting idioms like fluent builders as future work.