From 6bc8f2b4ae758bf8ef534a47f9c2271be047d026 Mon Sep 17 00:00:00 2001
From: Clay McLeod
Date: Wed, 15 Jan 2025 23:39:46 -0600
Subject: [PATCH] feat: adds enumerations

This commit adds enumerations to WDL. Enumerations in WDL are valued,
meaning that each variant has an assigned value. These values can be
either explicitly or implicitly typed. Enumerations are intended to
improve the user experience when only a limited set of values is valid
in a particular context.

Relevant issues: #139, #658

Co-authored-by: jdidion
---
 SPEC.md | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 172 insertions(+), 1 deletion(-)

diff --git a/SPEC.md b/SPEC.md
index ff90b7e5..1bdf347a 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -39,6 +39,7 @@ Revisions to this specification are made periodically in order to correct errors
         - [Map\[P, Y\]](#mapp-y)
         - [🗑 Object](#-object)
         - [Custom Types (Structs)](#custom-types-structs)
+        - [Enumeration Types (Enums)](#enumeration-types-enums)
       - [Hidden and Scoped Types](#hidden-and-scoped-types)
         - [`Union` (Hidden Type)](#union-hidden-type)
         - [`hints`, `input`, and `output` (Scoped Types)](#hints-input-and-output-scoped-types)
@@ -73,9 +74,12 @@ Revisions to this specification are made periodically in order to correct errors
 - [WDL Documents](#wdl-documents)
   - [Versioning](#versioning)
   - [Struct Definition](#struct-definition)
+  - [Enum Definition](#enum-definition)
+    - [Enum Usage](#enum-usage)
   - [Import Statements](#import-statements)
     - [Import URIs](#import-uris)
     - [Importing and Aliasing Structs](#importing-and-aliasing-structs)
+    - [Importing and Aliasing Enums](#importing-and-aliasing-enums)
   - [Task Definition](#task-definition)
     - [Task Inputs](#task-inputs)
       - [Task Input Localization](#task-input-localization)
@@ -1357,7 +1361,7 @@ Due to the lack of explicitness in the typing of `Object` being at odds with the
 
 ##### Custom Types (Structs)
 
-WDL provides the ability to define custom compound types called [structs](#struct-definition). `Struct` types are defined directly in the WDL document and are usable like any other type. A struct is defined using the `struct` keyword, followed by a unique name, followed by member declarations within braces. A struct definition contains any number of declarations of any types, including other `Struct`s.
+WDL provides the ability to define custom compound types called [structs](#struct-definition). `Struct` types are defined at the top level of the WDL document and are usable like any other type. A struct is defined using the `struct` keyword, followed by a unique name, followed by member declarations within braces. A struct definition contains any number of declarations of any types, including other `Struct`s.
 
 A declaration with a custom type can be initialized with a struct literal, which begins with the `Struct` type name followed by a comma-separated list of name-value pairs in braces (`{}`), where name-value pairs are delimited by `:`. The member names in a struct literal are not quoted. A struct literal must provide values for all of the struct's non-optional members, and may provide values for any of the optional members. The members of a struct literal are validated against the struct's definition at the time of creation. Members do not need to be in any specific order. Once a struct literal is created, it is immutable like any other WDL value.
 
@@ -1485,6 +1489,47 @@ Example output:
 
 Note that the ability to assign values to `Struct` declarations other than struct literals is deprecated and will be removed in WDL 2.0.
 
+##### Enumeration Types (Enums)
+
+An enumeration (or "enum") is a closed set of enumerated values (known as "variants") that are considered semantically valid in a specific context. An enum is defined at the top level of the WDL document and can be used as a declaration type anywhere in the document.
+
+An enum is defined using the `enum` keyword, followed by a name that is unique within the document's global scope, followed by a comma-delimited list of variant identifiers (each optionally assigned a value) in braces. Variants are referenced using `.` syntax (for example, `FileKind.FASTQ`), such as when assigning a value to an enum-typed declaration.
+
+```wdl
+enum FileKind {
+  FASTQ,
+  BAM
+}
+
+task process_file {
+  input {
+    File infile
+    FileKind kind = FileKind.FASTQ
+  }
+
+  command <<<
+    echo "Processing ~{kind} file"
+    ...
+  >>>
+}
+workflow process_files {
+  input {
+    Array[File] files
+    FileKind kind
+  }
+
+  scatter (file in files) {
+    call process_file {
+      input:
+        infile = file,
+        kind = kind
+    }
+  }
+}
+```
+
+The example above illustrates why enums are useful. The `process_files` workflow processes different kinds of NGS files and has a `kind` input that is expected to be either "FASTQ" or "BAM". If `kind` were declared as a `String`, an invalid value would not be caught until runtime, perhaps after the workflow has already run for several hours. Declaring `kind` with the `FileKind` enum type instead restricts the allowed values, so the execution engine can validate the input before executing the workflow.
+
 #### Hidden and Scoped Types
 
 A hidden type is one that may only be instantiated by the execution engine, and cannot be used in a declaration within a WDL file.
@@ -3352,6 +3397,115 @@ struct Invalid {
 }
 ```
 
+## Enum Definition
+
+An `Enum` is an enumerated type. Enums enable the creation of types that represent closed sets of enumerated values (called "variants") that are semantically valid in a specific context. Once defined, an `Enum` type can be used as the type of a declaration like any other type. However, variants cannot be added to an `Enum` outside of its definition: a declaration with an `Enum` type must be assigned one of the variants listed in the `Enum`'s definition.
+
+An enum definition is a top-level WDL element, meaning it is defined at the same level as tasks, workflows, and structs, and it cannot be defined within a task or workflow body. An enum is defined using the `enum` keyword, followed by a name that is unique within the WDL document, and a body containing a comma-delimited list of variants (and optional assigned values) in braces (`{}`).
+
+```wdl
+enum Color {
+  Red,
+  Orange,
+  Yellow,
+  Green,
+  Blue,
+  Purple
+}
+```
+
+Enums are valued, meaning that each variant has an associated value. An enum is either _explicitly_ or _implicitly_ typed, which determines how the type of its values is established.
+
+* Explicitly typed enums declare the value type in square brackets after the enum's name (for example, `enum FruitCosts[Float]`). If an assigned value's type differs from the declared type, the value is coerced to the declared type where possible (for example, an `Int` value in a `Float`-valued enum).
+* Implicitly typed enums omit the bracketed type. Instead, the value type is inferred from the assigned values, which must all map unambiguously to a single [primitive type](#primitive-types); if the assigned values do not all have the same type, it is an error. An implicitly typed enum with no assigned values is `String`-valued, and each variant's value is a `String` equal to the variant's name.
+
+```wdl
+# An explicitly typed enum that is `Float`-valued.
+enum FruitCosts[Float] {
+  Banana = 1.33,
+  Orange = 2.20,
+  Apple = 3.47
+}
+
+# An explicitly typed enum that is `Float`-valued. Because the enum is
+# explicitly typed, the `Int` value assigned to `ThreePointOh` is coerced to
+# a `Float`, so this is a valid enum definition.
+enum FavoriteFloat[Float] {
+  ThreePointOh = 3,
+  FourPointOh = 4.0
+}
+
+# ERROR: the enum is implicitly typed, but its values have different types
+# (`Int` and `Float`), so the value type cannot be unambiguously resolved.
+enum FavoriteNumber {
+  ThreePointOh = 3,
+  FourPointOh = 4.0
+}
+
+# An implicitly typed enum that is `String`-valued.
+enum Whitespace {
+  Tab = "\t",
+  Space = " "
+}
+
+# An implicitly typed enum with no assigned values, which makes it
+# `String`-valued with the values "FASTQ" and "BAM", respectively.
+enum FileKind {
+  FASTQ,
+  BAM
+}
+```
+
+### Enum Usage
+
+An `Enum`'s variants are [accessed](#member-access) by joining the `Enum`'s name and the variant name with a `.` (for example, `Pet.Dog`).
+
+A declaration with an `Enum` type can only be assigned a value of that same `Enum` type: a direct reference to one of its variants, or the value of another declaration or expression of that type (such as `select_first([pet, Pet.Dog])`).
+
+Two values of the same `Enum` type can be tested for equality (using `==` or `!=`) and compared (using `>`, `>=`, `<`, or `<=`); each operation is performed on the values assigned to the two variants. Values of _different_ `Enum` types cannot be compared, and attempting to do so is a type mismatch error. Likewise, an `Enum` cannot be coerced to or from any other type.
+
+```wdl
+version 1.2
+
+enum Pet {
+  Cat,
+  Dog,
+  Rat
+}
+
+enum HotFood {
+  Dog,
+  Potato,
+  Tamale
+}
+
+task pet_pet {
+  input {
+    Pet? pet
+  }
+
+  Pet my_pet = select_first([pet, Pet.Dog])
+
+  Array[String] all_pet_names = [
+    Pet.Cat,
+    Pet.Dog,
+    Pet.Rat
+  ]
+
+  command <<<
+    echo "There are ~{length(all_pet_names)} kinds of pet: ~{sep(", ", all_pet_names)}"
+    echo "I have a pet ~{my_pet}"
+  >>>
+}
+
+task error {
+  output {
+    # ERROR: `Pet` and `HotFood` are different types and cannot be compared.
+    Boolean comparison = Pet.Dog == HotFood.Dog
+  }
+}
+```
+
 ## Import Statements
 
 Although a WDL workflow and the task(s) it calls may be defined completely within a single WDL document, splitting it into multiple documents can be beneficial in terms of modularity and code resuse. Furthermore, complex workflows that consist of multiple subworkflows must be defined in multiple documents because each document is only allowed to contain at most one workflow.
@@ -3542,6 +3696,23 @@ struct Patient {
 }
 ```
 
+### Importing and Aliasing Enums
+
+`Enum`s are [imported in the same way as `Struct`s](#importing-and-aliasing-structs) and follow the same namespacing rules: an `Enum` exists in the global scope of the document that defines it, and importing an `Enum` copies its definition into the global scope of the importing document, optionally under an alias.
+
+```wdl
+version 1.2
+
+import "color.wdl" alias Color as Hue
+
+workflow another_wf {
+  input {
+    Hue hue = Hue.Blue
+  }
+  ...
+}
+```
+
 ## Task Definition
 
 A WDL task can be thought of as a template for running a set of commands - specifically, a Bash script - in a manner that is (ideally) independent of the execution engine and the runtime environment.
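The value-typing and comparison rules that this patch adds to SPEC.md can be summarized as a small executable model. The sketch below is hypothetical Python written for illustration. It is not part of the proposal or of any WDL engine. The names `define_enum`, `EnumType`, `EnumValue`, `enum_eq`, and `enum_lt` are invented, and only the `Int`-to-`Float` coercion is modeled. The patch does not address enums in which only some variants have values, so rejecting them is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

# Python stand-ins for WDL primitive values (File and Directory are not modeled).
Primitive = Union[bool, int, float, str]


def primitive_type(value: Primitive) -> str:
    """Return the name of the WDL primitive type that `value` stands in for."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    raise TypeError(f"not a primitive value: {value!r}")


def coerce(value: Primitive, target: str) -> Primitive:
    """Coerce `value` to `target`; only Int -> Float is modeled in this sketch."""
    source = primitive_type(value)
    if source == target:
        return value
    if source == "Int" and target == "Float":
        return float(value)
    raise TypeError(f"cannot coerce {source} to {target}")


@dataclass(frozen=True)
class EnumType:
    name: str
    value_type: str
    values: Dict[str, Primitive]  # variant name -> assigned value


def define_enum(
    name: str,
    variants: Dict[str, Optional[Primitive]],
    explicit_type: Optional[str] = None,
) -> EnumType:
    """Resolve an enum's value type. `None` marks a variant with no assigned value."""
    unassigned = [k for k, v in variants.items() if v is None]
    if explicit_type is None and len(unassigned) == len(variants):
        # Implicitly typed with no values: String-valued, value == variant name.
        return EnumType(name, "String", {k: k for k in variants})
    if unassigned:
        # Assumption: the patch does not cover partially assigned enums (or
        # explicitly typed enums without values), so this sketch rejects them.
        raise TypeError(f"{name}: unassigned variants {unassigned}")
    if explicit_type is not None:
        # Explicitly typed: coerce every assigned value to the declared type.
        return EnumType(
            name,
            explicit_type,
            {k: coerce(v, explicit_type) for k, v in variants.items()},
        )
    # Implicitly typed: all assigned values must share exactly one primitive type.
    types = {primitive_type(v) for v in variants.values()}
    if len(types) != 1:
        raise TypeError(f"{name}: ambiguous value types {sorted(types)}")
    return EnumType(name, types.pop(), dict(variants))


@dataclass(frozen=True)
class EnumValue:
    enum: EnumType
    variant: str

    @property
    def value(self) -> Primitive:
        return self.enum.values[self.variant]


def _require_same_enum(a: EnumValue, b: EnumValue) -> None:
    # Values of different enum types cannot be compared: type mismatch.
    if a.enum.name != b.enum.name:
        raise TypeError(f"type mismatch: {a.enum.name} vs {b.enum.name}")


def enum_eq(a: EnumValue, b: EnumValue) -> bool:
    """`==` on two values of the same enum compares their assigned values."""
    _require_same_enum(a, b)
    return a.value == b.value


def enum_lt(a: EnumValue, b: EnumValue) -> bool:
    """`<` on two values of the same enum compares their assigned values."""
    _require_same_enum(a, b)
    return a.value < b.value
```

In this model, equality is defined on assigned values. As a result, two different variants of the same enum that happen to share a value compare as equal.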