Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A more ergonomic enumeration representation #276

Open
sfackler opened this issue Feb 3, 2020 · 25 comments
Open

A more ergonomic enumeration representation #276

sfackler opened this issue Feb 3, 2020 · 25 comments

Comments

@sfackler
Copy link
Contributor

sfackler commented Feb 3, 2020

To deal with the "openness" of Protobuf enumeration types, Prost currently represents them in the message struct as i32 and generates a Rust enum that can convert to and from i32. This is a bit awkward to work with, though.

A better approach could be to use a newtype with associated constants. To use the Syntax enum as an example, it would look like this:

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, ...)]
pub struct Syntax(i32);

impl Syntax {
    pub const PROTO2: Syntax = Syntax(0);
    pub const PROTO3: Syntax = Syntax(1);

    // alternatively the field could just be public
    pub fn from_raw(raw: i32) -> Syntax {
        Syntax(raw)
    }

    pub fn as_raw(self) -> i32 {
        self.0
    }
}

This approach has a couple of advantages:

  • It can be embedded directly in the message struct.
  • It's more type safe since you normally won't have to interact with the raw i32 value.
  • You can still use it roughly like a Rust enum, and do things like match on the variants:
match syntax {
    Syntax::PROTO2 => { ... }
    Syntax::PROTO3 => { ... }
    syntax => panic!("unknown syntax variant {}", syntax.as_raw()),
}
@danburkert
Copy link
Collaborator

@sfackler are you aware that type-safe getters are generated for enum fields? For instance: https://docs.rs/prost-types/0.6.1/prost_types/struct.Type.html#method.syntax

@danburkert
Copy link
Collaborator

I do like this idea, though. It definitely has some merit. My experience has been that working with enums through the typesafe getter/setter is 'good enough', but they have a discoverability problem. I don't think they are actually documented anywhere other than the resulting rustdoc.

@sfackler
Copy link
Contributor Author

sfackler commented Feb 3, 2020

Ooh, I did not know about those!

@sfackler
Copy link
Contributor Author

sfackler commented Feb 3, 2020

I'd be happy to build out the implementation, but I'd be interested on your thoughts on how to structure it. 0.6 is pretty new, so this could just be a breaking change and prost bumps to 0.7, or it could end up as a new flag in prost_build::Config to allow users to opt into the new approach.

@jakeswenson
Copy link

@sfackler @danburkert How's this going?
I have been using prost more and more, and I do find the enums slightly painful; I think this proposal goes a long way to easing that.

@sfackler
Copy link
Contributor Author

I'm going to work on this over the weekend.

@sfackler
Copy link
Contributor Author

sfackler commented Apr 11, 2020

For an enum:

enum SomeEnum {
    SOME_ENUM_FOO = 0;
    SOME_ENUM_BAR = 2;
}

prost-build will produce:

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SomeEnum(i32);

#[::prost::enumeration]
impl SomeEnum {
    pub const FOO: SomeEnum = SomeEnum(0);
    pub const BAR: SomeEnum = SomeEnum(2);
}

and prost-derive will expand that to:

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SomeEnum(i32);

impl SomeEnum {
    pub const FOO: SomeEnum = SomeEnum(0);
    pub const BAR: SomeEnum = SomeEnum(2);

    pub fn is_valid(self) -> bool {
        match self {
            SomeEnum::FOO | SomeEnum::BAR => true,
            _ => false,
    }
}

impl Default for SomeEnum {
    fn default() -> SomeEnum {
        SomeEnum::FOO
    }
}

impl From<SomeEnum> for i32 {
    fn from(value: SomeEnum) -> i32 {
        value.0
    }
}

impl From<i32> for SomeEnum {
    fn from(value: i32) -> SomeEnum {
        SomeEnum(value)
    }
}

@nw0
Copy link

nw0 commented Apr 23, 2020

Hi @sfackler, I've run into the same issue but found this before writing my own code. Is this currently WIP? I'd be happy to help out if so.

@sfackler
Copy link
Contributor Author

I spent a while working on it, but then got sidetracked with other things I've pushed what I have onto a branch but I'm not sure when I'll be able to circle back to it: sfackler@0330d92

@danburkert
Copy link
Collaborator

Can anyone share some application code that this alternate representation cleans up? I'd love to be convinced since this seems to come up over and over, but I'm still not seeing how this is materially better than the generated getters + generated (Rust) enums.

@mieubrisse
Copy link

mieubrisse commented Feb 10, 2021

Are the generated getters still working? I have this Prost-generated code:

#[derive(Clone, PartialEq, ::prost::Message)]
pub struct SuiteRegistrationResponse {
    #[prost(enumeration = "SuiteAction", tag = "1")]
    pub suite_action: i32,
}
/// Tells the suite what action it should perform, based on the args that the API container received
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, ::prost::Enumeration)]
#[repr(i32)]
pub enum SuiteAction {
    /// Indicates that the testsuite should operate in metadata-serializing mode, sending suite metadata to the
    ///  API container
    SerializeSuiteMetadata = 0,
    /// Indicates that the testsuite should operate in test-executing mode, running a test
    ExecuteTest = 1,
}

which is throwing the following where when I try to use from:

error[E0308]: mismatched types
  --> lib/src/execution/test_suite_executor.rs:78:34
   |
78 |         let action = SuiteAction::from(action_int);
   |                                        ^^^^^^^^^^ expected enum `SuiteAction`, found `i32`

Version: prost 0.7, generated via tonic-build 0.4.0

@danburkert
Copy link
Collaborator

Yes, they are still present in prost 0.7. prost enums have never provided a From<i32> impl, I think you may be looking for the from_i32 method. Or you can use the SuiteRegistrationResponse::suite_action() getter.

@mieubrisse
Copy link

Gotcha, from_i32 was the missing bit (weirdly VS Code wasn't showing that as an option). Using that method works now!

@danburkert
Copy link
Collaborator

Yeah these methods are generated via derives, so RA doesn't seem to 'see' through it a lot of the time. Best way to work with prost in my experience is to generate the docs and look there; rust-doc will show the derive generated code.

@avsaase
Copy link

avsaase commented Jun 9, 2023

Maybe I'm missing something here but why not just generate a Rust enum with the #[non_exhaustive] attribute?

@benma
Copy link

benma commented Jul 5, 2023

The discoverability problem is pretty big imo. I want to expose a Prost generated struct as a function param in a library function. The rust docs just says i32 for the enum type. It would be much preferable if it actually showed the enum type and all variants.

@danburkert could we have a decision to move forward? There are multiple proposals better than the status quo, e.g Unknown(i32) or the proposal at the top of this issue.

@mzabaluev
Copy link
Contributor

mzabaluev commented Mar 24, 2024

@avsaase

Maybe I'm missing something here but why not just generate a Rust enum with the #[non_exhaustive] attribute?

In proto3, the enums are open and the language guide suggests that unknown values are preserved in the message. I could not find anything more normative, but Prost currently conforms to this by erasing the enum type in message fields. While I mostly find myself needing more idiomatic domain types to work with protobuf-originated data anyway, this gives more room for programmer error than necessary.

I lean towards this proposal This proposal has advantages over an added Unknown(i32) variant, since that one:

  1. injects a magical name which might come into conflict with an actually defined *UNKNOWN variant;
  2. loses #[repr(i32)] and makes the size of all generated enum types to be two 32-bit words rather than one.

A somewhat more cumbersome solution that preserves the enums could be a generic wrapper like this:

pub enum EnumFieldValue<T> {
    Known(T),
    Unknown(i32),
}

impl<T> EnumFieldValue<T> {
    pub fn unwrap(self) -> T {
        unimplemented!("for brevity")
    }

    pub fn known_or_else<E, F>(self, err: F) -> Result<T, E>
    where
        F: FnOnce(i32) -> E,
    {
        unimplemented!("for brevity")
    }
}

This has the same effect on the field size as Unknown(i32).

Regardless of the above, you'd want your proto3-derived enums to be annotated with #[non_exhaustive], as you'd want your structs (with some builder code to get over the initialization issues) since this is how bindings are expected to be evolved with protobuf's semver conventions.

@mzabaluev
Copy link
Contributor

This has the same effect on the field size as Unknown(i32).

PhantomData to the rescue:

pub struct EnumFieldValue<T> {
    value: i32,
    _phantom: core::marker::PhantomData<T>,
}

impl<T> EnumFieldValue<T>
where
    i32: TryInto<T>,
{
    pub fn unwrap(self) -> T {
        let v = self.value;
        v.try_into()
            .map_err(|_| format!("unknown field value {}", v))
            .unwrap()
    }

    pub fn known_or_else<E, F>(self, err: F) -> Result<T, E>
    where
        F: FnOnce(i32) -> E,
    {
        let v = self.value;
        v.try_into().map_err(|_| err(v))
    }
}

@ryankurte
Copy link

ryankurte commented May 12, 2024

every couple of years i pick prost back up and run into this again 😅

The discoverability problem is pretty big imo. I want to expose a Prost generated struct as a function param in a library function. The rust docs just says i32 for the enum type. It would be much preferable if it actually showed the enum type and all variants.

real enum types are definitely better for documentation / external use / compatibility with things like serde!

given fields are wrapped in Options already would it be unreasonable for prost_build to provide a lossy mode for enums where empty and unknown variants get promoted to None? that way folks who don't need access to unknown enum variants could opt-in to having well formed rust types.

@caspermeijn
Copy link
Collaborator

real enum types are definitely better for documentation / external use / compatibility with things like serde!

I agree, we should work towards using the actual enum for enum fields.

given fields are wrapped in Options already would it be unreasonable for prost_build to provide a lossy mode for enums where empty and unknown variants get promoted to None? that way folks who don't need access to unknown enum variants could opt-in to having well formed rust types.

I would argue that separated lossy mode is also not discoverable. I think prost-build should generate better default code.

I think a good middle ground for all described use cases is:

pub enum EnumOption<T> {
    Known(T),
    Unknown(i32),
    Missing,
}

impl<T> EnumFieldValue<T> {
    pub fn unwrap(self) -> T {
        todo!()
    }

    pub fn known(self) -> Option<T> {
        todo!()
    }
}

And then let prost-build generate the following code:

#[repr(i32)]
pub enum Label {
        Optional = 1,
        Required = 2,
        Repeated = 3,
}

#[derive(::prost::Message)]
pub struct FieldDescriptorProto {
    #[prost(int32, optional, tag = "3")]
    pub label: ::prost::EnumOption<Label>,
}

@ryankurte Are you willing to do the work? I suggest opening a proof of concept pull request with the enum EnumOption and a work in progress code generator. Then we can review that work before you spend all the work refactoring the test suite. Feel free to choose better names.

@mzabaluev
Copy link
Contributor

mzabaluev commented May 12, 2024

I have just submitted #1061 which I have been holding off for some time. I will un-draft it after adding documentation. The memory layout of the OpenEnum wrapper is exactly that of i32, so this does not add overhead unlike the enum-based solutions.

One thing I dislike about the EnumOption proposal above is that it eschews the idiomatic Option type, which is used for other non-scalar fields. Memory layout optimization is a wrong reason to do it, because composing Option with an inner Known | Unknown enum will use niche value optimization, so you only need to pay the overhead of a single enum discriminant if you decide to encode these three cases with Rust-native enums.

Furthermore, I'm not convinced that making all enum fields unconditionally optional is the best ergonomic approach in all cases, for the same reason why scalars are not guarded with an Option by default. There is a protobuf style guideline backed by a buf lint, where the zero-valued variant is explicitly designated as "unspecified" and used as a fallback for the missing field in e.g. generated Go code. For specifications that use this style, a generated representation providing the missing field case will add unnecessary complication.

@ryankurte
Copy link

ryankurte commented May 12, 2024

I would argue that separated lossy mode is also not discoverable. I think prost-build should generate better default code.

definitely in favour of more useful / idiomatic generated code! it'd be nice to be able to use prost-generated objects like they're normal(ish) rust objects. something like EnumOption seems reasonable, the only caveat being that folks may need to be able to control the way this serialises (though i think it would be reasonable to set it to be transparent as a baseline and worry about customising this later).

@mzabaluev the OpenEnum trick is clever but doesn't improve the representation or use of enums with match / in other objects. also this makes doing things like ToString/FromString/VariantNames useful for serialisation and command-line stuff even weirder than it is now (though arguably this could be paved over with generated code).

Furthermore, I'm not convinced that making all enum fields unconditionally optional is the best ergonomic approach in all cases, ...

i think your earlier proposal would make sense with separate handling of known/unknown and optional fields? so Option is only used where required to express field presence for encode and decode, otherwise you just get the EnumFieldValue wrapper with defaults / elision.

pub enum EnumFieldValue<T: Default> {
  Known(T),
  Unknown(i32),
}

struct Something {
  // Normal fields are always optional and default to 0'th variant
  pub normal_field: EnumFieldValue<SomeEnum>,
  // Field labeled 'optional' should be able to represent zero _or_ unset condition
  pub optional_field: Option<EnumFieldValue<AnotherEnum>>,
}

@mzabaluev
Copy link
Contributor

the OpenEnum trick is clever but doesn't improve the representation or use of enums with match / in other objects.

I think you can match on message.field.known() in most cases, albeit less conveniently (matching on specific unknown values should be considered pathological). The OpenEnum wrapper is not meant to be used in objects other than the generated message structs, you should use the enum type itself for that. It can be cumbersome when the whole message struct value is matched as part of a composite type, Result being perhaps the most usual case of this.

also this makes doing things like ToString/FromString/VariantNames useful for serialisation and command-line stuff even weirder than it is now (though arguably this could be paved over with generated code).

Regardless of the wrapper's design, such implementations/derives would need to be transparent for the known values. I don't think you can do Display/ToString infallibly on open enum values, as there is no universally accepted rendering for the unknown cases.

@mzabaluev
Copy link
Contributor

i think your earlier proposal would make sense with separate handling of known/unknown and optional fields? so Option is only used where required to express field presence for encode and decode, otherwise you just get the EnumFieldValue wrapper with defaults / elision.

If nobody's working on it, I can try to remake my OpenEnum wrapper as the proper enum over the weekend.
This effectively computes the known/unknown discriminant once on construction and enables destructuring of enum fields as part of a message. The memory overhead should be tolerable outside, perhaps, of very memory-constrained embedded uses.

@ArjixWasTaken
Copy link

ArjixWasTaken commented Jun 28, 2024

Since proto is pretty open on enum values, meaning they must be preserved even when invalid,
why not wrap them in a Result?

I think Result<T, i32> where T is the generated enum would be more than enough

Forwarded to #1079

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests