Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TextualEnum newtype wrapper #186

Merged
merged 1 commit into from
Oct 4, 2024
Merged

Add TextualEnum newtype wrapper #186

merged 1 commit into from
Oct 4, 2024

Conversation

mjgpy3
Copy link
Contributor

@mjgpy3 mjgpy3 commented Jul 29, 2024

This is an experiment that I'm putting out for review to see what the team thinks. I won't feel bad if we don't like it.

What is it?

TextualEnum is a newtype wrapper meant to provide a lot of boilerplate serialization/deserialization instances that we keep defining and redefining as we introduce new types and services.

How's it work?

The core idea is that if you can give your type an Enum instance, a Bounded instance, and a way to convert to text, then that should be enough to provide string-type (i.e. textual) representations of it.

So toText tells us how to serialize the value, and we can look through the Bounded/Enum instance values to parse it "from text".

toText is preferable from fromText because, for bounded/enums, totality is straightforward to check/implement and further enforced if cases are updated/removed/added.

EnumName

I've also included EnumName because we've started introducing codec/schema definitions that can be named.

Example

You can see the specs (PrimaryColor) for an example.

@mjgpy3 mjgpy3 requested review from a team and jason-lieb and removed request for a team July 29, 2024 20:45
@mjgpy3 mjgpy3 force-pushed the gilli/textual-enum branch from 4d1eb1b to 1592ef8 Compare July 29, 2024 20:48
Comment on lines +95 to +96
instance (Bounded a, Enum a, EnumValue a, Eq a) => HasCodec (TextualEnum a) where
codec = stringConstCodec $ (id &&& (toText . enumValue)) <$> enums @a
Copy link
Contributor

@chris-martin chris-martin Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you actually derive instance this this instance via EnumValue? As we just happened to be discussing a minute ago there seems to be some problem with deriving HasCodec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow what you're asking. Are you asking if this instance is broken or if there's a simpler way to derive it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Referring to what we mentioned in backend guild today that HasCodec for whatever reason isn't a class that works with deriving via. Just wondering how much that limits the utility of this instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think he's askjing if you can remove this explicit instance and use deriving via to achieve this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I follow now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying I know you cannot use deriving via to achieve this, so I don't understand what purpose the instance serves.

Comment on lines +33 to +35
class EnumValue a where
-- | Convert a 'TextualEnum' to 'Text'
toText :: a -> Text
Copy link
Contributor

@chris-martin chris-martin Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought -

data TextMapping a =
  TextMapping !(a -> Text) !(Text -> Maybe a)

class EnumValue a where
  textMapping :: TextMapping a

enumBoundedTextMapping :: (Enum a, Bounded a) => (a -> Text) -> TextMapping a
enumBoundedTextMapping f =
  let
    !m = Map.fromList $ (id &&& f) <$> [minBound..maxBound]
  in
    TextMapping f (flip (Map.lookup m))
data ContentArea = Math | Ela
  deriving (...) via TextualMapping ContentArea

instance EnumValue ContentArea where
  textMapping = enumBoundedTextMapping $ \case
    Math -> "math"
    Ela -> "ela"

This approach does two things -

  • Allows the user options other than enum/bounded to enumerate if they want
  • Allows a logarithmic rather than linear lookup, using a Map which is constructed only once (not 100% sure if that would require explicitly lifting each textMapping definition to the top level or not, would want a benchmark to test.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about doing something like this but I came to the conclusion that basically

not 100% sure if that would require explicitly lifting each textMapping definition to the top level or not

was kind of hazy (i.e. I'm unsure exactly how to get GHC to compute it once) and thus prone to being incorrectly done.

I'm also not convinced that a hash map is actually quicker for smaller enums. Hashing a value vs. looking through like 4-13 cases feels like a wash. I could be totally wrong here as I have no data to back this up but the tradeoffs don't feel worth it at first blush.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally I greatly prefer using deriving via instead of wrapping a bunch of types with a utility type; I'm fine with wrapping in a utility type for calculation purposes, but it gets really stinky when someone does e.g.

data Person = { age :: Sum Int } 

totalAge :: [Person] -> Int
totalAge = getSum .  fold . age 

instead of, like:

data Person = { age :: Int } 

totalAge :: [Person] -> Int
totalAge = getSum . fold <$> Sum . age 

-- which would let you then do

youngestAge :: [Person] -> Int
youngestAge = getMin . fold <$> Min . age 

.. tho I admit ive seen it in the our codebase a few times, and that this is a personal preference that I can't justify beyond "confuses the data model" and "at what point do you stop adding utility wrappers"

Tho, lol, I guess this utility wrapper thing is kinda sorta analogous to monad transformer stacks, eh?

I don't really have an opinion on the linear vs log search. If the N is small it doesn't matter much. I guess if someone did the following, it would matter a lot:

newtype Age = Age Int
  deriving (...) via TextualMapping Age

instance EnumValue ContentArea where
  textMapping = enumBoundedTextMapping show

main = do putStrLn $ show $ Age maxBound

but idk this seems like quite the goofy thing to do, and hey, I notice that you address it. If you do it via mapping, and someone were to enumerate the set of Ints, you'd end up eating up so much memory... so idk.

you could make a way for the user to use a second function tho so a body could pick, e.g. enumBoundedTextMapping and enumBoundedTextMappingMemo or something

Copy link
Contributor

@chris-martin chris-martin Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

greatly prefer using deriving via instead of wrapping a bunch of types with a utility type

Agreed and I'm wondering if maybe the spec here would be a good place to demonstrate using TextualEnum to define instances directly for the PrimaryColor type.

Copy link
Contributor

@joelmccracken joelmccracken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm but i'd like to resolve the "how this is to be used" (hopefully, not wrapping a bunch of domain types, but deriving via/newtype)

via TextualEnum PrimaryColor

instance EnumValue PrimaryColor where
toText = \case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about an example showing using https://hackage.haskell.org/package/base-4.20.0.1/docs/Data-Data.html#v:toConstr which would help DRY/standardize/make one less spot to forget to update when adding an enum value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm 👎 in general on things that couple strings to Haskell identifiers, because it makes the code less robust against refactoring, leads to compromising what strings should be to deal with Haskell restrictions (e.g. wanting an enum string to be "type"), and makes it harder to search the code when you're expecting to be able to find a string literal appearing somewhere but it's actually derived from a Haskell identifier instead -- But sometimes I guess it's appropriate, and TIL about toConstr, so wouldn't argue against more docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 I get that pov. I do go back and forth. so much is automatically generated, and depending upon tradeoffs, it becomes unclear when specifically someone would be searching for a string vs somewhere else. But yea if you do autogen these, it really important that someone be able to decouple these when its needed, and sometimes stuff don't have "i need to roll my own" docs.

either way perhaps its worth documenting/commenting/illustrating using it? i'm good with either way.

Comment on lines +37 to +39
class EnumName a where
-- | Name of a 'TextualEnum', used for naming schemas
enumName :: Proxy a -> Text
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making this derivable?

newtype DataName a = DataName a

instance Data a => EnumName (DataName a) where
  enumName _ = _ -- Some sort of reflection?

data MyThing = ...
  deriving EnumName via DataName MyThing

Copy link
Contributor

@chris-martin chris-martin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few ideas for additional features floating around but I think this will be useful.

@jason-lieb jason-lieb removed their request for review August 2, 2024 17:50
@mjgpy3
Copy link
Contributor Author

mjgpy3 commented Sep 26, 2024

@joelmccracken re-requesting a review on this. I think I've covered your concerns but it's been a while so just wanted to make sure (sorry, I went on vacation shortly after opening this so I forgot about it).

Copy link
Contributor

@joelmccracken joelmccracken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, honestly I barely remember any of this lol

@mjgpy3 mjgpy3 force-pushed the gilli/textual-enum branch from 53304e5 to d11ce52 Compare October 2, 2024 20:52
This is an experiment that I'm putting out for review to see what the team
thinks. I won't feel bad if we don't like it.

**What is it?**

`TextualEnum` is a `newtype` wrapper meant to provide a lot of boilerplate
serialization/deserialization instances that we keep defining and redefining as
we introduce new types and services.

**How's it work?**

The core idea is that if you can give your type an `Enum` instance, a `Bounded`
instance, and a way to convert to text, then that should be enough to provide
string-type (i.e. textual) representations of it.

So `toText` tells us how to serialize the value, and we can look through the
`Bounded`/`Enum` instance values to parse it "from text".

`toText` is preferable from `fromText` because, for bounded/enums, totality is
straightforward.

**EnumName**

I've also included `EnumName` because we've started introducing codec/schema
definitions that can be named.

**Example**

You can see the specs (`PrimaryColor`) for an example.

**Benefits**

This should reduce a great deal of boilerplate, and, if used, will reduce a lot
of ad-hoc-ry that exists across our in-the-wild definitions.
@mjgpy3 mjgpy3 force-pushed the gilli/textual-enum branch from d11ce52 to 1ac1ae7 Compare October 2, 2024 21:00
@mjgpy3 mjgpy3 merged commit 5dc71f4 into main Oct 4, 2024
6 checks passed
@mjgpy3 mjgpy3 deleted the gilli/textual-enum branch October 4, 2024 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants