Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repr: use attributes for local function overrides #82

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

craigfe
Copy link
Member

@craigfe craigfe commented Oct 11, 2021

Currently, the functions Repr.{like,map} behave somewhat oddly when passed custom behaviour for one or more of the generic options: it evaluates all derivers on the typerep and packs the resulting functions into a Custom typerep containing those implementations.

This hides the structure of the type, but otherwise behaves sensibly as long as the derivers behave "linearly", e.g.:

derive (pair t1 t2) = derive (pair (partial ~derive:(derive t1) ()) (partial ~derive:(derive t2) ()))

but this is not always the case. An example is pre_hash which attempts to unbox the pre-hash representation of "simple" types and so behaves differently when complex types have simple components "pre-evaluated" as is currently the case.

This commit changes the implementation of Repr.{like,map} to instead add custom implementations as type attributes on the original type. These attributes don't hide the internal structure of the type in a Custom typerep, making function overrides transparent to derivers that don't care about those overrides.

CC @Ngoguey42. This doesn't quite solve our problem, since the user's custom pre_hash implementation may be implemented using pre_hash (and so sensitive to the "primitive" heuristic), but I think it gets closer.

@craigfe craigfe force-pushed the attributes-everywhere branch from a3ad58f to a21d498 Compare October 11, 2021 14:59
Currently, the functions `Repr.{like,map}` behave somewhat oddly when
passed custom behaviour for one or more of the generic options: it
evaluates all derivers on the typerep and packs the resulting functions
into a `Custom` typerep containing those implementations.

This hides the structure of the type, but otherwise behaves sensibly as
long as the derivers behave "linearly", e.g.:

```ocaml
derive (pair t1 t2) = derive (pair (partial ~derive:(derive t1) ()) (partial ~derive:(derive t2) ()))
```

but this is not always the case. An example is `pre_hash` which attempts
to unbox the pre-hash representation of "simple" types and so behaves
differently when complex types have simple components "pre-evaluated" as
is currently the case.

This commit changes the implementation of `Repr.{like,map}` to instead
add custom implementations as _type attributes_ on the original type.
These attributes don't hide the internal structure of the type in a
`Custom` typerep, making function overrides transparent to derivers that
don't care about those overrides.
@craigfe craigfe force-pushed the attributes-everywhere branch from a21d498 to 990f313 Compare October 11, 2021 19:24
Copy link
Contributor

@Ngoguey42 Ngoguey42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to understand everything that is happening here. I too think that this is a step in the right direction 👍 .

| None -> (
let rec is_prim : type a. a t -> bool = function
| Self s -> is_prim s.self_fix
| Attributes { attr_type; _ } -> is_prim attr_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the attribute is a Type_pp.Attr_pp, a Type_pp.Attr_of_string or a new fancy attribute that is not yet part of repr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but it's a good point that this is a change of behaviour. It seems best to me to keep this is_prim logic as small as possible (and, if we're going to have it at all, put it in the derivers themselves). What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this new connection between json and pp/of_string looks a lot like the connection between encode_bin and pre_hash. Both connections have a complicated semantic that may bite the repr user at unexpected moments.

One solution to fixing the problem of custom encode_bin shadowing custom pre_hash would be to remove that match bin with Some (x, _, _) -> Custom x | None -> Structural) case that can be found below.

Are you sure that introducing this new special case for json primitives is worth the trouble?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new logic is mostly a copy of the old relationship between json and pp under like, trying to keep the behaviour as similar as possible. I agree that it's a complex semantic, but if we drop it then we should be careful to check for existing call-sites that rely on it.

match pre_hash with
| Some x -> Custom x
| None -> (
match bin with Some (x, _, _) -> Custom x | None -> Structural)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to the previous pre_hash in partially_abstract ? (here https://github.com/mirage/repr/pull/82/files#diff-c7a2d1479dffeba7f7a7f341542999388b5c650e2392244de254809b44072e3aL336 ); why do we revert to encode_bin when there is not a pre_hash explicitly passed (instead of using the pre_hash of t as we do for instance for pp)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I put this here to preserve the old behaviour.

I'd guess that this was added because (a) pre_hash is similar to encode_bin for most types and (b) a valid encode_bin has the properties we'd expect of pre_hash, since it must decode values property. e.g. if I have some complex type with an efficient binary encoding, I can use this for free to get an efficient pre_hash.

I'd like to make this connection between encode_bin / decode_bin / size_of / pre_hash more explicit in the future, perhaps as part of something like #34.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how it makes sense for most types, but in the case of structured keys, the pre_hash and the encode_bin are different for commits, no? (I'm just trying to understand that issue, we don't need to fix it here :) )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes; good point. Custom pre_hash implementations within the children will be lost this way.

Perhaps we want to do this only if there are no pre_hash attributes in the type? (... and the type contains no Custom _ nodes?) These cases are sort-of like having pre_hash = Some _, just not specified inline as an argument.

@craigfe craigfe force-pushed the attributes-everywhere branch from 48b12dd to 990f313 Compare October 21, 2021 12:28
let rec t : type a. a t -> a pre_hash = function
| Self s -> self s
| Custom c -> stage c.pre_hash
| Map m -> map m
| Boxed b -> t b
| Attributes { attr_type; _ } -> t attr_type
| Attributes { attrs; attr_type } -> (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the attribute check missing in the second definition of t at the end of the module?

| None -> (
let rec is_prim : type a. a t -> bool = function
| Self s -> is_prim s.self_fix
| Attributes { attr_type; _ } -> is_prim attr_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this new connection between json and pp/of_string looks a lot like the connection between encode_bin and pre_hash. Both connections have a complicated semantic that may bite the repr user at unexpected moments.

One solution to fixing the problem of custom encode_bin shadowing custom pre_hash would be to remove that match bin with Some (x, _, _) -> Custom x | None -> Structural) case that can be found below.

Are you sure that introducing this new special case for json primitives is worth the trouble?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants