-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add digestable witnesses of binary codecs #34
base: main
Are you sure you want to change the base?
Conversation
(fun c acc -> | ||
let* acc = acc in | ||
match c with | ||
| C0 _ -> return (Empty :: acc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not mistaken, this means that the order of the elements in a Repr.enum
is not reflected in a shape.
In general, a shape check cannot detect certain changes in the order of the cases and fields. This seems to be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with trying to distinguish those cases is that they're not actually distinguished in the underlying binary codec, so this becomes a bit of a slippery slope of trying to forbid certain "unwanted" but valid transformations on types. For instance, re-ordering a pair of fields of the same type is equivalent to renaming each one individually, so forbidding the former requires forbidding the latter (in order to keep transitivity on the shape equivalence relation). I can't think of a good way around that problem.
The current approach tries to stay away from such questions by only capturing information contained in the types themselves. If two fields of the same shape can't be permuted, the answer is probably to give them different shapes (by versioning their semantic interpretation with Custom
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was expecting the equivalence of shapes to imply the equivalence of the whole encode_bin/decode_bin process. I understand the subtle difference with the equivalent of codecs. I don't see a solution to this problem either.
Fortunately, such a change in the order of cases/fields should be fairly easy to detect during a code review.
decoders defined here. Shapes are represented canonically such that | ||
equality of shapes of two types [t1] and [t2] implies equivalence of the | ||
binary codecs derived from [t1] and [t2]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the Shape
module is disjoint from the Encode
and Decode
modules, there is no guarantee of this equivalence. There may be accidental discrepancies between the shape and the codecs.
One solution to avoid this problem would be to derive codecs from shapes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I'd like to do that, but it has a problem: deriving codecs from shapes requires Shape.t
to be a GADT w/ witnesses for the algebraic cases – and we don't support type representations for GADTs. This would force us to write out comparisons / pretty-printing / serialisers for Shape.t
by hand. Those functions would also have to unfold recursive loops (e.g. with the Unrolling.t
monad), so this starts to look quite a lot like defining a third intermediate typed AST that sits between Type.t
and (the current) Shape.t
– I'm not sure it's worth the effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe if shape was the core of Type.t
it would be less of a problem to implement, but it might not be worth the effort indeed.
(** Deriving shapes from type reps is straightforward, but requires some care | ||
with recursive types. We convert recursive loops to use De Bruijn indexing | ||
by unfolding each one with a fresh placeholder and tracking recursion | ||
depth so that we can replace the placeholders with the right index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand all the magic yet, but the code is interesting to read!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try and make it clearer on a subsequent pass 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all your clarifications!
Fix #32.
Not to be merged until we have a sense of how this is used in Irmin – which won't be until after the 2.3.0 release.