Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add digestable witnesses of binary codecs #34

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

craigfe
Copy link
Member

@craigfe craigfe commented Jan 6, 2021

Fix #32.

Not to be merged until we have a sense of how this is used in Irmin – which won't be until after the 2.3.0 release.

src/repr/type_binary.ml Show resolved Hide resolved
(fun c acc ->
let* acc = acc in
match c with
| C0 _ -> return (Empty :: acc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not mistaken, this means that the order of the elements in a Repr.enum is not reflected in a shape.

In general, a shape check cannot detect certain changes in the order of the cases and fields. This seems to be a problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with trying to distinguish those cases is that they're not actually distinguished in the underlying binary codec, so this becomes a bit of a slippery slope of trying to forbid certain "unwanted" but valid transformations on types. For instance, re-ordering a pair of fields of the same type is equivalent to renaming each one individually, so forbidding the former requires forbidding the latter (in order to keep transitivity on the shape equivalence relation). I can't think of a good way around that problem.

The current approach tries to stay away from such questions by only capturing information contained in the types themselves. If two fields of the same shape can't be permuted, the answer is probably to give them different shapes (by versioning their semantic interpretation with Custom).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting the equivalence of shapes to imply the equivalence of the whole encode_bin/decode_bin process. I understand the subtle difference with the equivalent of codecs. I don't see a solution to this problem either.

Fortunately, such a change in the order of cases/fields should be fairly easy to detect during a code review.

Comment on lines +30 to +32
decoders defined here. Shapes are represented canonically such that
equality of shapes of two types [t1] and [t2] implies equivalence of the
binary codecs derived from [t1] and [t2].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the Shape module is disjoint from the Encode and Decode modules, there is no guarantee of this equivalence. There may be accidental discrepancies between the shape and the codecs.

One solution to avoid this problem would be to derive codecs from shapes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'd like to do that, but it has a problem: deriving codecs from shapes requires Shape.t to be a GADT w/ witnesses for the algebraic cases – and we don't support type representations for GADTs. This would force us to write out comparisons / pretty-printing / serialisers for Shape.t by hand. Those functions would also have to unfold recursive loops (e.g. with the Unrolling.t monad), so this starts to look quite a lot like defining a third intermediate typed AST that sits between Type.t and (the current) Shape.t – I'm not sure it's worth the effort.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe if shape was the core of Type.t it would be less of a problem to implement, but it might not be worth the effort indeed.

src/repr/type_binary.mli Show resolved Hide resolved
Comment on lines +145 to +148
(** Deriving shapes from type reps is straightforward, but requires some care
with recursive types. We convert recursive loops to use De Bruijn indexing
by unfolding each one with a fresh placeholder and tracking recursion
depth so that we can replace the placeholders with the right index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand all the magic yet, but the code is interesting to read!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try and make it clearer on a subsequent pass 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all your clarifications!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add digestable witnesses of binary codecs
2 participants