Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

possible improvement of json.RawMessage instead of LexiconTypeDecoder for unknown data #773

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

bnewbold
Copy link
Collaborator

@bnewbold bnewbold commented Oct 4, 2024

The indigo codebase supports a couple areas of use:

  • generic atproto infrastructure code, like the relay, which doesn't care about record schemas
  • application code, which does care about application schemas

When writing application code, there is an expectation that all the relevant Lexicon types are known and have pre-generated golang struct types. In that context, LexiconTypeDecoder as a wrapper mostly makes sense. There are some issues with round-tripping (decoding and re-encoding doesn't always yield the same data, if there are new or unexpected fields), but we can ignore that for now.

For the later situation, LexiconTypeDecoder doesn't work. If we are only dealing with CBOR blocks (like the relay or raw repo manipulation), it doesn't really matter, can just just avoid serialization, or use the atproto/data package for JSON/CBOR round-tripping. But sometimes we are dealing with HTTP APIs and the Lexicon-generated API helpers, and we need to deal with data of unknown type. This can be unknown, or it can be open unions. In this case, LexiconTypeDecoder doesn't really work.

This PR proposes one way of handling unknown: using json.RawMessage (which is basically just []byte under the hood). This allows passing arbitrary data around loss-less.

A related approach is: #485

Having tried this approach, I think something like the above might be better: make it so LexiconTypeDecoder can deal with unknown types.

A related issue we'll need to address is "open unions" when we don't know the type. Maybe we always have an "Other" field in the enum struct and stuff LexiconTypeDecoder in there.

bluesky-social/atproto#2860

Base automatically changed from bnewbold/lexgen-20241003 to main October 4, 2024 16:34
}

ops = append(ops, op)
case w.RepoApplyWrites_Update != nil:
u := w.RepoApplyWrites_Update

cc, err := r.PutRecord(ctx, u.Collection+"/"+u.Rkey, u.Value.Val)
var ltd lexutil.LexiconTypeDecoder
if err = ltd.UnmarshalJSON(*u.Value); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this adding a marshalling step in repomanager where we didn't have one before? Wondering if there will be performance impacts on this cause this is critical path on relay write-path iirc.

@@ -36,8 +37,12 @@ func FetchAndProcessRecord(ctx context.Context, eng *automod.Engine, aturi synta
return fmt.Errorf("expected a CID in getRecord response")
}
recCID := syntax.CID(*out.Cid)
var ltd lexutil.LexiconTypeDecoder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be cleaner to use util.JsonDecodeValue()
maybe make a utility function that returns (cbg.CBORMarshaler, error) if we want better than (any, error) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants