This repository has been archived by the owner on Jun 15, 2023. It is now read-only.

Update the JSON codecs to be able to encode/decode any EJSON values #19

Open
garyb opened this issue Apr 19, 2017 · 16 comments

Comments

@garyb
Member

garyb commented Apr 19, 2017

Currently we have silly things like arbitraryJsonEncodableEJsonOfSize, which just wasted about half an hour of @kritzcreek's and my time. 😢

@wemrysi this is relevant to our discussion about SST EJSON encoding the other day - I'm not sure at the moment what the encoding of non-string-key maps is supposed to look like. I asked @jdegoes about it in #general a while back, and I think the idea was the encoding would be something like { $map: [{ $key: ..., $value: ... }] }, but I don't know if that is already the case or if it was a proposal for how it should be done. Can you take a look and let us know so we can update this accordingly? I tried to find it myself but was a bit lost. 😄

@wemrysi

wemrysi commented Apr 19, 2017

@garyb There isn't any spec that I know of; @sellout and I came up with the following encoding last week:

All JSON-compatible bits will pass through untouched, and the extension parts will be encoded as:

Meta -> { $value: ..., $meta: ...}
Map  -> { $map: [{$key: ..., $value: ...}, ...] }
Byte -> { $byte: 42 }
Char -> { $char: "x" }
Int  -> { $int: 2345 }

We had also considered encoding maps like { $map: [[<key>, <value>], ...] }. Do you think one is preferable to the other?

If a Map only contains String keys, it will be encoded as a JSON object with encoded EJson values.
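A minimal sketch of the scheme described above, including the string-keys-only special case (the EJson type and all names here are hypothetical illustrations, not taken from the actual codebase):

```typescript
// Illustrative sketch only: a hypothetical EJson representation and encoder
// following the proposed { $map: ... } / { $byte: ... } / etc. scheme.
type EJson =
  | { tag: "str"; value: string }
  | { tag: "int"; value: number }
  | { tag: "byte"; value: number }
  | { tag: "char"; value: string }
  | { tag: "map"; pairs: [EJson, EJson][] }
  | { tag: "meta"; value: EJson; meta: EJson };

function encode(e: EJson): unknown {
  switch (e.tag) {
    case "str":  return e.value; // JSON-compatible: passes through untouched
    case "int":  return { $int: e.value };
    case "byte": return { $byte: e.value };
    case "char": return { $char: e.value };
    case "meta": return { $value: encode(e.value), $meta: encode(e.meta) };
    case "map": {
      // A map with only string keys becomes a plain JSON object...
      if (e.pairs.every(([k]) => k.tag === "str")) {
        const obj: Record<string, unknown> = {};
        for (const [k, v] of e.pairs) {
          if (k.tag === "str") obj[k.value] = encode(v);
        }
        return obj;
      }
      // ...while any other map uses the $map sentinel form.
      return {
        $map: e.pairs.map(([k, v]) => ({ $key: encode(k), $value: encode(v) })),
      };
    }
  }
}
```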

Also, when encoding EJson values, any map keys in the source that begin with a $ need to be prefixed with an additional $ (i.e. $foo becomes $$foo and $$bar becomes $$$bar) which then needs to be stripped off when decoding.

What do you think?

@jdegoes

jdegoes commented Apr 19, 2017

> Also, when encoding EJson values, any map keys in the source that collide with any of the sentinel keys ($value, $meta, $map, etc.) will have an extra $ prefixed to them, i.e. $$map, which needs to be removed during decoding.

Why is this necessary? It should be unambiguous if you are losslessly encoding information at every level of the hierarchy.

@jdegoes

jdegoes commented Apr 19, 2017

Oh, I see why; it's this:

> All JSON-compatible bits will pass through untouched, and the extension parts will be encoded as

The alternative is to encode these as well, e.g. $string. This would be unambiguous.
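A sketch of what this fully-tagged alternative could look like; apart from $string, the tag names (e.g. $array) are guesses for illustration, not from any spec:

```typescript
// Sketch of the fully-tagged alternative: every value, even a JSON-compatible
// one, is wrapped in a sentinel object, so decoding never has to guess.
// Only a few cases are shown; $array is a hypothetical tag name.
function encodeTagged(v: unknown): unknown {
  if (typeof v === "string") return { $string: v };
  if (typeof v === "number") return { $int: v };
  if (Array.isArray(v)) return { $array: v.map(encodeTagged) };
  return v; // remaining cases elided in this sketch
}
```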

@wemrysi

wemrysi commented Apr 19, 2017

@jdegoes Indeed, we could do that instead. The thought behind the proposed encoding is that treating the extended parts as a strict superset of JSON means we can round-trip JSON values through the encoding unchanged.

I should also add that a Map consisting only of string-keyed values will be encoded as a regular JSON object.

@jdegoes

jdegoes commented Apr 19, 2017

@wemrysi Either way is fine with me, and you've probably already thought about this, but if a key is $$map, for example, then the simple scheme above will break down: during decoding you would interpret it as the string $map. Basically, if a key isn't one of the magical ones, you'll have to add/strip exactly one $, or some such, in order to support the full range of possible keys.

@wemrysi

wemrysi commented Apr 19, 2017

@jdegoes Ah, yes, you're right. I think if a key begins with a $, then we must add a $ when encoding and strip one when decoding if it isn't one of the magical ones.
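The escaping rule settled on here could be sketched as a pair of helpers (hedged; sentinel keys like $map are assumed to be recognised by the decoder before unescaping is applied):

```typescript
// Sketch of the key-escaping rule: any user key beginning with "$" gains one
// extra "$" on encode and loses one on decode, so "$foo" -> "$$foo" and
// "$$bar" -> "$$$bar". Sentinel keys ($map, $value, ...) are assumed to be
// handled separately by the decoder before unescaping.
function escapeKey(k: string): string {
  return k.startsWith("$") ? "$" + k : k;
}

function unescapeKey(k: string): string {
  return k.startsWith("$$") ? k.slice(1) : k;
}
```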

@sellout

sellout commented Apr 19, 2017

@jdegoes Yeah, we're not encoding the compatible parts for a few reasons:

  • compact representation;
  • human-readability (which is also the reason I was considering the pair-like encoding of "$map"); and
  • compatibility with JSON (kind of like C/C++): other than the "$$…" edge case[†], interpreting a JSON value as if it were JSON-encoded EJson will behave correctly.

The last point matters because I imagine we can end up with mixed JSON/EJson collections, which means we will always have to guess in some cases (since JSON-encoded EJson has to be parsable as valid JSON).

Oh, but this raises a counterpoint in favour of always encoding (like { "$string": "foo" }): another difference between JSON and EJson is that "foo" (as an entire “document”) is valid EJson but not a valid JSON document (RFC 4627 allows only an object or array at the top level). So we must encode even simple values if we want to be able to represent those correctly. Although we could encode those all as something like { "$top": "foo" }, { "$top": [1, 2, 3] }, etc.
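The { "$top": ... } idea could be sketched as a document-level wrapper (purely illustrative; "$top" is a proposal from this thread, not an implemented encoding):

```typescript
// Illustrative "$top" wrapper: encoding always produces an object at the
// document root, so scalar EJson roots like "foo" survive strict JSON.
function encodeTop(v: unknown): string {
  return JSON.stringify({ $top: v });
}

function decodeTop(s: string): unknown {
  return (JSON.parse(s) as { $top: unknown }).$top;
}
```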

[†]: Because of the "$$…" case, I was thinking that the sigil should be something less likely to show up in actual JSON files. E.g., JSON requires Unicode, IIRC, so we could use "∫map" (not that I have any reason for choosing integral in particular).

@garyb
Member Author

garyb commented Apr 19, 2017

@wemrysi the proposed encoding looks great to me! I have no particular preference on using an array/object for key-value pairs.

Would this replace the current "precise" encoding? We've actually just dropped some of the type information (for dates, etc.) from our EJsonF, so if we're moving all of that to Meta now, it aligns better with how we'd want to do that anyway 👍

@sellout

sellout commented Apr 19, 2017

@garyb I was thinking that this would be something like application/json;mode=ejson and we’d deprecate the precise mode. And we’ll also add application/ejson at some point, but as @wemrysi mentioned to me, it probably won’t be taken advantage of by Web clients, since JSON parsing is optimized in the browser and so String => JSON; JSON => EJson is probably more efficient than String => EJson.

@garyb
Member Author

garyb commented Apr 19, 2017

Ok great. And yeah, I think when there is an application/ejson we can try parsing it directly and benchmark the two, but I'm fairly confident the via-JSON approach will be faster for JS runtimes.

@wemrysi

wemrysi commented Apr 20, 2017

> [†]: Because of the "$$…" case, I was thinking that the sigil should be something less likely to show up in actual JSON files. E.g., JSON requires Unicode, IIRC, so we could use "∫map" (not that I have any reason for choosing integral in particular).

Some candidates might be mapₑ, ∃map, ∑map (I was looking for sigils that evoke the letter e, for extended).

Any preferences?

@wemrysi

wemrysi commented Apr 21, 2017

@garyb Per @sellout's suggestion, I'm using as the sigil instead of $, let me know if that's a problem or if you'd prefer something else.

@garyb
Member Author

garyb commented Apr 21, 2017

Sounds good to me; JSON.parse and JSON.stringify have no problem with it, so 👍

@garyb
Member Author

garyb commented Jun 12, 2017

@wemrysi do you know where this is up to? Is there a Content-Type we can use for this yet? :)

@wemrysi

wemrysi commented Jun 12, 2017

@garyb There is not. The encoding is implemented in Quasar, but we ended up not exposing it in the REST API, and for consistency we used the existing precise encoding for SSTs (which were the immediate need that prompted this). We have an initiative in Quasar to switch over to EJSON, but it hasn't seen much progress yet.

@garyb
Member Author

garyb commented Jun 12, 2017

OK, thanks. Precise it is for now, then!
