Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transforming ops with different use of integer/string indexes doesn't work #37

Closed
alecgibson opened this issue Jul 1, 2021 · 12 comments · Fixed by #40 or reedsy/json0#4
Closed

Comments

@alecgibson
Copy link
Contributor

Given an object obj and an array arr, in JavaScript, these statements are true:

arr['123'] === arr[123]
obj['123'] === obj[123]

Therefore, these paths should be equivalent:

var p1 = ['foo', '1']
var p2 = ['foo', 1]

However, json0 doesn't treat them as such. For example, this transformation fails:

const op1 = [{p: ['a', '1', 0], si: 'hi'}]
const op2 = [{p: ['a', 1], lm: 0}]

json0.transform(op1, op2, 'left')

Actual result: [{p: ["a", 2, 0], si: "hi"}]
Expected result: [{p: ["a", 0, 0], si: "hi"}]

@alecgibson
Copy link
Contributor Author

I have a (correctly) failing test case:

it.only 'transforms ops that use strings and numbers differently in their paths', ->
  assert.deepEqual [{p: ["a", 0, 0], si: "hi"}], type.transform [{p: ['a', 1, 0], si: 'hi'}], [{p: ['a', 1], lm: 0}], 'left'
  assert.deepEqual [{p: ["a", 0, 0], si: "hi"}], type.transform [{p: ['a', 1, 0], si: 'hi'}], [{p: ['a', '1'], lm: 0}], 'left'

@alecgibson
Copy link
Contributor Author

I wonder if we should normalize() the path?

@josephg
Copy link
Member

josephg commented Jul 1, 2021

This is by design. json0 has different semantics for lists and objects. transform doesn't have access to the document's type, so it uses at the type of the key (string / number) to disambiguate whether it should follow listy semantics or object semantics during transformation. The caller is expected to know, and be explicit in the call to transform.

Having op1 and op2 with ambiguous key types is UB - if you pass keys ['a', 0] and ['a', '0'] its ambiguous whether the data is shaped like {a: ['']} or {a: {'0': ''}}. If anything, if you mix and match strings and numbers in the operations passed to transform, transform should throw an error.

The second line in your test should throw, because this operation is invalid:

[{p: ['a', '1'], lm: 0}]

The path ['a', '1'] implies the document's shape is {a: {1: _}} and you can't do list moves on an object.

@alecgibson
Copy link
Contributor Author

Hmm. That's a shame. We've had a subtle bug lurking for years where we had accidentally stringified an array index. Everything seems to work just fine for the most-part, because JS lets you do that, and I've only just tracked it down because it only screws up on transform.

The only suggestion I can think of is to extend the normalize() signature to accept the document itself (or add a new normalizeWithDoc()), so that we can sanity-check the op against the doc shape itself.

I have to say I'm slightly confused: why do the key types disambiguate? Shouldn't that be embedded in the semantics of using eg oi vs li?

@alecgibson
Copy link
Contributor Author

(For anyone else who — like me — was thinking you could just coerce any "integer-like" path segment from a string to a number, don't forget that things like ObjectIDs can look like integers!)

@josephg
Copy link
Member

josephg commented Jul 5, 2021

Yeah, the question is do you want list semantics (where items are inserted in between keys, or do you want object semantics - where each update replaces the previous value at that key.

@nornagon might be able to speak more about the state of json0 - it might be possible to make it work in both cases given the operation semantics are named in the object. In json1 this type of thing is a hard requirement for users.

The code should really probably throw errors when people misuse the library like this because of how hard these bugs are to track down. It'd be great to have some PRs to catch these errors so others don't make the same mistake again. (Eg, check the path makes sense in apply() for the given data types).

@nornagon
Copy link
Contributor

nornagon commented Jul 6, 2021

Agree with @josephg that this should throw. Garbage in, garbage out.

@alecgibson
Copy link
Contributor Author

I'm not entirely sure this is "garbage in", since JS treats string and number keys the same, but I guess it's a valid API decision, and if that's the case then yes throwing would be good (although I personally still don't really understand why we can't accept both, given that object vs list semantics are embedded in the op properties).

@alecgibson
Copy link
Contributor Author

alecgibson commented Jul 7, 2021

I think with deeper objects, it's also less clear when to throw, because you might get this sort of situation:

var op1 = {p: ['foo', '1', 'bar', 'baz', 'qux'], oi: 'lorem'}
var op2 = {p: ['foo', 1], lm: 0}

json0.transform(op1, op2, 'left')

In this case, the op that "knows" about the list (and which can trigger a throw) looks "valid", but doesn't share a "common" path with op1, which is a subtler break.

@alecgibson
Copy link
Contributor Author

Or do you mean to just throw in apply() when the path doesn't match the data structure? Should catch the majority of sharedb-related mistakes (since the op won't be submitted in the first place), but might potentially miss bugs where people use transform() independently.

I'm happy to make that change if you want, but I think throwing would technically be breaking like this other change that's in stasis.

@josephg
Copy link
Member

josephg commented Jul 7, 2021

Right, but what are the semantics there? Should it implicitly coerce ['foo', '1'] to ['foo', 1] which transforms to ['foo', 2]? Or should it coerce ['foo', 1] to ['foo', '1'] (then in this case throw?)

The library is written using the explicit rule that string keys imply objects, and number keys imply lists. The transform method needs to know the shape of the objects the operations modify. Its really neat that we can encode that information in the keys themselves using string vs number keys. (And that checks out, since JS objects always have string keys, and lists always have numeric keys). For all that there are some situations they coerce to each other, strings and numbers are different things with different semantics in javascript. They aren't interchangeable and using them interchangably will cause lots of obscure bugs in lots of situations. (Eg "5" + "10" === "510".)

The problem to solve here was how hard it was to find and fix this bug, presumably because of how rarely transform gets called in practice. You have it right - my preferred solution would be to make that bug easier to find by making the apply method throw when the key has the wrong type, instead of subtly working when changes are linear, but breaking when transform is called. But yeah, I agree that potentially breaks compatibility. Maybe it should console.warn the first time that happens or something. People probably won't see the warnings, but at least they'd be there.. :/ 🤷‍♀️

@alecgibson
Copy link
Contributor Author

Right, but what are the semantics there?

Basically I think I was picturing less active coercion, and more "coersive equals" checks. ie the path you submit remains untouched, and is committed to the DB as-is, but anywhere where we check if paths are equal, we use == instead of ===, and if we need to know list vs object semantics, that should be encoded in the op property (unless there are edge cases I'm unaware of, which is more than likely! I guess off the top of my head, path incrementing/decrementing would be broken without an active coercion).

But if that's all too complex and/or just not how you want the library to work, then fine. Given my unfamiliarity with the library, I think I'm going to update our fork to throw (since the mismatch throwing is how I caught this bug in the first place!). I can open a PR to console.warn if that's what you want?

I personally think it would be nice to have some sort of module-level opt-in strict mode that throws (although from a sharedb point-of-view, syncing module options between server and client isn't really something we do yet, although we've vaguely discussed it)

alecgibson added a commit to reedsy/json0 that referenced this issue Jul 7, 2021
Fixes: ottypes#37

# Background

There are some corner cases that arise in the `json0` library because -
given an object `obj`, and an array `arr` -these statements are both
`true` in JavaScript:

```js
obj['123'] === obj[123]
arr['123'] === arr[123]
```

The fact that these statements are true can lead to some unexpected
silent `transform()` failures:

```js
const op1 = [{p: ['a', '1', 0], si: 'hi'}]
const op2 = [{p: ['a', 1], lm: 0}]

json0.transform(op1, op2, 'left')
```

Actual result: `[{p: ["a", 2, 0], si: "hi"}]`
Expected result: `[{p: ["a", 0, 0], si: "hi"}]`

# Solution

In order to prevent this, it's been decided that arrays should *always*
be indexed by `number`, and objects should *always* be indexed by
`string`.

This change enforces stricter type checks when calling `apply()`, and
now throws in the following cases:

 - When a `number` is used to key an object property:
   `type.apply({'1': 'a'}, [{p:[1], od: 'a'}])`
 - When a `string` is used to key an array property:
   `type.apply(['a'], [{p:['0'], ld: 'a'}])`
 - When adding a `string` to a `number`:
   `type.apply(1, [{p:[], na: 'a}])`
 - When adding a `number` to a `string`:
   `type.apply('a', [{p:[], na: 1}])`
 - When applying a string operation to a non-string:
   `type.apply(1, [{p: [0], si: 'a'}])`
alecgibson added a commit to reedsy/json0 that referenced this issue Jul 7, 2021
Fixes: ottypes#37

# Background

There are some corner cases that arise in the `json0` library because -
given an object `obj`, and an array `arr` -these statements are both
`true` in JavaScript:

```js
obj['123'] === obj[123]
arr['123'] === arr[123]
```

The fact that these statements are true can lead to some unexpected
silent `transform()` failures:

```js
const op1 = [{p: ['a', '1', 0], si: 'hi'}]
const op2 = [{p: ['a', 1], lm: 0}]

json0.transform(op1, op2, 'left')
```

Actual result: `[{p: ["a", 2, 0], si: "hi"}]`
Expected result: `[{p: ["a", 0, 0], si: "hi"}]`

# Solution

In order to prevent this, it's been decided that arrays should *always*
be indexed by `number`, and objects should *always* be indexed by
`string`.

This change enforces stricter type checks when calling `apply()`, and
now throws in the following cases:

 - When a `number` is used to key an object property:
   `type.apply({'1': 'a'}, [{p:[1], od: 'a'}])`
 - When a `string` is used to key an array property:
   `type.apply(['a'], [{p:['0'], ld: 'a'}])`
 - When adding a `string` to a `number`:
   `type.apply(1, [{p:[], na: 'a}])`
 - When adding a `number` to a `string`:
   `type.apply('a', [{p:[], na: 1}])`
 - When applying a string operation to a non-string:
   `type.apply(1, [{p: [0], si: 'a'}])`
alecgibson added a commit to share/sharedb that referenced this issue Jul 14, 2021
`json0` recently merged a [breaking change][1] which enforces some
stricter type checking.

Apart from this stricter type checking, no other changes were made, so
any client that only ever submits well-formed ops should be able to
upgrade directly without any trouble.

However, if clients submitted some bad ops (that are no longer allowed
under the stricter checking), then `fetchSnapshot()` will fail when
trying to apply these ops to rebuild the snapshot.

This failure can be surprising if the only "bad" thing about the ops
was that they had [bad path types][2], because at the time when they
were submitted, the snapshot would have been correctly updated.

This change rescues from this particular failure by coercing paths into
the correct type: `number` for arrays, and `string` for objects. Note
that we use the exact same check for arrays and objects as `json0` to
ensure consistency.

Also note that we don't attempt to rescue new ops being submitted,
because these should correctly be rejected.

[1]: ottypes/json0#40
[2]: ottypes/json0#37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants