In production systems running TictacAAE, especially those using full-sync, there may be situations where a full-sync repeatedly finds a mismatch when comparing cached trees for the same segments, but then shows no difference between Keys & Clocks (i.e. the exchange returns {clock_compare, 0}).
This can be resolved either by rebuilding the tree, or by triggering repair mode via full-sync:
riak_kv/src/riak_kv_ttaaefs_manager.erl, lines 790 to 805 in b7b690f
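A hypothetical sketch of the stuck state described above - not the riak_kv exchange API. The two comparison functions are passed in as stand-ins for the real tree-compare and clock-compare stages:

```erlang
%% Illustrative only: the shape of an exchange that keeps flagging
%% segments whose Keys & Clocks nevertheless agree.
-module(stuck_exchange_sketch).
-export([run/2]).

run(SegmentCompareFun, ClockCompareFun) ->
    case SegmentCompareFun() of
        [] ->
            %% Trees agree: nothing to do.
            {root_compare, 0};
        MismatchedSegments ->
            %% Trees disagree, so fetch and compare Keys & Clocks for
            %% the mismatched segments. If the cached segment hashes
            %% are stale, this finds no deltas - the stuck state.
            Deltas = ClockCompareFun(MismatchedSegments),
            {clock_compare, length(Deltas)}
    end.
```

Calling `stuck_exchange_sketch:run(fun() -> [12, 407] end, fun(_) -> [] end)` returns `{clock_compare, 0}` on every pass: the trees keep flagging the same segments while the clock comparison keeps finding nothing to repair.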
But why are the segment hashes wrong in this case? There are two issues, although one is almost certainly a non-issue.
1 - Updating hashtrees assigns special meaning to a VC which hashes to 0
https://github.com/OpenRiak/kv_index_tictactree/blob/fb1f56d1bef3d207a1a9b2ef181f6f49ee968773/src/aae_controller.erl#L1332-L1335
https://github.com/OpenRiak/kv_index_tictactree/blob/fb1f56d1bef3d207a1a9b2ef181f6f49ee968773/src/aae_controller.erl#L425-L451
As erlang:phash2/1 by default uses a 2 ^ 27 hash space, there is the potential for relatively frequent collisions between a Previous or Current vector-clock hash which is genuinely 0, and a 0 that is being used to convey special meaning. There may have been some confusion here with phash/2, which was a hash with a range beginning at 1, so assigning a special meaning to 0 was safe.

So if an update is made to a Key where the VC previously hashed to 0, the segment will be incorrect, as it will not remove the hash of the Key in the Update.
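A minimal sketch of how this goes wrong, assuming (as an illustration, not the actual aae_controller/leveled_tictac code) that each segment accumulates an XOR of key-and-clock-hash contributions, and that a previous-clock hash of 0 is the sentinel for "no previous version":

```erlang
%% Illustrative only - names and structure are assumptions, not the
%% real kv_index_tictactree implementation.
-module(phash_zero_sketch).
-export([demo/0]).

%% Contribution of one (Key, ClockHash) pair to its segment. The key
%% is mixed in, so the contribution is non-zero even when the clock
%% hash itself is 0.
entry_hash(Key, ClockHash) ->
    erlang:phash2({Key, ClockHash}).

%% Apply an update to a segment's accumulated hash. A PrevClockHash
%% of 0 is taken to mean "no previous version, nothing to remove" -
%% the flaw when a genuine clock also hashes to 0.
update_segment(Seg, Key, CurrClockHash, 0) ->
    Seg bxor entry_hash(Key, CurrClockHash);
update_segment(Seg, Key, CurrClockHash, PrevClockHash) ->
    Seg bxor entry_hash(Key, CurrClockHash)
        bxor entry_hash(Key, PrevClockHash).

demo() ->
    Key = <<"B/K1">>,
    %% Suppose the object's first clock genuinely hashes to 0 - a
    %% roughly 1 in 2^27 chance per clock with phash2/1 defaults.
    Seg1 = update_segment(0, Key, 0, 0),         %% first write (Prev = none)
    Seg2 = update_segment(Seg1, Key, 12345, 0),  %% update: old entry not removed
    Correct = entry_hash(Key, 12345),            %% what the segment should hold
    {Seg2, Correct, Seg2 =:= Correct}.           %% almost surely false
```

The stale entry_hash(Key, 0) term is never XORed out, so the segment hash diverges from what the live Keys & Clocks imply.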
All vnodes which receive the same series of updates will be incorrect in the same way, until one of them rebuilds. When an AAE tree is rebuilt, it does not need to calculate the delta hash, and so the rebuilt vnode will now be correct for this segment, and will differ from its peers. This will then trigger a series of repairs that will propagate the fix.
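Continuing the sketch above, a rebuild folds over the live keys and clocks and computes each segment from scratch, so no previous-hash bookkeeping (and no 0 sentinel) is involved - which is why the rebuilt vnode's segment comes out correct and differs from its stale peers:

```erlang
%% Continuing phash_zero_sketch: a rebuild recomputes the segment
%% from the current keys and clock hashes alone, with no delta step.
rebuild_segment(KeyClockHashes) ->
    lists:foldl(
        fun({Key, ClockHash}, Acc) ->
            Acc bxor entry_hash(Key, ClockHash)
        end,
        0,
        KeyClockHashes).
```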
2 - In one case the VC is unique-sorted before hashing, rather than sorted (although the nature of VCs, with at most one entry per actor, means that usort is equivalent to sort)
riak_kv/src/riak_kv_vnode.erl, lines 3751 to 3759 in b7b690f
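This is almost certainly a non-issue because a well-formed VC has at most one entry per actor, so its entries are already unique and usort cannot drop anything. A quick shell check with an illustrative clock shape:

```erlang
1> VC = [{actor_a, {3, 63870000000}}, {actor_b, {1, 63870000001}}].
[{actor_a,{3,63870000000}},{actor_b,{1,63870000001}}]
2> lists:usort(VC) =:= lists:sort(VC).
true
3> erlang:phash2(lists:usort(VC)) =:= erlang:phash2(lists:sort(VC)).
true
```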