Replay optimized postgres persistor fixes #403
Conversation
The implementation did not match the documentation: no index on `aggregate_id` was created, even when the record class supported this attribute. The index was also not added when indexes were specified explicitly. Now the `aggregate_id` index is always added when supported, and for record classes that define no indexes it will be the only index (instead of no indexes at all).
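A minimal sketch of the fixed behaviour (the method name `indexes_for` and its shape are illustrative, not the actual sequent implementation):

```ruby
# Hypothetical sketch: always add the aggregate_id index when the record
# class supports that attribute, whether or not indexes were given explicitly.
def indexes_for(record_class, explicit_indexes = [])
  indexes = explicit_indexes.dup
  supports_aggregate_id = record_class.members.include?(:aggregate_id)
  indexes << [:aggregate_id] if supports_aggregate_id && !indexes.include?([:aggregate_id])
  indexes
end

Record = Struct.new(:aggregate_id, :name)
indexes_for(Record)            # => [[:aggregate_id]]
indexes_for(Record, [[:name]]) # => [[:name], [:aggregate_id]]
```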
Hashing can result in collisions (the same hash value for different input values), so hash values alone cannot be reliably used to index the records.
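A plain-Ruby illustration of the problem (the `CollidingKey` class is a deliberately degenerate example, not sequent code):

```ruby
# Illustrative only: a key class whose hash always collides. Bucketing
# records by the raw hash value alone then mixes up records with
# different keys -- exactly the corruption described above.
class CollidingKey
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def hash
    42 # every key gets the same hash value
  end
end

buckets = Hash.new { |h, k| h[k] = [] }
buckets[CollidingKey.new("a").hash] << "record-a"
buckets[CollidingKey.new("b").hash] << "record-b"

buckets[42] # => ["record-a", "record-b"] -- a lookup for "a" also sees "b"
```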
Index fields are now always sorted so we can simply use `==` to see if an index has the same fields as the where clause. Avoid duplicate index lookups by having `find` return `nil` when there is no matching index.
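A minimal sketch of that matching logic (names like `FieldIndex` and `find_index` are illustrative, not the actual implementation):

```ruby
# Illustrative sketch: index fields are stored sorted, so matching an index
# against a where clause is a simple Array#== on the sorted key lists.
class FieldIndex
  attr_reader :fields

  def initialize(fields)
    @fields = fields.sort
  end

  def matches?(where_clause)
    fields == where_clause.keys.sort
  end
end

def find_index(indexes, where_clause)
  # Returns nil when no index matches, avoiding a duplicate lookup.
  indexes.find { |index| index.matches?(where_clause) }
end

indexes = [FieldIndex.new([:id, :aggregate_id])]
find_index(indexes, aggregate_id: "a", id: 1) # matches regardless of key order
find_index(indexes, name: "x")                # => nil
```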
```diff
     end
   end
-end.dup
+end).dup
```
I think this `dup` only needs to happen when the records are found in an index, so I propose moving this to `Index#find`.

Also, can it just be `freeze` and let the caller `dup` if required? Or is this too much of a trap?
I find this `dup` a bit weird indeed. I can't really recall why it is needed, and unfortunately I can't find anything in old commit messages; it has been there since the beginning. Since only the array is dupped, not the objects in it, I am not sure what it is supposed to protect against...
I found a potential issue in `find_records` (`lib/sequent/core/persistors/replay_optimized_postgres_persistor.rb`).
Simply clear the hash so its configuration (`compare_by_identity`) is preserved.
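A plain-Ruby illustration of why clearing beats replacing here:

```ruby
lookup = {}.compare_by_identity
lookup[:a] = 1

lookup.clear
# The hash is empty but still compares keys by identity:
lookup.compare_by_identity? # => true

replacement = {}
replacement.compare_by_identity? # => false -- a fresh hash loses the setting
```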
Structs allow accessing attributes with both symbols and strings via the `[]` operator, so there is no need to convert strings to symbols first. Extract the Symbol-to-String normalization into a separate method.
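This relies on standard `Struct#[]` behaviour, which accepts a symbol, a string, or an integer offset:

```ruby
# Struct#[] accepts both String and Symbol member names.
EventRecord = Struct.new(:aggregate_id)
record = EventRecord.new("abc-123")

record[:aggregate_id]  # => "abc-123"
record["aggregate_id"] # => "abc-123" -- no to_sym conversion needed
```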
The in-memory structs can easily be generated using "ordinary" meta-programming. The only difference is that the struct classes are anonymous since there is no longer a constant referring to the class. This also avoids polluting the Ruby namespace. Also add the missing `eql?` override to ensure the records are stored correctly in the record `Set`.
Since it no longer pollutes the global namespace, it no longer needs to be a singleton and can become a simple instance variable.
Use compare_by_identity for the record sets, but still keep the equality/hash overrides for in-memory structs for consistency with ActiveRecord. Remove unnecessary `dup`.
Break up multi-column indexes into multiple single-attribute indexes. Match a where clause against all indexes and use set intersection to find the candidate records that will match the where clause.
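A minimal sketch of the set-intersection idea (the hash-of-sets layout here is illustrative, not the actual persistor code):

```ruby
require 'set'

# Illustrative sketch: one set of candidate records per (attribute, value)
# pair; a multi-attribute where clause intersects the per-attribute sets.
index = Hash.new { |h, k| h[k] = Set.new }

record = { aggregate_id: "a1", status: "open" }
record.each { |attr, value| index[[attr, value]] << record }

other = { aggregate_id: "a2", status: "open" }
other.each { |attr, value| index[[attr, value]] << other }

where_clause = { aggregate_id: "a1", status: "open" }
# Intersect the candidate sets from each single-attribute index.
candidates = where_clause.map { |attr, value| index[[attr, value]] }.reduce(:&)
candidates.to_a # => [{ aggregate_id: "a1", status: "open" }]
```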
Symbols cause less GC pressure and are faster to compare (they are immutable, and the same symbol is always the same instance).
Also ensure sets use `compare_by_identity`.
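A plain-Ruby illustration of both points, assuming Ruby 3.0+ (where `Set#compare_by_identity` is available):

```ruby
require 'set'

# The same symbol literal is always the same object:
:aggregate_id.equal?(:aggregate_id) # => true

# With compare_by_identity, value-equal but distinct records stay separate:
a = { id: 1 }
b = { id: 1 }
set = Set.new.compare_by_identity
set << a << b
set.size # => 2 (a plain Set would treat a and b as the same element)
```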
```ruby
record_sets = indexes.flat_map do |field|
  if !normalized_where_clause.include? field
    []
```

```ruby
def find(record_class, normalized_where_clause)
```
What does `normalized_where_clause` mean? Isn't it better to allow any valid where clause and normalize it in this method?
Or fail if it isn't normalized.
This is an internal API and the normalization happens in the `find_records` method.
This fixes two problems in the `ReplayOptimizedPostgresPersistor`:

1. The documentation states that an `aggregate_id` index is automatically added. Unfortunately the implementation was broken: the index was not actually created.
2. The indexing used the `hash` value of the record keys but did not handle hash collisions at all. So when the hash function collided, records with different keys could be affected, causing data corruption.

This PR fixes both issues and (tries to) clean up the code.