
Replay optimized postgres persistor fixes #403

Merged (20 commits, Feb 5, 2024)

Conversation

erikrozendaal
Member

This fixes two problems in the ReplayOptimizedPostgresPersistor:

  1. The documentation specifies that a default aggregate_id index is automatically added. Unfortunately the implementation was broken.
  2. The index used the hash value of the record keys but did not handle hash collisions at all. So when the hash function collided, records with different keys could be affected, causing data corruption.

This PR fixes both issues and (tries to) clean up the code.
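The collision problem can be shown with a minimal sketch. The names here (`HashOnlyIndex`, `CollidingKey`) are illustrative, not Sequent's actual classes; the point is only that an index keyed by the hash value alone cannot tell two colliding keys apart:

```ruby
require 'set'

# Sketch of the bug: the bucket is identified by the hash value only.
class HashOnlyIndex
  def initialize
    @buckets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def add(key, record)
    @buckets[key.hash] << record # BUG: only the hash value identifies the bucket
  end

  def find(key)
    @buckets[key.hash] # on a collision this returns records for *other* keys
  end
end

# A key class whose hash we can force, to simulate a collision.
class CollidingKey
  attr_reader :name

  def initialize(name, forced_hash)
    @name = name
    @forced_hash = forced_hash
  end

  def hash
    @forced_hash
  end

  def eql?(other)
    other.is_a?(CollidingKey) && other.name == name
  end
end

a = CollidingKey.new('alice', 42)
b = CollidingKey.new('bob', 42) # a different key with the same hash value

idx = HashOnlyIndex.new
idx.add(a, 'record-for-alice')
idx.find(b) # returns the record for 'alice', even though we asked for 'bob'
```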

The implementation did not match the documentation: no index on
`aggregate_id` was created, even when the record class supported this
attribute. The index was also not added when indexes were specified
explicitly.

Now the `aggregate_id` index is always added when supported and for
record classes where no indexes are defined it will be the only
index (instead of no indexes).
Hashing can result in collisions (the same hash value for different
input values), so hash values cannot be reliably used to index the
records.
Index fields are now always sorted so we can simply use `==` to see if
an index has the same fields as the where clause.

Avoid duplicate index lookups by having `find` return `nil` when there
is no matching index.
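A sketch of the matching strategy described above, with illustrative names (not the persistor's real API): index fields are stored sorted, matching against a where clause is a plain `==` on sorted key lists, and lookup returns `nil` when no index applies so the caller can fall back to a scan without a second lookup:

```ruby
# Illustrative sketch: fields are sorted once, at construction.
SimpleIndex = Struct.new(:fields) do
  def initialize(fields)
    super(fields.sort)
  end

  def matches?(where_clause)
    fields == where_clause.keys.sort
  end
end

# Returns the matching index, or nil when there is none.
def find_matching_index(indexes, where_clause)
  indexes.find { |index| index.matches?(where_clause) }
end

indexes = [
  SimpleIndex.new(%i[command_record_id aggregate_id]),
  SimpleIndex.new(%i[aggregate_id]),
]

find_matching_index(indexes, aggregate_id: 'a', command_record_id: 1) # matches
find_matching_index(indexes, event_type: 'MyEvent')                   # => nil
```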
```ruby
end
end
end.dup
end).dup
```
Member Author
I think this dup only needs to happen when the records are found in an index. So I propose moving this to Index#find.

Also, can it just be freeze and let the caller dup if required? Or is this too much of a trap?
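As a generic Ruby illustration of the trade-off being discussed (not the persistor's actual code): returning a frozen array protects the internal state, but turns a caller's accidental mutation into a `FrozenError`, which is the potential "trap":

```ruby
records = %w[r1 r2].freeze # imagine this is the array find_records returns

mutation_failed = false
begin
  records << 'r3' # a caller that used to rely on an implicit dup now fails
rescue FrozenError
  mutation_failed = true
end

copy = records.dup # callers that need mutation must dup explicitly
copy << 'r3'
```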

Member
I find this dup a bit weird indeed. I can't really recall why it is needed, and unfortunately can't find anything in the old commit messages. It has been there since the beginning. Since only the array is dupped, not the objects in it, I am not sure what it is supposed to protect against.

@lvonk (Member) left a comment
I found a potential issue in `find_records`.


Simply clear the hash so its configuration (`compare_by_identity`) is
preserved.
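This works because `Hash#clear` removes the entries but keeps the hash's configuration, whereas assigning a fresh hash would lose it:

```ruby
lookup = {}.compare_by_identity
lookup.clear
lookup.compare_by_identity? # => true: clearing keeps the configuration

other = {}.compare_by_identity
other = {}                  # reassignment replaces the configured hash entirely
other.compare_by_identity?  # => false
```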
Structs allow accessing attributes with the `[]` operator using both
Strings and Symbols, so there is no need to convert strings to symbols
first.

Extract Symbol to String normalization into a separate method.
The in-memory structs can easily be generated using "ordinary"
meta-programming. The only difference is that the struct classes are
anonymous since there is no longer a constant referring to the
class. This also avoids polluting the Ruby namespace.

Also add the missing `eql?` override to ensure the records are stored
correctly in the record `Set`.
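A sketch of what generating anonymous struct classes with value equality could look like (illustrative code, not Sequent's exact implementation):

```ruby
require 'set'

# Build an anonymous struct class for the given attributes; no constant is
# assigned, so the Ruby namespace is not polluted.
def build_record_struct(attributes)
  Struct.new(*attributes) do
    # eql?/hash are what Hash and Set use to detect duplicate members.
    def eql?(other)
      self.class == other.class && to_a == other.to_a
    end

    def hash
      to_a.hash
    end
  end
end

record_class = build_record_struct(%i[aggregate_id sequence_number])
a = record_class.new('agg-1', 1)
b = record_class.new('agg-1', 1)

Set.new([a, b]).size # equal records collapse to one entry
```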
Since it no longer pollutes the global namespace it no longer needs to
be a singleton and can become a simple instance variable.
Use compare_by_identity for the record sets, but still keep the
equality/hash overrides for in-memory structs for consistency with
ActiveRecord.

Remove unnecessary `dup`.
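The difference between value and identity comparison for sets can be shown directly in plain Ruby (`Set#compare_by_identity` requires Ruby 3.1+):

```ruby
require 'set'

a = ['same']
b = ['same'] # equal by value, but a distinct object

value_set = Set.new([a, b])
value_set.size # => 1: value equality merges them

identity_set = Set.new.compare_by_identity # Ruby 3.1+
identity_set << a << b
identity_set.size # => 2: distinct objects stay distinct
```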
Break up multi-column indexes into multiple single-attribute
indexes. Match a where clause against all indexes and use
set intersection to find the candidate records that will match the
where clause.
Symbols cause less GC pressure and are faster to compare (they are
immutable, and the same symbol uses the same instance).
Also ensure sets use `compare_by_identity`.
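A sketch of the intersection strategy, with illustrative data and names: each attribute has its own index, and a multi-attribute where clause is answered by intersecting the candidate sets of every indexed field:

```ruby
require 'set'

# One single-attribute index per field: field => { value => set of record ids }.
indexes = {
  aggregate_id: { 'agg-1' => Set[:r1, :r2], 'agg-2' => Set[:r3] },
  event_type:   { 'Created' => Set[:r1, :r3] },
}

# Intersect the candidate sets of every indexed field in the where clause.
# Returns nil when no field is indexed, so the caller can fall back to a scan.
def candidate_records(indexes, where_clause)
  sets = where_clause.filter_map do |field, value|
    index = indexes[field]
    index && index.fetch(value, Set.new)
  end
  return nil if sets.empty?
  sets.reduce(:&) # set intersection narrows down the candidates
end

candidate_records(indexes, aggregate_id: 'agg-1', event_type: 'Created') # => Set[:r1]
candidate_records(indexes, unindexed_field: 'x')                         # => nil
```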
```ruby
record_sets = indexes.flat_map do |field|
  if !normalized_where_clause.include? field
    []
```

```ruby
def find(record_class, normalized_where_clause)
```
Member
What does `normalized_where_clause` mean? Isn't it better to allow any valid where clause and normalize in this method?

Member
Or fail if it isn't normalized

Member Author
This is an internal API and the normalization happens in the `find_records` method.

@erikrozendaal erikrozendaal merged commit cb3f51c into master Feb 5, 2024
9 checks passed
@erikrozendaal erikrozendaal deleted the replay-optimized-postgres-persistor-fixes branch February 5, 2024 09:28