The existing JournalledInsertRule implementation is designed for high throughput and minimal overhead.
The rule converts an INSERT on the virtual table into an INSERT on the target journal table. In practice, the performance of the SQL-Rewriter INSERT statement should match that of a standard HAWQ INSERT operation.
The disadvantage of the current design is that it does not prevent repeated INSERTs of rows with the same key.
This can be resolved by an Insert rule implementation that uses INSERT/SELECT semantics similar to those of the Update and Delete rules. The disadvantage of this approach is the increased performance overhead.
As a trade-off, I can keep both rules and allow the user to configure one or the other depending on the use case.
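The INSERT/SELECT guard described above can be sketched against an in-memory database. This is only an illustration: the table and column names (`emps_journal`, `version_number`, `subsequent_version_number`) are assumptions standing in for whatever journal schema sql-rewriter actually generates, and the real rewrite happens inside the Calcite rules rather than in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical journal schema; the column names here are illustrative only.
conn.execute("""
    CREATE TABLE emps_journal (
        id INTEGER,
        name TEXT,
        version_number INTEGER,
        subsequent_version_number INTEGER
    )
""")

def safe_insert(conn, row_id, name, version):
    # INSERT/SELECT guard: insert only if no live journal row has this key.
    # A "live" row is one whose subsequent_version_number is NULL (not deleted).
    cur = conn.execute("""
        INSERT INTO emps_journal (id, name, version_number, subsequent_version_number)
        SELECT ?, ?, ?, NULL
        WHERE NOT EXISTS (
            SELECT 1 FROM emps_journal
            WHERE id = ? AND subsequent_version_number IS NULL
        )
    """, (row_id, name, version, row_id))
    return cur.rowcount  # 1 if inserted, 0 if the key was already taken

print(safe_insert(conn, 1, "alice", 1))   # first insert succeeds -> 1
print(safe_insert(conn, 1, "alice2", 2))  # duplicate key is rejected -> 0
```

The extra subquery per INSERT is where the performance overhead comes from, compared to the current rule's straight row append.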
Note that this issue only affects use cases where the primary key is given an explicit value (rather than relying on an auto-incrementing index, which of course prevents colliding IDs). Also, any systems which generate UUIDs should be OK (ignoring version-4 collisions, at least).
Also note that this issue is different depending on the choice of version column:
TIMESTAMP will allow duplicate items with the same primary key
BIGINT will block duplicate items (unless old data is archived away), but will also block INSERT-DELETE-INSERT (i.e. once an ID has been taken, it will not be available again even after deleting)
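The behavioural difference between the two version-column choices can be modelled with a toy predicate. This is not sql-rewriter code, just an illustration of the observable behaviour described above; `can_insert` and its journal representation are hypothetical.

```python
# Toy model: journal is a list of (key, deleted) tuples already in the table.
def can_insert(journal, key, version_type):
    if version_type == "BIGINT":
        # Any prior journal entry for the key blocks the insert, even a
        # deleted one, so INSERT-DELETE-INSERT is rejected.
        return all(k != key for (k, _deleted) in journal)
    # TIMESTAMP: every write gets a fresh timestamp, so nothing blocks a
    # duplicate key; readers simply see the latest row.
    return True

print(can_insert([(1, False)], 1, "TIMESTAMP"))  # True: duplicate allowed
print(can_insert([(1, False)], 1, "BIGINT"))     # False: duplicate blocked
print(can_insert([(1, True)], 1, "BIGINT"))      # False: key never reusable
```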
If a safer implementation is added, I would suggest: (this reasoning is for TIMESTAMP; I haven't given BIGINT much thought)
Add a configuration option (maybe journalled_insert_mode or similar) with 3 values; something like: "fast", "balanced" and "safe". Set to "balanced" by default.
When an INSERT is detected:
If "fast", always use existing fast method
If "safe", always use new safe method
If "balanced", use new safe method if one of the primary key columns has been given an explicit value, and fast method otherwise
Note that "balanced" is likely to be best for most use cases, but will not catch cases such as:
auto-incrementing ID adds "6", "7" (fast insert used; oops)
Which is what "safe" would exist for.
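The proposed three-way dispatch could look something like the sketch below. The option name `journalled_insert_mode`, the function name, and the `pk_has_explicit_value` flag are all hypothetical; this only captures the decision logic described above.

```python
# Sketch of the proposed dispatch for an incoming INSERT; names are illustrative.
def choose_insert_method(mode, pk_has_explicit_value):
    """Return which rewrite to apply for an incoming INSERT statement."""
    if mode == "fast":
        return "fast"   # always use the existing low-overhead rewrite
    if mode == "safe":
        return "safe"   # always use the INSERT/SELECT guard
    if mode == "balanced":
        # Guard only when the statement supplies its own primary-key value;
        # auto-generated keys are assumed collision-free (see the caveat above).
        return "safe" if pk_has_explicit_value else "fast"
    raise ValueError(f"unknown journalled_insert_mode: {mode}")

print(choose_insert_method("balanced", True))   # safe
print(choose_insert_method("balanced", False))  # fast
```

The "balanced" branch is why the auto-increment caveat matters: an explicit value never reaches the guard path when the key column is left to the sequence.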
For creating the new functionality, there are some existing tests for the insert rule in insertIntegrationTest.java; look for and remove the Assume.assumeTrue lines to enable them. (Currently one of the tests is enabled only for TIMESTAMP, and the other only for BIGINT, due to the different behaviour described above.)