Insert should reject repetitive inserts with same key #2

tzolov · 2017-03-22T15:17:41Z

Existing JournalledInsertRule implementation is designed for high throughput and minimal overhead.
Rule converts INSERT on virtual table into INSERT on the target journal table. In practice the performance of the SQL-Rewriter INSERT statement should match this of a standard HAWQ Insert operation.

The disadvantage of current design is that it will not prevent repetitive INSERTS on of rows with same-key.
This can be resolved by Insert rule implementation that uses INSERT/SELECT semantics similar to this of the Update and Delete rules. The disadvantage of this approach is the increased performance overhead.

As a trade off i can keep both rules and allow the User to configure the one or the other depends on the use case

The text was updated successfully, but these errors were encountered:

davidje13 · 2017-03-22T15:43:09Z

Note that this issue only affects use-cases where the primary key is being given an explicit value (rather than relying on an autoincrementing index, which of course prevents colliding IDs). Also any systems which generate UUIDs should be OK (ignoring type-4-collisions at least).

Also note that this issue is different depending on the choice of version column:

TIMESTAMP will allow duplicate items with the same primary key
BIGINT will block duplicate items (unless old data is archived away), but will also block INSERT-DELETE-INSERT (i.e. once an ID has been taken, it will not be available again even after deleting)

If a safer implementation is added, I would suggest: (this reasoning is for TIMESTAMP; I haven't given BIGINT much thought)

Add a configuration option (maybe journalled_insert_mode or similar) with 3 values; something like: "fast", "balanced" and "safe". Set to "balanced" by default.
When an INSERT is detected:
- If "fast", always use existing fast method
- If "safe", always use new safe method
- If "balanced", use new safe method if one of the primary key columns has been given an explicit value, and fast method otherwise

Note that "balanced" is likely to be best for most use cases, but will not catch cases such as:

auto-incrementing ID adds "4", "5"
manual query adds "7" (safe insert used; deemed OK)
auto-incrementing ID adds "6", "7" (fast insert used; oops)

Which is what "safe" would exist for.

For creating the new functionality, there are some existing tests for the insert rule in insertIntegrationTest.java; look for and remove the Assume.assumeTrue lines to enable them (currently one of the tests is enabled only for TIMESTAMP, and the other only for BIGINT, due to the different behaviour described above)

tzolov added the enhancement label Mar 22, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Insert should reject repetitive inserts with same key #2

Insert should reject repetitive inserts with same key #2

tzolov commented Mar 22, 2017

davidje13 commented Mar 22, 2017

Insert should reject repetitive inserts with same key #2

Insert should reject repetitive inserts with same key #2

Comments

tzolov commented Mar 22, 2017

davidje13 commented Mar 22, 2017