rfc for now() in INSERT INTO #54
base: master
```
{c, 11:00:02}
```

These are out of order.
I believe the problem statement is incomplete. Two problems were noted in the email chain. The first was arrival order, as presented here. The second was key duplication, which resulted in data being overwritten. Given that TS is focused on queries with aggregation functions, the first problem might not be critical. The second problem could therefore be the focus, and the performance issues of serialization could be avoided.
(A subtopic of key duplication was clock granularity. There have been conversations about using Raspberry Pis for Riak TS instances, so I suggest we document the clock granularity on a Pi too.)
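To make the key-duplication problem concrete, here is a minimal Python sketch. It assumes, hypothetically, that a table behaves like a map from primary key to row, so a second insert with the same (device, timestamp) key silently replaces the first. The function names and microsecond arrival times are illustrative, not Riak TS APIs. It also shows how a coarser clock turns more inserts into collisions.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a TS table treated as a map from primary key to row,
# where a later write with the same key replaces the earlier one.
Key = Tuple[str, int]


def server_now(arrival_us: int, granularity_us: int) -> int:
    """Simulated now(): arrival time (microseconds) truncated to the clock granularity."""
    return arrival_us - arrival_us % granularity_us


def surviving_rows(arrivals_us: List[int], granularity_us: int) -> int:
    """Insert one reading per arrival using now() as the timestamp; count rows kept."""
    table: Dict[Key, float] = {}
    for i, arrival in enumerate(arrivals_us):
        key = ("sensor-1", server_now(arrival, granularity_us))
        table[key] = float(i)  # same key => silent overwrite
    return len(table)


# Three readings from one device: 0.4 ms apart, then another 0.8 ms later.
arrivals = [1_000_000, 1_000_400, 1_001_200]
print(surviving_rows(arrivals, 1))          # 1 microsecond ticks: 3 rows
print(surviving_rows(arrivals, 1_000))      # 1 ms ticks: 2 rows
print(surviving_rows(arrivals, 1_000_000))  # 1 s ticks: 1 row
```

Nothing fails loudly here: the lost readings simply never appear, which is why the granularity of the server clock matters as much as its accuracy.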
## Convenience Option
In this world, `now()` in `INSERT` is seen as a convenience for playing about: it is optimised for onboarding, not production.
One comment in the email chain stated that Erlang's `now()` guaranteed a unique return value (within that VM). Please note this quote: "erlang:now/0 is deprecated, as it is and will be a scalability bottleneck." That is from http://erlang.org/doc/apps/erts/time_correction.html
Serializing through a common server introduces delays in translation, so `now()` once again becomes now-ish. Maybe we could reduce the precision of the timestamp, since millisecond precision is clearly not in the cards. The proposal does not guarantee that two inserts arriving on different servers are translated (and thus inserted) in the same order they arrived. (Or maybe, as MvM points out, ordering isn't all that important.)
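As a hypothetical illustration of what serializing through one server buys and what it does not, the sketch below hands out strictly increasing millisecond timestamps, in the spirit of the per-VM uniqueness erlang:now/0 provided. `MonotonicNow` and `frozen_clock` are invented names, not Riak or Erlang APIs. When the clock has not advanced, the generator returns the previous value plus one, so the stamp drifts ahead of real time; and a second, uncoordinated generator (another server) can hand out the same value.

```python
import threading
from typing import Callable, Optional


class MonotonicNow:
    """Hypothetical single-server now(): strictly increasing ms timestamps.

    All callers go through one lock, which is the serialization point.
    If the wall clock has not advanced (or has stepped backwards) since the
    previous call, the previous value + 1 is returned instead.
    """

    def __init__(self, clock_ms: Callable[[], int]) -> None:
        self._clock_ms = clock_ms
        self._last: Optional[int] = None
        self._lock = threading.Lock()

    def now(self) -> int:
        with self._lock:
            t = self._clock_ms()
            if self._last is not None and t <= self._last:
                t = self._last + 1  # unique, but only "now-ish"
            self._last = t
            return t


def frozen_clock() -> int:
    """A wall clock that does not advance between calls (all calls in one ms)."""
    return 5_000


server_a = MonotonicNow(frozen_clock)
print([server_a.now() for _ in range(3)])  # [5000, 5001, 5002]

server_b = MonotonicNow(frozen_clock)  # a second, uncoordinated server
print(server_b.now())  # 5000: a value server_a already handed out
```

Uniqueness holds only among callers that share the lock; across servers it needs either one global serializer (the bottleneck the Erlang docs warn about) or a server identifier in the key.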
If we implement
Beyond my concern that many people don't monitor NTP properly, so servers can silently drift out of sync, and beyond my concern that 100 ms accuracy isn't good enough when we're storing at 1 ms resolution, there's also the fundamental Riak promise: your database can experience a partition without data loss. NTP can't operate across a partition.
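A back-of-the-envelope sketch of the resolution mismatch: if two servers' clocks disagree by 100 ms, events that truly arrive 1 ms apart can be stamped in the wrong order. The offsets below are hypothetical; the point is only that a clock skew larger than the stored resolution makes the stored order unreliable.

```python
def stamp(true_arrival_ms: int, clock_offset_ms: int) -> int:
    """Timestamp a server assigns: true arrival time plus its clock error."""
    return true_arrival_ms + clock_offset_ms


# Hypothetical offsets: server 1 runs 60 ms fast, server 2 runs 40 ms slow,
# i.e. the two clocks disagree by 100 ms.
a = stamp(10_000, +60)  # event A arrives at server 1
b = stamp(10_001, -40)  # event B arrives at server 2, 1 ms after A
print(a, b, a < b)  # 10060 9961 False: B is stored as if it came first
```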
@macintux `now()` with one-second resolution assumes that the same device will not post more than one record per second.
I addressed that point in my diatribe.
On Feb 14, 2017, at 3:57 PM, Matthew Von-Maszewski ***@***.***> wrote:
@macintux now() with a second resolution assumes that the same device will not post more than one record per second.
To look at the two use cases: in production, users can add an extra column to the local key, which avoids the overwriting problem. The major problem with … For demos, I think creating data is still a problem. I experienced this difficulty when creating data for the sorting blog and for the demo at the London EUG. Having …
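A minimal sketch of the production workaround, assuming a table modelled as a map from primary key to row. The extra column `seq` (a hypothetical per-device sequence number supplied by the client) is illustrative; any column that distinguishes rows sharing a timestamp would serve.

```python
from typing import Dict, List, Tuple


def store(rows: List[dict], key_fields: List[str]) -> Dict[Tuple, dict]:
    """Model a TS table: later rows overwrite earlier ones with the same key."""
    table: Dict[Tuple, dict] = {}
    for row in rows:
        table[tuple(row[f] for f in key_fields)] = row
    return table


# Two readings from the same device stamped within the same millisecond.
rows = [
    {"device": "sensor-1", "time": 1_487_000_000_000, "seq": 1, "temp": 20.5},
    {"device": "sensor-1", "time": 1_487_000_000_000, "seq": 2, "temp": 20.7},
]

print(len(store(rows, ["device", "time"])))         # 1: second row overwrote first
print(len(store(rows, ["device", "time", "seq"])))  # 2: both rows kept
```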
For demos, you can use logging to record a data setup and then replay it into the demo, which does solve that problem.
Re "no longer idempotent": those of us who have been using MySQL's `now()` to log write/arrival time don't care about idempotence. We do care about the uniqueness of entries. My production database has been running since 2005 using `now()`. I and other people are going to expect some equivalent.
The difference is that MySQL supports transactions, so if one write fails you can roll back the whole thing. After a Riak TS write failure you're in a halfway house of pain: some writes will have succeeded and others failed, but when the batch is retried, all writes are duplicated because …
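A minimal sketch of why a retried batch duplicates rows when the server fills in `now()`. It assumes, hypothetically, that a table behaves like a map from (device, timestamp) to value and that the server clock advances by 1 ms per write; `write_batch` is an invented helper. With client-supplied timestamps the retry lands on the same keys; with `now()` it does not.

```python
from itertools import count
from typing import Callable, Dict, List, Tuple

Reading = Tuple[str, float, int]  # (device, value, client-side timestamp in ms)


def write_batch(table: Dict[Tuple[str, int], float], batch: List[Reading],
                timestamp_for: Callable[[int], int]) -> None:
    """Write each reading keyed by (device, timestamp); same key overwrites."""
    for device, value, client_ts in batch:
        table[(device, timestamp_for(client_ts))] = value


batch = [("sensor-1", 20.5, 1_000), ("sensor-2", 19.0, 1_000)]

# Client-supplied timestamps: the retry rewrites the same keys (idempotent).
client_table: Dict[Tuple[str, int], float] = {}
write_batch(client_table, batch[:1], lambda ts: ts)  # only the first write landed
write_batch(client_table, batch, lambda ts: ts)      # retry the whole batch
print(len(client_table))  # 2

# Server-side now(): the retry is stamped later, so it creates new keys.
server_clock = count(5_000)  # hypothetical clock advancing 1 ms per write
server_table: Dict[Tuple[str, int], float] = {}
write_batch(server_table, batch[:1], lambda ts: next(server_clock))
write_batch(server_table, batch, lambda ts: next(server_clock))
print(len(server_table))  # 3: sensor-1's reading is stored twice
```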
No, you do not understand the use case. I am not talking about transactions; I am talking about plain, everyday posting of data. Yep, posting twice can double the data. But that is why the posting time, `now()`, is important for manual rollbacks, a.k.a. deletes to clean up.
Can you elaborate? I don’t understand this point at all.
-John
On Feb 16, 2017, at 10:00 AM, Matthew Von-Maszewski ***@***.***> wrote:
no. you do not understand the use case. I am not talking about transactions. I am talking about plain every day posting of data. Yep, posting twice can double the data. But that is why the posting time, now() is important for manual rollbacks, a.k.a. deletes to clean up.
Andy suggests that we drop the implementation of `now()` because we do not have transactions and cannot promise idempotent data when data is written twice. I am saying that is short-sighted, especially in view of common practice (such as what I have running now in my basement, and what I have seen at other places such as Trip Advisor).
I understood the broader point.
This confused me:
Yep, posting twice can double the data. But that is why the posting time, now() is important for manual rollbacks, a.k.a. deletes to clean up.
I’ve argued that now() is effectively unpredictable in a clustered database. How are you going to clean up your data using it? How do you know how many rows that duplicate each other in every other field are redundant, since a sensor could easily feed you the same data set multiple times? You don’t really even know how many times the same row has been inserted, given Riak’s inability to deterministically reject a write which “failed” due to timeouts somewhere in the environment.
-John
On Feb 16, 2017, at 10:10 AM, Matthew Von-Maszewski ***@***.***> wrote:
Andy suggests that we drop the implementation of now() because we do not have transactions and cannot promise idempotent data when data is written twice. I am saying that is short sighted, especially in view of common practices (like I have executing now in my basement and have seen at other places such as Trip Advisor).
OK, a concrete scenario to help me understand cleanup via now(). Events are ordered by objective time (for a reasonably non-Einsteinian definition thereof):
1. Add a row. It may have failed.
2. Independently of that, add a row with the same data because the sensor gave the same readings. Maybe this arrives at the same server, maybe not.
3. Repeat step 2 some number of times.
4. Successfully re-add the row that may have failed.
At this point, we have multiple rows that are identical except for the timestamp. How does the timestamp help us determine the correct data set that should be retained? We don’t even know the objective count.
-John
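The scenario above can be made concrete with a small simulation. It assumes, purely for illustration, that `now()` advances by exactly 1 ms per stored row and that a failed write never reaches storage; `run_scenario` is a hypothetical helper. Under those assumptions, a history in which the first write landed (two genuine repeats) and one in which it failed (three genuine repeats) leave exactly the same rows, so nothing in the stored data, timestamps included, reveals the objective count.

```python
from itertools import count
from typing import List, Tuple

Row = Tuple[str, float, int]  # (device, value, now() timestamp)


def run_scenario(first_write_landed: bool, repeats: int) -> Tuple[List[Row], int]:
    """Replay the four steps; return (stored rows, number of genuine readings)."""
    clock = count(1_000)  # hypothetical: now() advances 1 ms per stored row
    reading = ("sensor-1", 20.5)
    rows: List[Row] = []
    genuine = 0

    # Step 1: add a row; the client cannot tell whether it landed.
    genuine += 1
    if first_write_landed:
        rows.append((*reading, next(clock)))

    # Steps 2-3: the sensor independently reports the same reading again.
    for _ in range(repeats):
        genuine += 1
        rows.append((*reading, next(clock)))

    # Step 4: successfully re-add the row that may have failed (not a new reading).
    rows.append((*reading, next(clock)))
    return rows, genuine


rows_x, genuine_x = run_scenario(first_write_landed=True, repeats=2)
rows_y, genuine_y = run_scenario(first_write_landed=False, repeats=3)
print(rows_x == rows_y)      # True: identical stored data
print(genuine_x, genuine_y)  # 3 4: different objective counts
```

With real clocks the timestamps would differ, but no row records whether it was a retry or a genuine reading.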
* Timestamp usability: timestamp arithmetic and `now()` function.
* Remove `now()` used in `INSERT` statements; this is handled in RFC PR #54.