rfc for now() in INSERT INTO #54

gordonguthrie · 2017-02-03T13:37:40Z

No description provided.

matthewvon · 2017-02-03T14:12:32Z

Options_for_a_now_function/README.md

+{c, 11:00:02}
+```
+
+These are out of order.


I believe the problem statement is lacking. There were two problems noted in the email chain. First was arrival order as presented here. The second was key duplication which resulted in data being overwritten. Given that TS is focused upon queries with aggregation functions, the first problem might not be critical. Therefore the second problem could be the focus and performance issues of serialization could be avoided.

(A subtopic of key duplication was clock granularity. There have been conversations about using Raspberry Pis for Riak TS instances. I suggest we document the clock granularity on a Pi too.)

matthewvon · 2017-02-03T14:16:21Z

Options_for_a_now_function/README.md

+
+## Convenience Option
+
+In this world `now()` in insert is seen as a convenience for play about and is optimised for onboarding and not production.


Part of the email chain was a comment stating that Erlang's now() guaranteed a unique return value (within that VM). Please note this quote: "erlang:now/0 is deprecated, as it is and will be a scalability bottleneck." That is from http://erlang.org/doc/apps/erts/time_correction.html

macintux · 2017-02-03T15:30:13Z

Serializing through a common server introduces delays in translation, so now() once again becomes now-ish. Maybe we could reduce the accuracy of the timestamp since millisecond is clearly not in the cards.

The proposal does not guarantee that two inserts arriving on different servers are translated (and thus inserted) in the same order they arrived.

(Or maybe, as MvM points out, ordering isn't all that important.)

ph07 · 2017-02-14T07:19:12Z

There are two different use cases: user data records that include a timestamp and records that do not. In the latter case, the user needs a way to "enrich" the incoming records with the timestamp. Some people call it "arrival time" to differentiate it from "creation time". There are several places in the ingestion pipeline where this can be done, including the microservice that is compiled with the Riak client library. If the user decides to "trust" one of the database (Riak) servers rather than other servers in the pipeline, that is their architectural design choice. As part of this architecture, the database servers' time should be synchronized, e.g. using NTP.
Some of the described scenarios assume that the ordering of the events is important. However, if the records did not have a timestamp originally, I cannot think of a scenario in which the ordering is important.
Running a cluster of database servers without proper NTP settings is a severe misconfiguration. Using NTP, the clocks are usually synced within 100 msec or less. In most cases, this would be insignificant compared to the time for data records to travel across the network, especially for IoT use cases.
While we have not heard of it before, I assume it is possible that users will not use NTP and that the ordering is important to them. This will require feedback from potential or current customers. If confirmed, we will add it to the backlog and prioritize.
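
To make the skew figure concrete, here is a minimal sketch of how server-side stamping behaves when two inserts arrive closer together than the clock offset. It is an illustration only: the `Server` class, the 80 ms offset, and the payload names are hypothetical and do not model Riak TS internals.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    skew_ms: int  # hypothetical offset of this server's clock from true time

    def now_ms(self, true_time_ms: int) -> int:
        # A server can only read its own (skewed) clock.
        return true_time_ms + self.skew_ms


def stamp_arrivals(arrivals):
    """arrivals: (true_arrival_ms, server, payload) tuples in true arrival order."""
    return [(server.now_ms(t), payload) for t, server, payload in arrivals]


a = Server("a", skew_ms=0)
b = Server("b", skew_ms=-80)  # 80 ms behind, inside the ~100 ms bound mentioned above

arrivals = [
    (1_000, a, "first"),
    (1_050, b, "second"),  # arrives 50 ms later, but on the lagging server
]

stamped = stamp_arrivals(arrivals)
# Reading the rows back ordered by their stored timestamp:
by_timestamp = [payload for _, payload in sorted(stamped)]
```

Here `by_timestamp` lists the second arrival before the first, because the 50 ms gap between the inserts is smaller than the 80 ms offset. Whether that matters depends on whether ordering matters for records that had no timestamp of their own.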

macintux · 2017-02-14T20:49:50Z

If we implement now() with second resolution, I think most of my concerns go away. If a customer says they need finer-grained resolution to avoid collisions, then they should not be trusting their data to the vagaries of NTP.

macintux · 2017-02-14T20:52:04Z

Beyond my concern that many people don't monitor NTP properly and thus servers can silently drift out of sync, beyond my concern that 100ms isn't good enough when we're storing 1ms resolution, there's also the fundamental Riak promise: your database can experience partition without data loss.

NTP can't operate across a partition.

matthewvon · 2017-02-14T20:57:15Z

@macintux now() with a second resolution assumes that the same device will not post more than one record per second.
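
A small sketch of that assumption, using a plain dict as a stand-in for a table keyed on (device, timestamp). The `insert` helper and the sample readings are hypothetical and are not the Riak TS API.

```python
def insert(table: dict, device: str, ts_ms: int, value: float, resolution_ms: int) -> None:
    # Truncate the timestamp to the chosen resolution; (device, truncated ts) is the key.
    key = (device, ts_ms - ts_ms % resolution_ms)
    table[key] = value  # writing to an existing key overwrites the earlier row


# (arrival time in ms, value) for one device; the first two fall in the same second
readings = [(1_000, 20.1), (1_400, 20.3), (2_100, 20.2)]

per_second, per_ms = {}, {}
for ts, value in readings:
    insert(per_second, "sensor-1", ts, value, resolution_ms=1_000)
    insert(per_ms, "sensor-1", ts, value, resolution_ms=1)
```

With second resolution the reading at 1,400 ms replaces the one at 1,000 ms, so `per_second` holds two rows while `per_ms` holds all three.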

macintux · 2017-02-14T20:59:40Z

I addressed that point in my diatribe.


andytill · 2017-02-16T11:33:47Z

To look at the two use cases...

In production, users can use an additional column in the local key. This avoids the overwriting problem. The major problem with now() in INSERT statements is that writes are no longer idempotent, and cannot safely be retried if some rows in a batch fail. For this reason I do not think we should continue with this feature.

For demos, I think creating data is still a problem. I have experienced this difficulty when creating data for the sorting blog and the demo at the London EUG. Having now() or $now there as a convenience would be nice but still doesn't fix that completely. I'd suggest investigating CSV imports, either through the shell command line or via SQL, to set up a known environment for demos.

gordonguthrie · 2017-02-16T11:41:00Z

For demos you can use the logging to record a data setup and then replay it into the demo, which does solve that problem.

matthewvon · 2017-02-16T14:37:08Z

"no longer idempotent": for those of us that have been using the MySQL now() for logging write/arrival time, we don't care about idempotency. We do care about uniqueness of entries. My production database has been running since 2005 using now(). I and other people are going to expect some equivalent.

andytill · 2017-02-16T14:43:35Z

The difference is that MySQL supports transactions, so if one write fails you can roll back the whole thing. In a Riak TS write failure you're in a halfway house-of-pain, where some writes will have succeeded and others failed, but when the batch is retried all writes are duplicated because now() has changed.
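
A minimal simulation of that retry behaviour, under stated assumptions: the dict stands in for a table keyed on (device, timestamp), the counter stands in for a server clock that advances on every call, and `write_batch` and `write_with_retry` are hypothetical helpers rather than anything in Riak TS.

```python
import itertools

clock = itertools.count(start=1_000)  # hypothetical server clock: one tick per call


def write_batch(table, rows, fail_at=None, server_side_now=True):
    """Write rows in order; raise partway through if fail_at is set."""
    for i, row in enumerate(rows):
        if i == fail_at:
            raise RuntimeError("simulated partial failure")
        # Either the server evaluates now() per row, or the client supplied the time.
        ts = next(clock) if server_side_now else row["ts"]
        table[(row["device"], ts)] = row["value"]


def write_with_retry(table, rows, server_side_now):
    try:
        write_batch(table, rows, fail_at=2, server_side_now=server_side_now)
    except RuntimeError:
        # No transaction to roll back: the first two rows are already stored.
        write_batch(table, rows, server_side_now=server_side_now)


rows = [{"device": "d1", "ts": t, "value": v} for t, v in [(1, 10), (2, 11), (3, 12)]]

with_now, with_client_ts = {}, {}
write_with_retry(with_now, rows, server_side_now=True)
write_with_retry(with_client_ts, rows, server_side_now=False)
```

In this sketch the retried batch with server-side `now()` leaves five rows for three logical records, while the retry with client-supplied timestamps rewrites the same three keys.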

matthewvon · 2017-02-16T15:00:27Z

no. you do not understand the use case. I am not talking about transactions. I am talking about plain every day posting of data. Yep, posting twice can double the data. But that is why the posting time, now() is important for manual rollbacks, a.k.a. deletes to clean up.

macintux · 2017-02-16T15:06:20Z

Can you elaborate? I don’t understand this point at all.


matthewvon · 2017-02-16T15:10:30Z

Andy suggests that we drop the implementation of now() because we do not have transactions and cannot promise idempotent data when data is written twice. I am saying that is short sighted, especially in view of common practices (like I have executing now in my basement and have seen at other places such as Trip Advisor).

macintux · 2017-02-16T15:16:02Z

I understood the broader point. This confused me:

> Yep, posting twice can double the data. But that is why the posting time, now() is important for manual rollbacks, a.k.a. deletes to clean up.

I’ve argued that now() is effectively unpredictable in a clustered database. How are you going to clean up your data using it? How do you know how many duplicate (for all other fields) rows are redundant, since a sensor could easily feed you the same data set multiple times? You don’t really even know how many times the same row has been inserted, given Riak’s inability to deterministically reject a write which “failed” due to timeouts somewhere in the environment.


macintux · 2017-02-16T15:25:49Z

Ok, a concrete scenario to help me understand cleanup via now(). Events are ordered by objective time (for a reasonably non-Einsteinian definition thereof).

1. Add a row. It maybe failed.
2. Independently of that, add a row with the same data because the sensor gave the same readings. Maybe this arrives at the same server, maybe not.
3. Repeat #2 some number of times.
4. Successfully re-add the row that maybe failed.

At this point, we have multiple rows that are identical except for the timestamp. How does the timestamp help us determine the correct data set that should be retained? We don’t even know the objective count.



Gordon Guthrie added 2 commits February 3, 2017 13:30

rfc for now() in INSERT INTO

bf2ce85

minor corrections

2b66bb8

matthewvon reviewed Feb 3, 2017


andytill pushed a commit that referenced this pull request Feb 20, 2017

Remove now() used in INSERT statements, this is handled in RFC PR #54

e4fd7e3

andytill pushed a commit that referenced this pull request Feb 28, 2017

Timestamp usability: timestamp arithmetic and now() function. (#51)

2adccc0

* Timestamp usability: timestamp arithmetic and now() function. * Remove now() used in INSERT statements, this is handled in RFC PR #54

