Skip to content

rfc for now() in INSERT INTO #54

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

gordonguthrie
Copy link
Contributor

No description provided.

{c, 11:00:02}
```

These are out of order.
Copy link

@matthewvon matthewvon Feb 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the problem statement is lacking. There were two problems noted in the email chain. First was arrival order as presented here. The second was key duplication which resulted in data being overwritten. Given that TS is focused upon queries with aggregation functions, the first problem might not be critical. Therefore the second problem could be the focus and performance issues of serialization could be avoided.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(a subtopic of key duplication was clock granularity. there have been conversations about using Raspberry Pi's for Riak TS instances. I suggest we document the clock granularity on a Pi too.)


## Convenience Option

In this world `now()` in insert is seen as a convenience for play about and is optimised for onboarding and not production.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the email chain was a comment stating that Erlang's now() guaranteed a unique return value (within that VM). Please note this quote: "erlang:now/0 is deprecated, as it is and will be a scalability bottleneck." That is from http://erlang.org/doc/apps/erts/time_correction.html

@macintux
Copy link
Contributor

macintux commented Feb 3, 2017

Serializing through a common server introduces delays in translation, so now() once again becomes now-ish. Maybe we could reduce the accuracy of the timestamp since millisecond is clearly not in the cards.

The proposal does not guarantee that two inserts arriving on different servers are translated (and thus inserted) in the same order they arrived.

(Or maybe, as MvM points out, ordering isn't all that important.)

@ph07
Copy link

ph07 commented Feb 14, 2017

  1. There are two different use cases - when user data records include a timestamp and when they do not. In the latter case, the user needs a way to "enrich" the incoming records with the timestamp Some people call it "arrival time" to differentiate from "creation time". There are several places in the ingestion pipeline where it can be done, including the microservice which is compiled with Riak client library. If the user decides to "trust" one of database (Riak) servers rather than other servers in the pipeline, it is their architectural design choice. As part of this architecture, the database servers time should be synchronized, e.g. using NTP.
  2. Some of the described scenarios assume that the ordering of the events is important. However, if they did not have a timestamp originally, I can not think of a scenario why the ordering is important.
  3. Running cluster of database servers without proper NTP settings is a severe misconfiguration. Using NTP, the clocks are usually synced within 100 msec or less. In most cases, this would be insignificant compared to the time for data records to travel across the network, especially for IoT use cases.
  4. While we did not hear it before, I assume it is possible that users will not use NTP and the ordering is important. This will require feedback from potential or current customers. If confirmed, we will add it to the backlog and prioritize.

@macintux
Copy link
Contributor

If we implement now() with second resolution, I think most of my concerns go away. If a customer says they need finer-grained resolution to avoid collisions, then they should not be trusting their data to the vagaries of NTP.

@macintux
Copy link
Contributor

Beyond my concern that many people don't monitor NTP properly and thus servers can silently drift out of sync, beyond my concern that 100ms isn't good enough when we're storing 1ms resolution, there's also the fundamental Riak promise: your database can experience partition without data loss.

NTP can't operate across a partition.

@matthewvon
Copy link

@macintux now() with a second resolution assumes that the same device will not post more than one record per second.

@macintux
Copy link
Contributor

macintux commented Feb 14, 2017 via email

@andytill
Copy link
Contributor

To look at the two use cases...

In production, users can use an additional column in the local key. This avoids the overwriting problem. The major problem with now() in INSERT statements is that writes are no longer idempotent, and cannot safely be retried if some rows in a batch fail. For this reason I do not think we should continue with this feature.

For demos, I think creating data is still a problem. I have experienced this difficulty when creating data for the sorting blog and the demo at the London EUG. Having now() or $now there as a convenience would be nice but still doesn't fix that, completely. I'd suggest investigating CSV imports either through the shell command line or via SQL to setup a known environment for demos.

@gordonguthrie
Copy link
Contributor Author

For demos you can use the logging to record a data setup and the replay it into the demo which does solve that problem

@matthewvon
Copy link

"no longer idempotent" for those of us that have been using the mySql now() for logging write/arrival time, we don't care about idempotent. We do care about uniqueness of entries. My production database has been running since 2005 using now(). I and other people are going to expect some equivalent.

@andytill
Copy link
Contributor

The difference is that MySQL supports transactions so if one write fails you can rollback the whole thing. In a Riak TS write failure you're in a halfway house-of-pain, where some writes will have succeeded, others failed but when the batch is retried, all writes are duplicated because now() has changed,

@matthewvon
Copy link

no. you do not understand the use case. I am not talking about transactions. I am talking about plain every day posting of data. Yep, posting twice can double the data. But that is why the posting time, now() is important for manual rollbacks, a.k.a. deletes to clean up.

@macintux
Copy link
Contributor

macintux commented Feb 16, 2017 via email

@matthewvon
Copy link

Andy suggests that we drop the implementation of now() because we do not have transactions and cannot promise idempotent data when data is written twice. I am saying that is short sighted, especially in view of common practices (like I have executing now in my basement and have seen at other places such as Trip Advisor).

@macintux
Copy link
Contributor

macintux commented Feb 16, 2017 via email

@macintux
Copy link
Contributor

macintux commented Feb 16, 2017 via email

andytill pushed a commit that referenced this pull request Feb 28, 2017
* Timestamp usability: timestamp arithmetic and now() function.

* Remove now() used in INSERT statements, this is handled in RFC PR #54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants