
Feature Req: Support for higher resolution timestamps #226

Open
lJoublanc opened this issue Sep 16, 2016 · 6 comments

Comments

@lJoublanc

Is there any plan to support higher-resolution timestamps than the current 1 millisecond? For example, doesn't MiFID II require higher-resolution timestamps for transactional/business data?

I see a couple of issues worth pointing out:

  • The MongoDB binary protocol's datetime type has a fixed (millisecond) resolution. Fortunately this appears to be used only for the START and END fields in each document, rather than the bulk of the data, which is numpy binary. Is it used elsewhere?
  • Timestamp indices are stored in binary, but not using the numpy datetime64 encoding. Instead they're stored as deltas of "millis from epoch", if I recall correctly, so this would break.
  • The numpy datetime64 type does support arbitrary-precision encoding, but I'm not sure whether this is supported by versionstore or tickstore (see the point above - indices are custom-encoded, but can you store timestamps in the data?)
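As an illustration of the last point (not Arctic code, just plain numpy): the resolution of datetime64 lives in the dtype itself, while the underlying storage is an int64 count of those units since the epoch.

```python
import numpy as np

# The resolution is part of the dtype (datetime64[ms] vs datetime64[ns]);
# either way the payload is an int64 count of units since the epoch.
stamp = '2016-09-16T12:00:00.123456789'
ms = np.array([stamp], dtype='datetime64[ms]')
ns = np.array([stamp], dtype='datetime64[ns]')

print(ms[0])  # 2016-09-16T12:00:00.123 (truncated to milliseconds)
print(ns[0])  # 2016-09-16T12:00:00.123456789 (full nanosecond precision)
print(ms.view('int64')[0], ns.view('int64')[0])  # raw int64 payloads
```

So if the stored index kept the raw int64 values, the unit would have to be recorded somewhere alongside them.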
@jengelman

jengelman commented Feb 17, 2018

@bmoscon @lJoublanc Did anyone take a look at/start working on this? I would love to replace some of our local storage system with Arctic, but the millisecond rounding is a fairly big issue. I was thinking of just switching the timestamp indices to nanos from epoch and updating the serializer/deserializers accordingly, but if someone else has a better idea or WIP, please let me know!

Edit: Looks like the current version of Arctic handles microseconds just fine, looking forward to using it!

@lJoublanc
Author

No, I haven't had a chance. I will need to rewrite a Scala adapter soon (in the next few months), so I may take a look then. I think the way to go is to use datetime64, which is supposed to be variable-resolution (but I can't recall whether the resolution is part of the data type, or whether you need to store it as a value separately). When I first raised this request, I was looking at this and was puzzled as to why the index was made up of deltas instead of absolute values. There must be a good reason behind it.

@lJoublanc
Author

lJoublanc commented Feb 25, 2018

So datetime64 doesn't explicitly encode the resolution. A pretty detailed description is available here.
With regard to the deltas in the index, I suspect this is because they're timestamping the ticks themselves (rather than using the exchange-provided timestamp), using the HPET rather than the system clock. That isn't guaranteed to provide absolute time, only relative time - and I don't even think it's 'time' as such, but rather CPU cycles, which are then divided by the CPU frequency to work out nanos. So perhaps using a timedelta dtype and storing the resolution as an extra BSON field (defaulting to ms if it's missing, to preserve backward compatibility) would work nicely.

@richardbounds
Contributor

There's no particular magic in the deltas - I think I tried a few things and the deltas compressed better than storing full timestamps. We don't do anything special when generating the timestamps - it is just System.currentTimeMillis() in the receiving thread (the writing of live tick data is all done in Java). We deliberately don't make any guarantees about clock synchronization across tick streams for different tickers, so the timestamp is just a handy label for locating the interesting section of the event stream.
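The compression argument is easy to reproduce. A rough sketch, using zlib as a stand-in for whatever compressor is actually in use, comparing absolute int64 millisecond timestamps against their deltas:

```python
import zlib
import numpy as np

rng = np.random.RandomState(0)
# Monotonic millisecond timestamps, one tick every 1-3 ms.
ts = (1474027200000 + rng.randint(1, 4, 100000).cumsum()).astype('int64')
deltas = np.diff(ts, prepend=ts[0])

absolute = zlib.compress(ts.tobytes())
delta_encoded = zlib.compress(deltas.tobytes())
print(len(absolute), len(delta_encoded))  # the delta stream is far smaller
```

The deltas are tiny integers whose int64 encodings are mostly zero bytes, so they compress far better than the full timestamps.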

@jengelman

We also store all of our indexes as epoch deltas, so I was planning to stick with that anyway. For internal use, I was planning on just altering the conversions in ms_to_datetime and datetime_to_ms, but that would break any existing installs using milli timestamps, so I wanted something more general before submitting a PR. I like the idea of a precision value (and associated switch logic in the conversion functions) for the timestamps, but it seems expensive (storage-wise) to store it for each timestamp. Presumably you'd be using the same precision for every tick or snapshot inserted for a given symbol, so how about just adding that to the metadata for the storage engine and then passing it into the conversion functions?
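A hypothetical sketch of that per-symbol precision idea (the names echo Arctic's ms_to_datetime/datetime_to_ms, but the unit parameter and generalised pair are assumptions, not existing API): read the precision once from the symbol's metadata and pass it into a single conversion pair.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_UNITS_PER_SECOND = {'ms': 10**3, 'us': 10**6, 'ns': 10**9}

def datetime_to_int(dt, precision='ms'):
    """Generalised datetime_to_ms: tz-aware UTC datetime -> integer timestamp."""
    delta = dt - _EPOCH
    micros = (delta.days * 86_400_000_000 + delta.seconds * 1_000_000
              + delta.microseconds)
    return micros * _UNITS_PER_SECOND[precision] // 10**6

def int_to_datetime(value, precision='ms'):
    """Generalised ms_to_datetime; defaulting precision to 'ms' keeps
    existing millisecond installs decoding unchanged."""
    return _EPOCH + timedelta(
        microseconds=value * 10**6 // _UNITS_PER_SECOND[precision])
```

One caveat: Python's datetime only carries microseconds, so the 'ns' case truncates sub-microsecond detail on the way back out; full nanosecond fidelity would need to stay in numpy/pandas types.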

@lJoublanc
Author

Presumably, you'd be using the same precision for every tick or snapshot or whatever inserted for a given symbol, so how about just adding that to the metadata for the storage engine and then passing that into the conversion functions?

Yes, sorry that wasn't clear - that's what I meant when I said

So perhaps using timedelta dtype and store the res as an extra bson field (defaulting to ms if it's missing, to preserve backward compat) would work nicely.


4 participants