[loader] datatable #1785
geekscrapy
started this conversation in
Ideas
Replies: 2 comments
-
Interesting idea, @geekscrapy! I'll be looking into Ibis to achieve this as well.
-
See also: duckdb
-
Datatable is much like pandas but with a focus on big data and speed. Is this a package that might help vd with big datasets?
https://datatable.readthedocs.io/en/latest/index.html
I'd love to see this implemented as a "bare" loader. By this I mean user functionality is limited to only what the datatable library itself can do. So, in this case, it can do grouping, searching, and even regex searching, but it can't do things like splitcol, regex capture, etc.
I say this because these are the types of features (I think!) that tend to slow down calculations as the data gets chopped up. I'd love to see how a "native" version of a loader might work if only its core functionality is used, and how fast it could be (albeit losing functionality, but then, if you need the extra features, you could probably just do a deepcopy?).
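To make the "bare" idea a bit more concrete, here is a rough sketch in plain Python. Everything in it is hypothetical: `BareSheet`, `FullSheet`, `group_count`, `search` and `to_full` are names made up for illustration, not VisiData or datatable APIs, and a list of dicts stands in for whatever frame object the backend would really hold. It only shows the shape of the idea: native operations are exposed, non-native ones like splitcol are refused, and a deepcopy promotes the data to a full-featured sheet.

```python
import copy
import re


class BareSheet:
    """Hypothetical 'bare' sheet: exposes only what the backend does natively."""

    def __init__(self, rows):
        # A list of dicts stands in for the backend's frame object (assumption).
        self.rows = rows

    def group_count(self, col):
        """Grouping: count rows per distinct value of `col`."""
        counts = {}
        for row in self.rows:
            counts[row[col]] = counts.get(row[col], 0) + 1
        return counts

    def search(self, col, pattern):
        """Regex searching: return the rows whose `col` matches `pattern`."""
        rx = re.compile(pattern)
        return [row for row in self.rows if rx.search(str(row[col]))]

    def splitcol(self, col, sep):
        """Not a native operation in this sketch, so the bare sheet refuses it."""
        raise NotImplementedError("splitcol is not native; call to_full() first")

    def to_full(self):
        """Deep-copy the data into a full-featured sheet."""
        return FullSheet(copy.deepcopy(self.rows))


class FullSheet(BareSheet):
    """Hypothetical full-featured sheet working on its own copy of the data."""

    def splitcol(self, col, sep):
        # Adds col_0, col_1, ... to each row, one per piece of the split value.
        for row in self.rows:
            for i, part in enumerate(str(row[col]).split(sep)):
                row[f"{col}_{i}"] = part
        return self.rows
```

Usage would look something like `bare = BareSheet(rows)`, then `bare.group_count("kind")` on the fast path, and `bare.to_full().splitcol("name", "-")` only when the extra features are actually needed. The deepcopy means the bare sheet's data is left untouched.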
I also see this approach helping with diff saving potentially.
Anyhow, thoughts! I've looked into how feasible it would be, but it's blue sky (big data) thinking 🙃
Speed tests against pandas:
https://towardsdatascience.com/an-overview-of-pythons-datatable-package-5d3a97394ee9