add arrow/parquet as alternative in Chapter 5 #292

Open
engineerchange opened this issue Oct 24, 2020 · 4 comments

@engineerchange
Contributor

I saw a debate online about using feather vs. arrow's parquet, and parquet seems like a viable alternative in terms of efficiency that is worth benchmarking in Chapter 5: Efficient input/output.

https://ursalabs.org/blog/2019-10-columnar-perf/

@Robinlovelace
Collaborator

Robinlovelace commented Oct 24, 2020

Hi @engineerchange, first I'd like to say: many thanks for keeping this repo lively, your agitating for more computationally efficient implementations is greatly appreciated from the perspective of updating the book (and perhaps from the perspective of engineering positive change in the world beyond computing)!

Have you seen any benchmarks comparing parquet vs vroom, and do you know if the R implementation, which seems slower than the Python implementation in Wes's tests, has sped up?

For me the only question is 'when' not 'if': has the R implementation reached a sufficient level of maturity to be worthy of inclusion in the book? On a related note I'd like to add duckdb to the book, assuming it's ready.

@engineerchange
Contributor Author

That's a very kind note from you - I use your book as a cheatsheet like most, so I am happy to hear that my agitations are appreciated! 😅

I don't have many answers here except to suggest it as an option. I struggle to know when a package has reached a level of maturity that would be appropriate for a publication like this. I think this effort would be a good way to document some benchmarks, however.

Yeah, duckdb is quickly moving into ⭐ status in the R world, and including it is probably a good idea. I was poking around with it a bit this weekend, and I think it's likely the best way to introduce SQL to an R user, and probably to someone brand new to coding as well.

@Robinlovelace
Collaborator

Robinlovelace commented Oct 24, 2020

Fantastic. Well... in the interests of keeping our giant 'cheat sheet' up-to-date, any further comments, and especially suggested changes via PRs, are very welcome ;)

Robinlovelace added a commit that referenced this issue Oct 25, 2020
@Robinlovelace
Collaborator

Heads-up @engineerchange: I've created a PR that aims to compare the vroom and arrow options: #293

It's a work in progress; comments on it or additions to it are welcome!
