Make datasets more accessible #241

Open
MrPowers opened this issue Dec 29, 2021 · 4 comments

Comments

@MrPowers

Thanks for the excellent work on this project.

I'd like to experiment with the datasets and would rather not have to generate them myself. I've never used R and don't really want to learn it right now. I'm more interested in questions like whether using broadcast joins would materially affect the Spark benchmarks.

Can you provide downloadable data files? Or can you make the files accessible on S3? I'm making important data files accessible to the community in an S3 bucket, so I'd also be happy to upload them there if that'd help.

Thanks again for building / maintaining this project. Hope I'll be able to contribute!

@ncclementi

Hi there, checking in here: is there any update on having the data files available in an S3 bucket? I'd really appreciate it, especially for the 1e9 case, which seems to be problematic to generate (see #110).

Thank you
cc: @jangorecki

@MrPowers
Author

We could make the 50 GB dataset accessible in S3 as multiple gzipped files that users could download and reassemble on their local machines. That'd let users download the parts in parallel from S3 and avoid the single massive file problem. Thoughts @jangorecki / @ncclementi?
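
A rough sketch of what the split / parallel fetch / reassemble flow might look like, using only the Python standard library. Everything here is an assumption for illustration: the function names, the `.partNNNNN.gz` naming scheme, and the part size are made up, and `fetch_part` just copies a local file as a stand-in for a real S3 download.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_into_gzip_parts(src: Path, out_dir: Path, part_bytes: int) -> list[Path]:
    """Cut src into fixed-size chunks and gzip each chunk into its own file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as f:
        index = 0
        while chunk := f.read(part_bytes):
            # Zero-padded index so that sorting by name restores the original order.
            part = out_dir / f"{src.name}.part{index:05d}.gz"
            with gzip.open(part, "wb") as gz:
                gz.write(chunk)
            parts.append(part)
            index += 1
    return parts


def fetch_part(part: Path, dest_dir: Path) -> Path:
    """Hypothetical download step: copies a local file.

    A real version would fetch the object from S3 instead.
    """
    dest = dest_dir / part.name
    shutil.copyfile(part, dest)
    return dest


def fetch_all(parts: list[Path], dest_dir: Path, workers: int = 4) -> list[Path]:
    """Fetch every part concurrently with a small thread pool."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: fetch_part(p, dest_dir), parts))


def reassemble(parts: list[Path], dest: Path) -> None:
    """Decompress the parts in name order and append them to dest."""
    with dest.open("wb") as out:
        for part in sorted(parts):
            with gzip.open(part, "rb") as gz:
                shutil.copyfileobj(gz, out)
```

Each part is an independent gzip file, so a failed download only means retrying that one part, and nothing larger than `part_bytes` has to be held in memory while splitting. The chunks are cut on byte boundaries, not line boundaries, so the parts only make sense once they are reassembled in order.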

@jangorecki
Contributor

Hi, you need to contact h2o support. I am no longer a maintainer of the project.

@MrPowers
Author

OK @jangorecki, will do. Thanks for your great contributions to this project.


3 participants