Multipart get_object() #383

Open · 1 of 3 tasks

gdbassett opened this issue Jan 5, 2021 · 0 comments
Before filing an issue, please make sure you are using the latest development version which you can install using install.packages("aws.s3",repo="https://rforge.net") (see README) since the issue may have been fixed already. Also search existing issues first to avoid duplicates.

Please specify whether your issue is about:

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

Both data.table::fread() and readr::read_csv() are able to load CSVs using aws.s3::get_object():
dt <- data.table::fread(aws.s3::get_object(filename, bucket=bucket, as = "text"))

However, if the full file is larger than R's maximum string size (2^31 - 1 bytes), the read fails for that reason. I assume this is happening because fread and read_csv cannot read the file in chunks and are instead reading it in as a single character vector before splitting out the columns.
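For context, a disk-based workaround seems to sidestep the limit today: save_object() appears to stream the response to a file, so no single R string ever has to hold the whole CSV, and fread() can then read it from disk. A rough sketch with placeholder variable names:

# Sketch of a disk-based workaround (filename/bucket as in the example above).
library(aws.s3)
library(data.table)

tmp <- tempfile(fileext = ".csv")
aws.s3::save_object(object = filename, bucket = bucket, file = tmp)  # write straight to disk
dt <- data.table::fread(tmp)                                         # fread reads from the file
unlink(tmp)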

Would it be possible to provide the CSV to fread/read_csv in chunks to avoid this hard limit?
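Something along these lines might work, assuming get_object() forwards extra request headers (its headers argument) so an HTTP Range request returns only part of the object, and assuming head_object() exposes the object size as a "content-length" attribute. Bucket/object names and the chunk size below are illustrative, and the raw chunks would still need to be re-aligned on newline boundaries before parsing rows:

# Rough sketch of chunked retrieval via HTTP Range requests.
# Each call returns only the requested byte span as a raw vector,
# so nothing has to fit into a single R string.
library(aws.s3)

chunk_size <- 64 * 1024^2                                    # 64 MiB per request (arbitrary)
meta       <- head_object("large.csv", bucket = "my-bucket") # hypothetical names
total      <- as.numeric(attr(meta, "content-length"))       # object size in bytes (assumed attribute)

starts <- seq(0, total - 1, by = chunk_size)
chunks <- lapply(starts, function(start) {
  end <- min(start + chunk_size - 1, total - 1)
  get_object("large.csv", bucket = "my-bucket",
             headers = list(Range = sprintf("bytes=%.0f-%.0f", start, end)))
})
# `chunks` is a list of raw vectors; a chunk-aware reader would still need to
# stitch them back together on line boundaries before handing rows to fread()/read_csv().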
