
Six bucket upload #79

Open · wants to merge 5 commits into dev
Conversation

sckott (Member) commented Jan 13, 2025

fixes #67

@seankross please review, thanks.

Note that the six_file_upload function mentioned in the related issue is already in the package on the main branch.

I'm still not sure what the best behavior should be with respect to local directories of files. Should a dir of files be uploaded into a dir of the same name, or just exploded into the bucket root? When I used the s3fs fxn for uploading to a bucket, it exploded the files in a local folder into the AWS bucket at the root rather than into a folder in the bucket. Perhaps only if the user specifies a path in the remote arg in addition to the bucket name, e.g., bucketasdfasdfasdf/somedir, would somedir be the dir name - BUT that somedir could also just be the file name, so this seems precarious.

We could just wrap the s3fs function, but it's been a while since we were focusing on this package; I know I tried to move away from s3fs functions, but I'm not remembering why now.

@sckott sckott added this to the v0.2 milestone Jan 13, 2025
@sckott sckott requested a review from seankross January 13, 2025 17:53
seankross (Collaborator) commented Jan 13, 2025

the magic for this function would be:

  • a path to a file appears at the top level of a bucket
  • a path to a dir reconstructs the dir structure within the bucket

let's say I pass in this vector into the function:

c(
  "/Users/sean/science/microscope/img1.png",
  "/Users/sean/science/microscope/img2.png",
  "/Users/sean/science/experiment1/"
)

and let's say this is the structure of /Users/sean/science/experiment1/:

experiment1/
  README.md
  data/
    results.csv
  code/
    analysis.py

the following "keys" (to use AWS parlance) would be created in the bucket:

/img1.png
/img2.png
/experiment1/README.md
/experiment1/data/results.csv
/experiment1/code/analysis.py
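A minimal sketch of that file-vs-dir rule, using the fs package (the helper name paths_to_keys is hypothetical, not part of the PR):

```r
library(fs)

# Sketch only: turn a mixed vector of local file/dir paths into bucket keys.
# Bare files land at the top level; dirs keep their own name as the key prefix.
paths_to_keys <- function(local_paths) {
  unlist(lapply(local_paths, function(p) {
    if (is_dir(p)) {
      files <- dir_ls(p, recurse = TRUE, type = "file")
      # reconstruct the dir structure, rooted at the dir's own name,
      # e.g. experiment1/data/results.csv
      path(path_file(p), path_rel(files, start = p))
    } else {
      # a bare file appears at the top level, e.g. img1.png
      path_file(p)
    }
  }), use.names = FALSE)
}
```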

It would be good to be able to change the level too, so you didn't always have to use the top level, but I think you've built that functionality already here.

If this does what I've described let me know and I will take it for a spin.

seankross (Collaborator) left a review:

See above.

sckott (Member, Author) commented Jan 13, 2025

That's not the current behavior wrt dirs. Right now all the files in the dir get put into the root bucket path. I'll work on that and ping you when it's done.

seankross (Collaborator) commented:

> It would be good to be able to change the level too, so you didn't always have to use the top level, but I think you've built that functionality already here.

@sckott do you understand what I'm saying here?

sckott (Member, Author) commented Jan 13, 2025

By level, I assume you mean how far down you walk through the local directory? If that's what you're referring to, probably the thing people are most familiar with is just recursive or not, yeah? That is, either just the top level, which in this example would be just the README.md file, or recursive, which would be the example you gave, where you go down through dirs until there are no more dirs to go into.
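For what it's worth, the fs package already expresses that distinction through the recurse argument of dir_ls() (paths here are from the experiment1 example above):

```r
library(fs)

# top level only: README.md plus the data/ and code/ dirs themselves
dir_ls("experiment1", recurse = FALSE)

# full tree: walks down until there are no more dirs to go into
dir_ls("experiment1", recurse = TRUE, type = "file")
```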

seankross (Collaborator) replied:

@sckott

> By level I assume you mean how far down do you walk through the local directory?

nah lemme clarify, I'm thinking of functionality like:

six_bucket_upload(
  c(
    "/Users/sean/science/microscope/img1.png",
    "/Users/sean/science/microscope/img2.png",
    "/Users/sean/science/experiment1/"
  ),
  destination_path = "nih_grant/aim1" # probably a bad choice for an arg name
)

which would create these keys:

/nih_grant/aim1/img1.png
/nih_grant/aim1/img2.png
/nih_grant/aim1/experiment1/README.md
/nih_grant/aim1/experiment1/data/results.csv
/nih_grant/aim1/experiment1/code/analysis.py

But if you don't specify destination_path, it just defaults to the top level of the bucket.
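The prefixing itself could be a vectorized path join over already-computed keys (a sketch only; destination_path is the placeholder name from this thread, and with_destination is a hypothetical helper):

```r
library(fs)

# Sketch only: prepend an optional in-bucket prefix to a vector of keys.
with_destination <- function(keys, destination_path = NULL) {
  if (is.null(destination_path)) keys else path(destination_path, keys)
}
```

With destination_path = NULL the keys stay at the bucket root, matching the default described above.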

sckott (Member, Author) commented Jan 13, 2025

Okay, so I think what I'm hearing with respect to dirs is:

  • for the local side of things, we just recursively go through the dir, extract files all the way down the tree, and mirror that structure on the bucket
  • for the AWS bucket, we use the root of the bucket for all the uploaded stuff UNLESS the user specifies some other dir within the bucket

Let's assume for simplicity that the arg name is destination_path ... If the user wants different local dirs in different destination_path dirs on the bucket, they just need to call the function once for each destination_path value. I think it would be too complex to allow >1 destination_path value.

seankross (Collaborator) commented Jan 13, 2025

Sounds great! Thank you. And yes I agree.

@sckott sckott requested a review from seankross January 14, 2025 18:06
if (length(remote_parts) > 1) {
  key_prefix <- path_join(remote_parts[-1])
  cli_info("using key prefix {.strong {key_prefix}}")
  path$key <- path(key_prefix, path$key)
}
seankross (Collaborator) commented on this diff:

You need an @importFrom fs path wherever you think is appropriate.
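Concretely, that could be a roxygen tag above whichever function calls fs::path (placement is up to the author; the function signature here is illustrative only, and path_join would need importing too if it isn't already):

```r
#' Upload files to a bucket
#' @importFrom fs path path_join
six_bucket_upload <- function(path, remote) {
  # ... function body as in the PR ...
}
```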

seankross (Collaborator) left a review:

Looks and works great except for the one comment above.
